isocirc


Nameisocirc JSON
Version 1.0.4 PyPI version JSON
download
home_pagehttps://github.com/Xinglab/isoCirc
SummaryisoCirc: computational pipeline to identify high-confidence BSJs and full-length circRNA isoforms from isoCirc long-read data
upload_time2021-07-15 10:25:24
maintainer
docs_urlNone
authorYan Gao
requires_python
licenseGLP
keywords
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            # isoCirc: computational pipeline to identify high-confidence BSJs and full-length circRNA isoforms from isoCirc long-read data
<!-- [![Github All Releases](https://img.shields.io/github/downloads/Xinglab/isoCirc/total.svg?label=Download)](https://github.com/Xinglab/isoCirc/releases) -->
[![Latest Release](https://img.shields.io/github/release/Xinglab/isoCirc.svg?label=Release)](https://github.com/Xinglab/isoCirc/releases/latest)
[![PyPI](https://img.shields.io/pypi/dm/isocirc.svg?label=pip%20install)](https://pypi.python.org/pypi/isocirc)
[![License](https://img.shields.io/badge/License-GPL-black.svg)](https://github.com/Xinglab/isoCirc/blob/master/LICENSE)
[![Published in Nat. Commun.](https://img.shields.io/badge/Published%20in-NatCommun-orange.svg)](https://doi.org/10.1038/s41467-020-20459-8)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4264644.svg)](https://doi.org/10.5281/zenodo.4264644)
[![GitHub Issues](https://img.shields.io/github/issues/Xinglab/isoCirc.svg?label=Issues)](https://github.com/Xinglab/isoCirc/issues)
<!-- [![Published in Bioinformatics](https://img.shields.io/badge/Published%20in-Bioinformatics-purple.svg)](https://doi.org/XXX) -->
<!-- [![Build Status](https://travis-ci.org/yangao07/TideHunter.svg?branch=master)](https://travis-ci.org/yangao07/TideHunter) -->
<!-- [![Anaconda-Server Badge](https://anaconda.org/darts-comp-bio/darts_dnn/badges/version.svg)](https://anaconda.org/darts-comp-bio/darts_dnn) -->
<!-- [![Anaconda-Server Badge](https://anaconda.org/darts-comp-bio/darts_dnn/badges/installer/conda.svg)](https://conda.anaconda.org/darts-comp-bio) -->

## <a name="isocirc"></a>What is isoCirc ?
isoCirc is a long-read sequencing strategy coupled with an integrated 
computational pipeline to characterize full-length circRNA isoforms 
using rolling circle amplification (RCA) followed by long-read sequencing.

## Table of Contents

- [What is isoCirc?](#isocirc)
- [Installation](#install)
  - [Dependencies](#depen)
  - [Install isoCirc with `pip`](#pip)
  - [Install isoCirc from source](#src)
- [Getting started](#start)
- [Input and output](#input_output)
  - [Input files](#input_file)
  - [Output files](#output_file)
    - [1. `isocirc.out`](#isocirc_out)
    - [2. `isocirc.bed`](#isocirc_bed)
    - [3. `isocirc_stats.out`](#isocirc_stats_out)
- [Circular long-read alignment of isoCirc read](#isocirc_plot)
- [FAQ](#FAQ)
- [Contact](#contact)
- [Changelog](#change)

## <a name="install"></a>Installation
### <a name="depen"></a>Dependencies 
isoCirc is dependent on two open-source software packages: [`bedtools`(>= v2.27.0)](https://bedtools.readthedocs.io/) and minimap2 [`minimap2`(>= 2.11)](https://github.com/lh3/minimap2).
Please ensure that these packages are installed before running isoCirc.

### <a name="pip"></a>Install isoCirc with `pip`
**isoCirc** is written with `python`, please use `pip` to install **isoCirc**:
```
pip install isocirc            # first time installation
pip install isocirc --upgrade  # update to the latest version
```
### <a name="src"></a>Install isoCirc from source
Alternatively, you can install **isoCirc** from source:
```
git clone https://github.com/Xinglab/isoCirc.git
cd isoCirc/isoCirc_pipeline && pip install .
```

## <a name="start"></a>Getting started with toy example in `test_data`
```
cd isoCirc/test_data
isocirc -t 1 read_toy.fa chr16_toy.fa chr16_toy.gtf chr16_circRNA_toy.bed output
```


Detailed arguments:
```
usage: isocirc [-h] [-v] [-t THREADS] [--bedtools BEDTOOLS]
               [--minimap2 MINIMAP2] [--short-read short.fa/fq] [--lordec LORDEC]
               [--kmer KMER] [--solid SOLID] [--trf TRF] [--match MATCH]
               [--mismatch MISMATCH] [--indel INDEL] [--match-frac MATCH_FRAC]
               [--indel-frac INDEL_FRAC] [--min-score MIN_SCORE]
               [--max-period MAX_PERIOD] [--min-len MIN_LEN]
               [--min-copy MIN_COPY] [--min-frac MIN_FRAC]
               [--high-max-ratio HIGH_MAX_RATIO]
               [--high-min-ratio HIGH_MIN_RATIO]
               [--high-iden-ratio HIGH_IDEN_RATIO]
               [--high-repeat-ratio HIGH_REPEAT_RATIO]
               [--low-repeat-ratio LOW_REPEAT_RATIO]
               [--cano-motif {GT/AG,all}] [--bsj-xid BSJ_XID]
               [--key-bsj-xid KEY_BSJ_XID] [--min-circ-dis MIN_CIRC_DIS]
               [--rescue-low] [--fsj-xid FSJ_XID] [--key-fsj-xid KEY_FSJ_XID]
               [--Alu ALU] [--flank-len FLANK_LEN] [--all-repeat ALL_REPEAT]
               long.fa/fq ref.fa anno.gtf circRNA.bed/gtf out_dir

isocirc: circular RNA profiling and analysis using long-read sequencing

positional arguments:
  long.fa/fq            Long-read sequencing data generated with isoCirc
  ref.fa                Reference genome sequence file
  anno.gtf              Gene annotation file in GTF format
  circRNA.bed/gtf       circRNA database annotation file in BED or GTF format
                        Use ',' to separate multiple circRNA annotation files
  out_dir               Output directory for final result and temporary files

optional arguments:
  -h, --help            Show this help message and exit
  -v, --version         Show program's version number and exit

General options:
  -t THREADS, --threads THREADS
                        Number of threads to use (default: 8)
  --bedtools BEDTOOLS   Path to bedtools (default: bedtools)
  --minimap2 MINIMAP2   Path to minimap2 (default: minimap2)

Hybrid error-correction with short-read data (LoRDEC):
  --short-read short.fa/fq
                        Short-read data for error correction 
                        Use ',' to connect multiple or paired-end short-read data
                        (default: )
  --lordec LORDEC       Path to lordec-correct (default: lordec-correct)
  --kmer KMER           k-mer size (default: 21)
  --solid SOLID         Solid k-mer abundance threshold (default: 3)

Consensus calling with Tandem Repeats Finder (TRF)):
  --trf TRF             Path to TRF program (default: trf409.legacylinux64)
  --match MATCH         Match score (default: 2)
  --mismatch MISMATCH   Mismatch penalty (default: 7)
  --indel INDEL         Indel penalty (default: 7)
  --match-frac MATCH_FRAC
                        Match probability (default: 80)
  --indel-frac INDEL_FRAC
                        Indel probability (default: 10)
  --min-score MIN_SCORE
                        Minimum alignment score to report (default: 100)
  --max-period MAX_PERIOD
                        Maximum period size to report (default: 2000)

Filtering and mapping of consensus sequences (minimap2):
  --min-len MIN_LEN     Minimum consensus length to keep (default: 30)
  --min-copy MIN_COPY   Minimum copy number of consensus to keep 
                        (default: 2.0)
  --min-frac MIN_FRAC   Minimum fraction of original long read to keep
                        (default: 0.0)
  --high-max-ratio HIGH_MAX_RATIO
                        Maximum mappedLen / consLen ratio for high-quality
                        alignment (default: 1.1)
  --high-min-ratio HIGH_MIN_RATIO
                        Minimum mappedLen /consLen ratio for high-quality
                        alignment (default: 0.9)
  --high-iden-ratio HIGH_IDEN_RATIO
                        Minimum identicalBases/ consLen ratio for high-quality
                        alignment (default: 0.75)
  --high-repeat-ratio HIGH_REPEAT_RATIO
                        Maximum mappedLen / consLen ratio for high-quality
                        self-tandem consensus (default: 0.6)
  --low-repeat-ratio LOW_REPEAT_RATIO
                        Minimum mappedLen / consLen ratio for low-quality
                        self-tandem alignment (default: 1.9)

Identifying high-confidence BSJs and full-length circRNAs:
  --cano-motif {GT/AG,all}
                        Canonical back-splice motif (GT/AG or all three
                        motifs: GT/AG, GC/AG, AT/AC) (default: GT/AG)
  --bsj-xid BSJ_XID     Maximum allowed mis/ins/del for 20-bp exonic sequence
                        flanking BSJ (10-bp each side) (default: 1)
  --key-bsj-xid KEY_BSJ_XID
                        Maximum allowed mis/ins/del for 4-bp exonic sequence
                        flanking BSJ (2-bp each side) (default: 0)
  --min-circ-dis MIN_CIRC_DIS
                        Minimum distance between genomic coordinates of
                        two back-splice sites (default: 150)
  --rescue-low          Use high-mapping quality reads to rescue low-mapping
                        quality reads (default: False)
  --fsj-xid SJ_XID       Maximum allowed mis/ins/del for 20-bp exonic sequence
                        flanking FSJ (10-bp each side) (default: 1)
  --key-fsj-xid KEY_SJ_XID
                        Maximum allowed mis/ins/del for 4-bp exonic sequence
                        flanking FSJ (2-bp each side) (default: 0)

Other options:
  --Alu ALU             Alu repetitive element annotation in BED format
                        (default: )
  --flank-len FLANK_LEN
                        Length of upstream and downstream flanking sequence to
                        search for Alu (default: 500)
  --all-repeat ALL_REPEAT
                        All repetitive element annotation in BED format
                        (default: )
```

## <a name="input_output"></a>Input and output
### <a name="input_file"></a>Input files
isoCirc takes a long read containing multiple copies of a circRNA sequence as input

It also requires a reference genome sequence and gene annotation to be provided.

### <a name="output_file"></a>Output files
isoCirc outputs three result files in a user-specified directory:

| No. | File name         |  Explanation | 
|:---:|   :---            | ---        |
|  1  | isocirc.out       | detailed information of each circRNA isoform with a high-confidence BSJ, in tabular format |
|  2  | isocirc.bed       | bed12 format file of `isocirc.out` |
|  3  | isocirc_stats.out | basic summary stats numbers for `isocirc.out` |

#### <a name="isocirc_out"></a>1. isocirc.out
`isocirc.out` is a 35-column tabular file, each line represents one unique circRNA isoform that has a high-confidence BSJ:

| No. | Column name     |  Explanation | 
|:---:|   :---          | ---        |
|  1  | isoformID       | assigned isoform ID |
|  2  | chrom           | chromosome ID |
|  3  | startCoor0based | start coordinate of circRNA, 0-based |
|  4  | endCoor         | end coordinate of circRNA |
|  5  | geneStrand      | gene strand (+/-) |
|  6  | geneID          | gene ID  |
|  7  | geneName        | gene name  |
|  8  | blockCount      | number of block |
|  9  | blockSize       | size of each block, separated by `,` |
|  10 | blockStarts     | relative start coordinates of each block, separated by `,`. refer to `bed12` format for further details |
|  11 | refMapLen       | total length of circRNA |
|  12 | blockType       | category of each block. E: exon, I: intron, N: intergenic |
|  13 | blockAnno       | detailed annotation for each block, in format: "TransID:E1(100)+I(50)+E2(30)", where TransID is ID of corresponding transcript; E1 and E2 are 1st and 2nd exon of transcript; multiple blocks are separated by `,`; and multiple transcripts of one block are separated by `;` |
|  14 | isKnownSS       | `True` if SS is catalogued in gene annotation, `False` if not, separated by `,` |
|  15 | isKnownFSJ      | `True` if FSJ is catalogued in gene annotation, `False` if not, separated by `,` | 
|  16 | canoFSJMotif    | strandness and canonical motifs of FSJs, e.g., `-GT/AG`, `NA` if FSJ is not canonical, separated by `,`|
|  17 | isHighFSJ       | `True` if alignment around FSJ is high-quality, `False` if not, separated by `,`  |
|  18 | isKnownExon     | `True` if block is a catalogued exon in gene annotation, `False` if not, separated by ‘,’ | 
|  19 | isKnownBSJ      | `True` if BSJ exists in circRNA annotation, `False` if not; multiple circRNA annotations are separated by `,` | 
|  20 | isCanoBSJ       | `True` if BSJ has canonical motif (GT/AG), `False` if not | 
|  21 | canoBSJMotif    | strandness and canonical motif of BSJ, e.g., `-GT/AG`, `NA` if BSJ is not canonical | 
|  22 | isFullLength    | `True` if isoform is considered as `full-length isoform`, `False` if not |
|  23 | BSJCate         | category of BSJs: `FSM`/`NIC`/`NNC`, see explanation below. |
|  24 | FSJCate         | category of FSJs: `FSM`/`NIC`/`NNC` |
|  25 | CDS             | number of bases that are mapped to CDS region |
|  26 | UTR             | number of bases that are mapped to UTR region |
|  27 | lincRNA         | number of bases that are mapped to lincRNA region |
|  28 | antisense       | number of bases that are mapped to antisense region |
|  29 | rRNA            | number of bases that are mapped to rRNA region |
|  30 | Alu             | number of bases that are mapped to Alu element; `NA` if Alu annotation is not provided |
|  31 | allRepeat       | number of bases that are mapped to all repeat elements; `NA` if repeat annotation is not provided |
|  32 | upFlankAlu      | flanking Alu element in upstream; `NA` if none or Alu annotation is not provided |
|  33 | downFlankAlu    | flanking Alu element in downstream; `NA` if none or Alu annotation is not provided |
|  34 | readCount       | number of reads that come from this circRNA isoform |
|  35 | readIDs         | ID of reads that come from this circRNA isoform, separated by `,`  |

#### <a name="isocirc_bed"></a>2. isocirc.bed
`isocirc.bed` is a bed12 format file, each line represents one unique circRNA isoform from `isocirc.out`:

| No. | Column name     |  Explanation | 
|:---:|   :---          | ---        |
|  1  | chrom           | chromosome ID |
|  2  | startCoor0based | start coordinate of circRNA, 0-based |
|  3  | endCoor         | end coordinate of circRNA |
|  4  | isoformName     | name of the circRNA isoform     |   
|  5  | score           | indicates how dark the peak will be displayed in the browser (0-1000), set as 0 by `isoCirc` |
|  6  | strand          | +/- to denote strand  |
|  7  | thickStart      | the starting position at which the feature is drawn thickly, set as 0 by `isoCirc`  |
|  8  | thickEnd        | the ending position at which the feature is drawn thickly, set as 0 by `isoCirc`  |
|  9  | itemRgb         | an RGB value of the form R,G,B (e.g. 255,0,0), set as 0 by `isoCirc` |
|  10 | blockCount      | number of block  |
|  11 | blockSize       | size of each block, separated by `,` |
|  12 | blockStarts     | relative start coordinates of each block, separated by `,`. refer to `bed12` format for further details |

#### <a name="isocirc_stats_out"></a>3. isocirc_stats.out
`isocirc_stats.out` contains 27 basic stats numbers for `isocirc.out`:

| No. | Name           |  Explanation | 
|:---:|   :---         | ---          |
|  1  | Total reads                                       | Number of raw reads in sample |
|  2  | Total reads with cons                             | Number of reads with consensus sequence called |
|  3  | Total mappable reads with cons                    | Number of reads with consensus sequence called, mappable to genome |
|  4  | Total reads with candidate BSJ                    | Number of reads with consensus sequence called, mappable to genome, and with BSJs ("candidate BSJs") |
|  5  | Total candidate BSJs                              | Number of candidate BSJs (circRNA species) |
|  6  | Total known candidate BSJs                        | Number of candidate BSJs reported in existing circRNA BSJ database (circBase / MiOncoCirc) |
|  7  | Total reads with high BSJs                        | Number of reads with consensus sequence called, mappable to genome, and with high-confidence BSJs (based on additional inspection of alignment around BSJs) |
|  8  | Total high BSJs                                   | Number of high-confidence BSJs |
|  9  | Total known high BSJs                             | Number of high-confidence BSJs that are known |
|  10 | Total isoforms with high BSJs                     | Number of circRNA isoforms with high-confidence BSJs |
|  11 | Total isoforms with high BSJs high FSJs           | Number of circRNA isoforms with high-confidence BSJs, and all FSJs are high-confidence (canonical, high-quality alignment around internal splice sites) |
|  12 | Total isoforms with high BSJ known SSs            | Number of circRNA isoforms with high-confidence BSJs, and all SS are known (based on existing transcript GTF annotations for splice sites in linear RNA) |
|  13 | Total isoforms with high BSJs high FSJs known SSs  | Number of circRNA isoforms with high-confidence BSJs, all FSJs are high-confidence, and all SS are known |
|  14 | Total full-length isoforms                        | Number of circRNA isoforms with high-confidence BSJs, and FSJs are high-confidence or all SS are known |
|  15 | Total reads for full-length isoforms              | Number of reads for circRNA isoforms with high-confidence BSJs, and all FSJs arehigh-confidence or all SS are known |
|  16 | Total full-length isoforms with FSM BSJ           | Number of full-length circRNA isoforms with FSM BSJs |
|  17 | Total reads for full-length isoforms with FSM BSJ | Number of reads for full-length circRNA isoforms with FSM BSJs |
|  18 | Total full-length isoforms with NIC BSJ           | Number of full-length circRNA isoforms with NIC BSJs |
|  19 | Total reads for full-length isoforms with NIC BSJ | Number of reads for full-length circRNA isoforms with NIC BSJs |
|  20 | Total full-length isoforms with NNC BSJ           | Number of full-length circRNA isoforms with NNC BSJs |
|  21 | Total reads for full-length isoforms with NNC BSJ | Number of reads for full-length circRNA isoforms with NNC BSJs |
|  22 | Total full-length isoforms with FSM FSJ           | Number of full-length circRNA isoforms with FSM FSJs |
|  23 | Total reads for full-length isoforms with FSM FSJ | Number of reads for full-length circRNA isoforms with FSM FSJs |
|  24 | Total full-length isoforms with NIC FSJ           | Number of full-length circRNA isoforms with NIC internal FSJs |
|  25 | Total reads for full-length isoforms with NIC FSJ | Number of reads for full-length circRNA isoforms with NIC FSJs |
|  26 | Total full-length isoforms with NNC FSJ           | Number of full-length circRNA isoforms with NNC FSJs |
|  27 | Total reads for full-length isoforms with NNC FSJ | Number of reads for full-length circRNA isoforms with NNC FSJs |

 * BSJ:  Back-Splice Junction
 * FSJ:  Forward-Splice Junction
 * FSS:  Forward-Splice Site
 * SS:   Splice Site
 * cons: consensus sequence
 * cano: canonical
 * high: high-confidence (canonical and high-quality alignment around FSJ/BSJ)
 * FSM: Full Splice Match
 * NIC: Novel In Catalog
 * NNC: Novel Not in Catalog

## <a name="isocirc_plot"></a>Circular alignment of isoCirc long read
With the result file generated by `isocirc`, we can visulize the circular alignment of full-length isoCirc reads. Let's use the toy example in the `test_data` again:
```
isocircPlot ./read_toy.fa ./chr16_toy.fa ./chr16_toy.gtf ./output/isocirc.bed ./isocircPlot_toy.list SJ ./output
```
A `.png` file will be generated in the `output` folder indicating how the isoCirc long read is aligned to the reference genome multiple times.

## <a name="FAQ"></a>FAQ
## <a name="contact"></a>Contact

Yan Gao gaoy286@mail.sysu.edu.cn

Yi Xing yi.xing@pennmedicine.upenn.edu

[github issues](https://github.com/Xinglab/isoCirc/issues)




            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/Xinglab/isoCirc",
    "name": "isocirc",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "",
    "author": "Yan Gao",
    "author_email": "yangao07@hit.edu.cn",
    "download_url": "https://files.pythonhosted.org/packages/ea/ef/20dc4deeb6b15d86288bd94d9fc3b699bc4f4b026d8aba82e518ea7e1647/isocirc-1.0.4.tar.gz",
    "platform": "",
    "description": "# isoCirc: computational pipeline to identify high-confidence BSJs and full-length circRNA isoforms from isoCirc long-read data\n<!-- [![Github All Releases](https://img.shields.io/github/downloads/Xinglab/isoCirc/total.svg?label=Download)](https://github.com/Xinglab/isoCirc/releases) -->\n[![Latest Release](https://img.shields.io/github/release/Xinglab/isoCirc.svg?label=Release)](https://github.com/Xinglab/isoCirc/releases/latest)\n[![PyPI](https://img.shields.io/pypi/dm/isocirc.svg?label=pip%20install)](https://pypi.python.org/pypi/isocirc)\n[![License](https://img.shields.io/badge/License-GPL-black.svg)](https://github.com/Xinglab/isoCirc/blob/master/LICENSE)\n[![Published in Nat. Commun.](https://img.shields.io/badge/Published%20in-NatCommun-orange.svg)](https://doi.org/10.1038/s41467-020-20459-8)\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4264644.svg)](https://doi.org/10.5281/zenodo.4264644)\n[![GitHub Issues](https://img.shields.io/github/issues/Xinglab/isoCirc.svg?label=Issues)](https://github.com/Xinglab/isoCirc/issues)\n<!-- [![Published in Bioinformatics](https://img.shields.io/badge/Published%20in-Bioinformatics-purple.svg)](https://doi.org/XXX) -->\n<!-- [![Build Status](https://travis-ci.org/yangao07/TideHunter.svg?branch=master)](https://travis-ci.org/yangao07/TideHunter) -->\n<!-- [![Anaconda-Server Badge](https://anaconda.org/darts-comp-bio/darts_dnn/badges/version.svg)](https://anaconda.org/darts-comp-bio/darts_dnn) -->\n<!-- [![Anaconda-Server Badge](https://anaconda.org/darts-comp-bio/darts_dnn/badges/installer/conda.svg)](https://conda.anaconda.org/darts-comp-bio) -->\n\n## <a name=\"isocirc\"></a>What is isoCirc ?\nisoCirc is a long-read sequencing strategy coupled with an integrated \ncomputational pipeline to characterize full-length circRNA isoforms \nusing rolling circle amplification (RCA) followed by long-read sequencing.\n\n## Table of Contents\n\n- [What is isoCirc?](#isocirc)\n- [Installation](#install)\n  - [Dependencies](#depen)\n  - [Install isoCirc with `pip`](#pip)\n  - [Install isoCirc from source](#src)\n- [Getting started](#start)\n- [Input and output](#input_output)\n  - [Input files](#input_file)\n  - [Output files](#output_file)\n    - [1. `isocirc.out`](#isocirc_out)\n    - [2. `isocirc.bed`](#isocirc_bed)\n    - [3. `isocirc_stats.out`](#isocirc_stats_out)\n- [Circular long-read alignment of isoCirc read](#isocirc_plot)\n- [FAQ](#FAQ)\n- [Contact](#contact)\n- [Changelog](#change)\n\n## <a name=\"install\"></a>Installation\n### <a name=\"depen\"></a>Dependencies \nisoCirc is dependent on two open-source software packages: [`bedtools`(>= v2.27.0)](https://bedtools.readthedocs.io/) and minimap2 [`minimap2`(>= 2.11)](https://github.com/lh3/minimap2).\nPlease ensure that these packages are installed before running isoCirc.\n\n### <a name=\"pip\"></a>Install isoCirc with `pip`\n**isoCirc** is written with `python`, please use `pip` to install **isoCirc**:\n```\npip install isocirc            # first time installation\npip install isocirc --upgrade  # update to the latest version\n```\n### <a name=\"src\"></a>Install isoCirc from source\nAlternatively, you can install **isoCirc** from source:\n```\ngit clone https://github.com/Xinglab/isoCirc.git\ncd isoCirc/isoCirc_pipeline && pip install .\n```\n\n## <a name=\"start\"></a>Getting started with toy example in `test_data`\n```\ncd isoCirc/test_data\nisocirc -t 1 read_toy.fa chr16_toy.fa chr16_toy.gtf chr16_circRNA_toy.bed output\n```\n\n\nDetailed arguments:\n```\nusage: isocirc [-h] [-v] [-t THREADS] [--bedtools BEDTOOLS]\n               [--minimap2 MINIMAP2] [--short-read short.fa/fq] [--lordec LORDEC]\n               [--kmer KMER] [--solid SOLID] [--trf TRF] [--match MATCH]\n               [--mismatch MISMATCH] [--indel INDEL] [--match-frac MATCH_FRAC]\n               [--indel-frac INDEL_FRAC] [--min-score MIN_SCORE]\n               [--max-period MAX_PERIOD] [--min-len MIN_LEN]\n               [--min-copy MIN_COPY] [--min-frac MIN_FRAC]\n               [--high-max-ratio HIGH_MAX_RATIO]\n               [--high-min-ratio HIGH_MIN_RATIO]\n               [--high-iden-ratio HIGH_IDEN_RATIO]\n               [--high-repeat-ratio HIGH_REPEAT_RATIO]\n               [--low-repeat-ratio LOW_REPEAT_RATIO]\n               [--cano-motif {GT/AG,all}] [--bsj-xid BSJ_XID]\n               [--key-bsj-xid KEY_BSJ_XID] [--min-circ-dis MIN_CIRC_DIS]\n               [--rescue-low] [--fsj-xid FSJ_XID] [--key-fsj-xid KEY_FSJ_XID]\n               [--Alu ALU] [--flank-len FLANK_LEN] [--all-repeat ALL_REPEAT]\n               long.fa/fq ref.fa anno.gtf circRNA.bed/gtf out_dir\n\nisocirc: circular RNA profiling and analysis using long-read sequencing\n\npositional arguments:\n  long.fa/fq            Long-read sequencing data generated with isoCirc\n  ref.fa                Reference genome sequence file\n  anno.gtf              Gene annotation file in GTF format\n  circRNA.bed/gtf       circRNA database annotation file in BED or GTF format\n                        Use ',' to separate multiple circRNA annotation files\n  out_dir               Output directory for final result and temporary files\n\noptional arguments:\n  -h, --help            Show this help message and exit\n  -v, --version         Show program's version number and exit\n\nGeneral options:\n  -t THREADS, --threads THREADS\n                        Number of threads to use (default: 8)\n  --bedtools BEDTOOLS   Path to bedtools (default: bedtools)\n  --minimap2 MINIMAP2   Path to minimap2 (default: minimap2)\n\nHybrid error-correction with short-read data (LoRDEC):\n  --short-read short.fa/fq\n                        Short-read data for error correction \n                        Use ',' to connect multiple or paired-end short-read data\n                        (default: )\n  --lordec LORDEC       Path to lordec-correct (default: lordec-correct)\n  --kmer KMER           k-mer size (default: 21)\n  --solid SOLID         Solid k-mer abundance threshold (default: 3)\n\nConsensus calling with Tandem Repeats Finder (TRF)):\n  --trf TRF             Path to TRF program (default: trf409.legacylinux64)\n  --match MATCH         Match score (default: 2)\n  --mismatch MISMATCH   Mismatch penalty (default: 7)\n  --indel INDEL         Indel penalty (default: 7)\n  --match-frac MATCH_FRAC\n                        Match probability (default: 80)\n  --indel-frac INDEL_FRAC\n                        Indel probability (default: 10)\n  --min-score MIN_SCORE\n                        Minimum alignment score to report (default: 100)\n  --max-period MAX_PERIOD\n                        Maximum period size to report (default: 2000)\n\nFiltering and mapping of consensus sequences (minimap2):\n  --min-len MIN_LEN     Minimum consensus length to keep (default: 30)\n  --min-copy MIN_COPY   Minimum copy number of consensus to keep \n                        (default: 2.0)\n  --min-frac MIN_FRAC   Minimum fraction of original long read to keep\n                        (default: 0.0)\n  --high-max-ratio HIGH_MAX_RATIO\n                        Maximum mappedLen / consLen ratio for high-quality\n                        alignment (default: 1.1)\n  --high-min-ratio HIGH_MIN_RATIO\n                        Minimum mappedLen /consLen ratio for high-quality\n                        alignment (default: 0.9)\n  --high-iden-ratio HIGH_IDEN_RATIO\n                        Minimum identicalBases/ consLen ratio for high-quality\n                        alignment (default: 0.75)\n  --high-repeat-ratio HIGH_REPEAT_RATIO\n                        Maximum mappedLen / consLen ratio for high-quality\n                        self-tandem consensus (default: 0.6)\n  --low-repeat-ratio LOW_REPEAT_RATIO\n                        Minimum mappedLen / consLen ratio for low-quality\n                        self-tandem alignment (default: 1.9)\n\nIdentifying high-confidence BSJs and full-length circRNAs:\n  --cano-motif {GT/AG,all}\n                        Canonical back-splice motif (GT/AG or all three\n                        motifs: GT/AG, GC/AG, AT/AC) (default: GT/AG)\n  --bsj-xid BSJ_XID     Maximum allowed mis/ins/del for 20-bp exonic sequence\n                        flanking BSJ (10-bp each side) (default: 1)\n  --key-bsj-xid KEY_BSJ_XID\n                        Maximum allowed mis/ins/del for 4-bp exonic sequence\n                        flanking BSJ (2-bp each side) (default: 0)\n  --min-circ-dis MIN_CIRC_DIS\n                        Minimum distance between genomic coordinates of\n                        two back-splice sites (default: 150)\n  --rescue-low          Use high-mapping quality reads to rescue low-mapping\n                        quality reads (default: False)\n  --fsj-xid SJ_XID       Maximum allowed mis/ins/del for 20-bp exonic sequence\n                        flanking FSJ (10-bp each side) (default: 1)\n  --key-fsj-xid KEY_SJ_XID\n                        Maximum allowed mis/ins/del for 4-bp exonic sequence\n                        flanking FSJ (2-bp each side) (default: 0)\n\nOther options:\n  --Alu ALU             Alu repetitive element annotation in BED format\n                        (default: )\n  --flank-len FLANK_LEN\n                        Length of upstream and downstream flanking sequence to\n                        search for Alu (default: 500)\n  --all-repeat ALL_REPEAT\n                        All repetitive element annotation in BED format\n                        (default: )\n```\n\n## <a name=\"input_output\"></a>Input and output\n### <a name=\"input_file\"></a>Input files\nisoCirc takes a long read containing multiple copies of a circRNA sequence as input\n\nIt also requires a reference genome sequence and gene annotation to be provided.\n\n### <a name=\"output_file\"></a>Output files\nisoCirc outputs three result files in a user-specified directory:\n\n| No. | File name         |  Explanation | \n|:---:|   :---            | ---        |\n|  1  | isocirc.out       | detailed information of each circRNA isoform with a high-confidence BSJ, in tabular format |\n|  2  | isocirc.bed       | bed12 format file of `isocirc.out` |\n|  3  | isocirc_stats.out | basic summary stats numbers for `isocirc.out` |\n\n#### <a name=\"isocirc_out\"></a>1. isocirc.out\n`isocirc.out` is a 35-column tabular file, each line represents one unique circRNA isoform that has a high-confidence BSJ:\n\n| No. | Column name     |  Explanation | \n|:---:|   :---          | ---        |\n|  1  | isoformID       | assigned isoform ID |\n|  2  | chrom           | chromosome ID |\n|  3  | startCoor0based | start coordinate of circRNA, 0-based |\n|  4  | endCoor         | end coordinate of circRNA |\n|  5  | geneStrand      | gene strand (+/-) |\n|  6  | geneID          | gene ID  |\n|  7  | geneName        | gene name  |\n|  8  | blockCount      | number of block |\n|  9  | blockSize       | size of each block, separated by `,` |\n|  10 | blockStarts     | relative start coordinates of each block, separated by `,`. refer to `bed12` format for further details |\n|  11 | refMapLen       | total length of circRNA |\n|  12 | blockType       | category of each block. E: exon, I: intron, N: intergenic |\n|  13 | blockAnno       | detailed annotation for each block, in format: \"TransID:E1(100)+I(50)+E2(30)\", where TransID is ID of corresponding transcript; E1 and E2 are 1st and 2nd exon of transcript; multiple blocks are separated by `,`; and multiple transcripts of one block are separated by `;` |\n|  14 | isKnownSS       | `True` if SS is catalogued in gene annotation, `False` if not, separated by `,` |\n|  15 | isKnownFSJ      | `True` if FSJ is catalogued in gene annotation, `False` if not, separated by `,` | \n|  16 | canoFSJMotif    | strandness and canonical motifs of FSJs, e.g., `-GT/AG`, `NA` if FSJ is not canonical, separated by `,`|\n|  17 | isHighFSJ       | `True` if alignment around FSJ is high-quality, `False` if not, separated by `,`  |\n|  18 | isKnownExon     | `True` if block is a catalogued exon in gene annotation, `False` if not, separated by \u2018,\u2019 | \n|  19 | isKnownBSJ      | `True` if BSJ exists in circRNA annotation, `False` if not; multiple circRNA annotations are separated by `,` | \n|  20 | isCanoBSJ       | `True` if BSJ has canonical motif (GT/AG), `False` if not | \n|  21 | canoBSJMotif    | strandness and canonical motif of BSJ, e.g., `-GT/AG`, `NA` if BSJ is not canonical | \n|  22 | isFullLength    | `True` if isoform is considered as `full-length isoform`, `False` if not |\n|  23 | BSJCate         | category of BSJs: `FSM`/`NIC`/`NNC`, see explanation below. |\n|  24 | FSJCate         | category of FSJs: `FSM`/`NIC`/`NNC` |\n|  25 | CDS             | number of bases that are mapped to CDS region |\n|  26 | UTR             | number of bases that are mapped to UTR region |\n|  27 | lincRNA         | number of bases that are mapped to lincRNA region |\n|  28 | antisense       | number of bases that are mapped to antisense region |\n|  29 | rRNA            | number of bases that are mapped to rRNA region |\n|  30 | Alu             | number of bases that are mapped to Alu element; `NA` if Alu annotation is not provided |\n|  31 | allRepeat       | number of bases that are mapped to all repeat elements; `NA` if repeat annotation is not provided |\n|  32 | upFlankAlu      | flanking Alu element in upstream; `NA` if none or Alu annotation is not provided |\n|  33 | downFlankAlu    | flanking Alu element in downstream; `NA` if none or Alu annotation is not provided |\n|  34 | readCount       | number of reads that come from this circRNA isoform |\n|  35 | readIDs         | ID of reads that come from this circRNA isoform, separated by `,`  |\n\n#### <a name=\"isocirc_bed\"></a>2. isocirc.bed\n`isocirc.bed` is a bed12 format file, each line represents one unique circRNA isoform from `isocirc.out`:\n\n| No. | Column name     |  Explanation | \n|:---:|   :---          | ---        |\n|  1  | chrom           | chromosome ID |\n|  2  | startCoor0based | start coordinate of circRNA, 0-based |\n|  3  | endCoor         | end coordinate of circRNA |\n|  4  | isoformName     | name of the circRNA isoform     |   \n|  5  | score           | indicates how dark the peak will be displayed in the browser (0-1000), set as 0 by `isoCirc` |\n|  6  | strand          | +/- to denote strand  |\n|  7  | thickStart      | the starting position at which the feature is drawn thickly, set as 0 by `isoCirc`  |\n|  8  | thickEnd        | the ending position at which the feature is drawn thickly, set as 0 by `isoCirc`  |\n|  9  | itemRgb         | an RGB value of the form R,G,B (e.g. 255,0,0), set as 0 by `isoCirc` |\n|  10 | blockCount      | number of block  |\n|  11 | blockSize       | size of each block, separated by `,` |\n|  12 | blockStarts     | relative start coordinates of each block, separated by `,`. refer to `bed12` format for further details |\n\n#### <a name=\"isocirc_stats_out\"></a>3. isocirc_stats.out\n`isocirc_stats.out` contains 27 basic stats numbers for `isocirc.out`:\n\n| No. | Name           |  Explanation | \n|:---:|   :---         | ---          |\n|  1  | Total reads                                       | Number of raw reads in sample |\n|  2  | Total reads with cons                             | Number of reads with consensus sequence called |\n|  3  | Total mappable reads with cons                    | Number of reads with consensus sequence called, mappable to genome |\n|  4  | Total reads with candidate BSJ                    | Number of reads with consensus sequence called, mappable to genome, and with BSJs (\"candidate BSJs\") |\n|  5  | Total candidate BSJs                              | Number of candidate BSJs (circRNA species) |\n|  6  | Total known candidate BSJs                        | Number of candidate BSJs reported in existing circRNA BSJ database (circBase / MiOncoCirc) |\n|  7  | Total reads with high BSJs                        | Number of reads with consensus sequence called, mappable to genome, and with high-confidence BSJs (based on additional inspection of alignment around BSJs) |\n|  8  | Total high BSJs                                   | Number of high-confidence BSJs |\n|  9  | Total known high BSJs                             | Number of high-confidence BSJs that are known |\n|  10 | Total isoforms with high BSJs                     | Number of circRNA isoforms with high-confidence BSJs |\n|  11 | Total isoforms with high BSJs high FSJs           | Number of circRNA isoforms with high-confidence BSJs, and all FSJs are high-confidence (canonical, high-quality alignment around internal splice sites) |\n|  12 | Total isoforms with high BSJ known SSs            | Number of circRNA isoforms with high-confidence BSJs, and all SS are known (based on existing transcript GTF annotations for splice sites in linear RNA) |\n|  13 | Total isoforms with high BSJs high FSJs known SSs  | Number of circRNA isoforms with high-confidence BSJs, all FSJs are high-confidence, and all SS are known |\n|  14 | Total full-length isoforms                        | Number of circRNA isoforms with high-confidence BSJs, and FSJs are high-confidence or all SS are known |\n|  15 | Total reads for full-length isoforms              | Number of reads for circRNA isoforms with high-confidence BSJs, and all FSJs arehigh-confidence or all SS are known |\n|  16 | Total full-length isoforms with FSM BSJ           | Number of full-length circRNA isoforms with FSM BSJs |\n|  17 | Total reads for full-length isoforms with FSM BSJ | Number of reads for full-length circRNA isoforms with FSM BSJs |\n|  18 | Total full-length isoforms with NIC BSJ           | Number of full-length circRNA isoforms with NIC BSJs |\n|  19 | Total reads for full-length isoforms with NIC BSJ | Number of reads for full-length circRNA isoforms with NIC BSJs |\n|  20 | Total full-length isoforms with NNC BSJ           | Number of full-length circRNA isoforms with NNC BSJs |\n|  21 | Total reads for full-length isoforms with NNC BSJ | Number of reads for full-length circRNA isoforms with NNC BSJs |\n|  22 | Total full-length isoforms with FSM FSJ           | Number of full-length circRNA isoforms with FSM FSJs |\n|  23 | Total reads for full-length isoforms with FSM FSJ | Number of reads for full-length circRNA isoforms with FSM FSJs |\n|  24 | Total full-length isoforms with NIC FSJ           | Number of full-length circRNA isoforms with NIC internal FSJs |\n|  25 | Total reads for full-length isoforms with NIC FSJ | Number of reads for full-length circRNA isoforms with NIC FSJs |\n|  26 | Total full-length isoforms with NNC FSJ           | Number of full-length circRNA isoforms with NNC FSJs |\n|  27 | Total reads for full-length isoforms with NNC FSJ | Number of reads for full-length circRNA isoforms with NNC FSJs |\n\n * BSJ:  Back-Splice Junction\n * FSJ:  Forward-Splice Junction\n * FSS:  Forward-Splice Site\n * SS:   Splice Site\n * cons: consensus sequence\n * cano: canonical\n * high: high-confidence (canonical and high-quality alignment around FSJ/BSJ)\n * FSM: Full Splice Match\n * NIC: Novel In Catalog\n * NNC: Novel Not in Catalog\n\n## <a name=\"isocirc_plot\"></a>Circular alignment of isoCirc long read\nWith the result file generated by `isocirc`, we can visulize the circular alignment of full-length isoCirc reads. Let's use the toy example in the `test_data` again:\n```\nisocircPlot ./read_toy.fa ./chr16_toy.fa ./chr16_toy.gtf ./output/isocirc.bed ./isocircPlot_toy.list SJ ./output\n```\nA `.png` file will be generated in the `output` folder indicating how the isoCirc long read is aligned to the reference genome multiple times.\n\n## <a name=\"FAQ\"></a>FAQ\n## <a name=\"contact\"></a>Contact\n\nYan Gao gaoy286@mail.sysu.edu.cn\n\nYi Xing yi.xing@pennmedicine.upenn.edu\n\n[github issues](https://github.com/Xinglab/isoCirc/issues)\n\n\n\n",
    "bugtrack_url": null,
    "license": "GLP",
    "summary": "isoCirc: computational pipeline to identify high-confidence BSJs and full-length circRNA isoforms from isoCirc long-read data",
    "version": "1.0.4",
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "acf93bf6cee0108c8e6bf7de143d42f3b128773b837da80f90cabd8a5a1ba6a7",
                "md5": "302a208ac7528f9f55c62e8cee60aca7",
                "sha256": "c58fb3e659d344c96aa8a341c7a5f6589e1d2fe9d6dc069bd056124420eb2cf6"
            },
            "downloads": -1,
            "filename": "isocirc-1.0.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "302a208ac7528f9f55c62e8cee60aca7",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 25453378,
            "upload_time": "2021-07-15T10:24:50",
            "upload_time_iso_8601": "2021-07-15T10:24:50.177943Z",
            "url": "https://files.pythonhosted.org/packages/ac/f9/3bf6cee0108c8e6bf7de143d42f3b128773b837da80f90cabd8a5a1ba6a7/isocirc-1.0.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "eaef20dc4deeb6b15d86288bd94d9fc3b699bc4f4b026d8aba82e518ea7e1647",
                "md5": "3027030201183b32da365a6d1ab3e046",
                "sha256": "01574c0b1a789fd4a3ac3000a08eb8949aea87ae47f94521f9e7761fe18f7562"
            },
            "downloads": -1,
            "filename": "isocirc-1.0.4.tar.gz",
            "has_sig": false,
            "md5_digest": "3027030201183b32da365a6d1ab3e046",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 25293220,
            "upload_time": "2021-07-15T10:25:24",
            "upload_time_iso_8601": "2021-07-15T10:25:24.598915Z",
            "url": "https://files.pythonhosted.org/packages/ea/ef/20dc4deeb6b15d86288bd94d9fc3b699bc4f4b026d8aba82e518ea7e1647/isocirc-1.0.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2021-07-15 10:25:24",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "Xinglab",
    "github_project": "isoCirc",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "isocirc"
}
        
Elapsed time: 0.09512s