cLoops2


- Name: cLoops2
- Version: 0.0.5
- Home page: https://github.com/YaqiangCao/cLoops2
- Summary: Loop-calling and peak-calling for sequencing-based interaction data, including related analysis utilities.
- Upload time: 2023-07-20 15:37:19
- Author: Yaqiang Cao
- Requires Python: >=3
- Keywords: peak-calling, loop-calling, hi-trac, interaction, visualization

## cLoops2: full stack analysis tool for chromatin interactions
<p align="center">
<img align="center" src="https://github.com/YaqiangCao/cLoops2/blob/master/pngs/FlowChart.png">
</p>   


-------
-------
## Introduction
cLoops2 is an extension of our previous work, [cLoops](https://github.com/YaqiangCao/cLoops). Growing from loop-calling based on assumption-free clustering into a full suite of analysis tools for 3D genomic interaction data, cLoops2 has been adapted specifically for data such as Hi-TrAC/Trac-looping, in which interactions are enriched over the genome through experimental steps. cLoops2 still supports Hi-C-like data, in which the interaction signals are evenly distributed at enzyme cutting sites. The changes from cLoops to cLoops2 are designed to address the challenges of reaching higher resolutions with the next generation of genome architecture mapping technologies. 

cLoops2 is designed with reference to [bedtools](https://bedtools.readthedocs.io/en/latest/) and [Samtools](http://www.htslib.org/) for command-line style programming. If you have experience with them, you will find cLoops2 easy and efficient to use, and its commands straightforward to combine or integrate as steps in your processing pipeline. 

Please refer to our [Hi-TrAC method manuscript]() or [cLoops2 manuscript](https://www.biorxiv.org/content/10.1101/2021.07.20.453068v1) for what cLoops2 can do and show. 

If you use cLoops2 in your research (the idea, the algorithm, the analysis scripts or the supplemental data), please give us a star on the GitHub repo page and cite our paper as follows:    

Preprint bioRxiv: [Yaqiang Cao et al. "cLoops2: a full-stack comprehensive analytical tool for chromatin interactions"](https://www.biorxiv.org/content/10.1101/2021.07.20.453068v1)


-------
-------
## Install
#### 1. Easy way through pip for stable version
Python3 is required.  
```
pip install cLoops2
```

-------
#### 2. Install from source with test data for latest version
cLoops2 is written purely in Python3 (cLoops was written in Python2). If you are familiar with [conda](https://docs.conda.io/en/latest/), cLoops2 can be installed easily with the following Linux shell commands (also tested on the Windows 10 Ubuntu subsystem and macOS). 
```
# for most updated code, or download the release version 
git clone --depth=1 https://github.com/YaqiangCao/cLoops2
cd cLoops2
conda env create --name cLoops2 --file cLoops2_env.yaml
conda activate cLoops2 
python3 setup.py install
```

The required third-party Python3 packages are listed below, all of which can be installed through conda. If you prefer to install cLoops2 the old-school way with ***python setup.py install***, please install these dependencies first. 
```
tqdm
numpy 
scipy 
pandas
sklearn
seaborn
pyBigWig
matplotlib
joblib
networkx
```

After installation, whenever you want to run cLoops2, just activate the environment with conda: **conda activate cLoops2**. 
Happy peak/loop-calling and have fun exploring all the other kinds of analyses.     


------
------
## cLoops2 Main Functions
Running ***cLoops2*** or ***cLoops2 -h*** shows the main functions of cLoops2 with short descriptions and examples.     
```
An enhanced, accurate and flexible peak/domain/loop-calling and analysis tool 
for 3D genomic interaction data.

Use cLoops2 sub-command -h to see detail options and examples for sub-commands.
Available sub-commands are: 
    qc: quality control of BEDPE files before analysis.
    pre: preprocess input BEDPE files into cLoops2 data.
    update: update cLoops2 data files locations.
    combine: combine multiple cLoops2 data directories.
    dump: convert cLoops2 data files to others (BEDPE, HIC, washU, bedGraph and
          contact matrix)
    estEps: estimate eps using Gaussian mixture models or k-distance plot.
    estRes: estimate reasonable contact matrix resolution based on signal 
            enrichment.
    estDis: estimate significant interactions distance range.
    estSat: estimate sequencing saturation based on contact matrix.
    estSim: estimate similarities among samples based on contact matrix.
    filterPETs: filter PETs based on peaks, loops, singleton mode or knn mode. 
    samplePETs: sample PETs according to specific target size.
    callPeaks: call peaks for ChIP-seq, ATAC-seq, ChIC-seq and CUT&Tag or the 
               3D genomic data such as Trac-looping, Hi-TrAC, HiChIP and more.
    callLoops: call loops for 3D genomic data.
    callDiffLoops: call differentially enriched loops for two datasets. 
    callDomains: call domains for 3D genomic data. 
    plot: plot the interaction matrix, genes, view point plot, 1D tracks, 
          peaks, loops and domains for a specific region. 
    montage: analysis of specific regions, producing Westworld Season 3 -like 
             Rehoboam plot. 
    agg: aggregated feature analysis and plots, features can be peaks, view 
         points, loops and domains.
    quant: quantify peaks, loops and domains.
    anaLoops: annotate loops for target genes.
    findTargets: find target genes of genomic regions through networks from 
                 anaLoops.

Examples:
    cLoops2 qc -f trac_rep1.bedpe.gz,trac_rep2.bedpe,trac_rep3.bedpe.gz \
               -o trac_stat -p 3
    cLoops2 pre -f ../test_GM12878_chr21_trac.bedpe -o trac
    cLoops2 update -d ./trac
    cLoops2 combine -ds ./trac1,./trac2,./trac3 -o trac_combined -keep 1
    cLoops2 dump -d ./trac -o trac -hic
    cLoops2 estEps -d trac -o trac_estEps_gmm -p 10 -method gmm
    cLoops2 estRes -d trac -o trac_estRes -p 10 -bs 25000,5000,1000,200
    cLoops2 estDis -d trac -o trac -plot -bs 1000 
    cLoops2 estSim -ds Trac1,Trac2 -o trac_sim -p 10 -bs 2000 -m pcc -plot
    cLoops2 filterPETs -d trac -peaks trac_peaks.bed -o trac_peaksFiltered -p 10
    cLoops2 samplePETs -d trac -o trac_sampled -tot 5000000 -p 10
    cLoops2 callPeaks -d H3K4me3_ChIC -bgd IgG_ChIC -o H3K4me3_cLoops2 -eps 150 \
                      -minPts 10
    cLoops2 callLoops -d Trac -eps 200,500,1000 -minPts 3 -filter -o Trac -w -j \
                      -cut 2000
    cLoops2 callLoops -d HiC -eps 1000,5000,10000 -minPts 10,20,50,100 -w -j \
                      -trans -o HiC_trans 
    cLoops2 callDiffLoops -tloop target_loop.txt -cloop control_loop.txt \
                          -td ./target -cd ./control -o target_diff
    cLoops2 callDomains -d trac -o trac -bs 10000 -ws 200000
    cLoops2 plot -f test/chr21-chr21.ixy -o test -bs 500 -start 34840000 \
                 -end 34895000 -triu -1D -loop test_loops.txt -log \
                 -gtf hg38.gtf -bws ctcf.bw -beds enhancer.bed
    cLoops2 montage -f test/chr21-chr21.ixy -o test -bed test.bed
    cLoops2 agg -d trac -loops trac.loop -peaks trac_peaks.bed \
                -domains hic_domains.bed -bws CTCF.bw,ATAC.bw -p 20 -o trac 
    cLoops2 quant -d trac -peaks trac_peaks.bed -loops trac.loop \
                  -domains trac_domain.txt -p 20 -o trac
    cLoops2 anaLoops -loops test_loop.txt -gtf gene.gtf -net -o test
    cLoops2 findTargets -net test_ep_net.sif -tg test_targets.txt \
                        -bed GWAS.bed -o test 
    More usages and examples are shown when run with cLoops2 sub-command -h.
    

optional arguments:
  -h, --help  show this help message and exit
  -d PREDIR   Assign data directory generated by cLoops2 pre to carry out analysis. 
  -o FNOUT    Output data directory / file name prefix, default is cLoops2_output.
  -p CPU      CPUs used to run the job, default is 1, set -1 to use all CPUs
              available. Too many CPU could cause out-of-memory problem if there are
              too many PETs.
  -cut CUT    Distance cutoff to filter cis PETs, only keep PETs with distance
              >=cut. Default is 0, no filtering.
  -mcut MCUT  Keep the PETs with distance <=mcut. Default is -1, no filtering.
  -v          Show cLoops2 version number and exit.
  ---         Following are sub-commands specific options. This option just show
              version of cLoops2.

Bug reports are welcome and can be put as issue at github repo or sent to 
caoyaqiang0410@gmail.com or yaqiang.cao@nih.gov. Thank you.
```

------
### 1. Quality control for BEDPE files
Run **cLoops2 qc -h** to see details. 
```
Get the basic quality control statistical information from interaction BEDPE
files.

Example: 
    cLoops2 qc -f trac_rep1.bedpe.gz,trac_rep2.bedpe,trac_rep3.bedpe.gz -p 3 \
               -o trac_stat
    

optional arguments:
  -h, --help  show this help message and exit
  -d PREDIR   Assign data directory generated by cLoops2 pre to carry out analysis. 
  -o FNOUT    Output data directory / file name prefix, default is cLoops2_output.
  -p CPU      CPUs used to run the job, default is 1, set -1 to use all CPUs
              available. Too many CPU could cause out-of-memory problem if there are
              too many PETs.
  -cut CUT    Distance cutoff to filter cis PETs, only keep PETs with distance
              >=cut. Default is 0, no filtering.
  -mcut MCUT  Keep the PETs with distance <=mcut. Default is -1, no filtering.
  -v          Show cLoops2 version number and exit.
  ---         Following are sub-commands specific options. This option just show
              version of cLoops2.
  -f FNIN     Input BEDPE file(s), .bedpe and .bedpe.gz are both suitable. Multiple
              samples can be assigned as -f A.bedpe.gz,B.bedpe.gz,C.bedpe.gz.
```

------
### 2. Pre-process BEDPE into cLoops2 data
Run **cLoops2 pre -h** to see details. 
```
Preprocess mapped PETs into cLoops2 data files.

Support input file formats:
BEDPE: https://bedtools.readthedocs.io/en/latest/content/general-usage.html 
PAIRS: https://pairtools.readthedocs.io/en/latest/formats.html#pairs

The output directory contains one .json file for the basic statistics of PETs 
information and .ixy files which are coordinates for every PET. The coordinate
files will be used to call peaks, loops or any other analyses implemented in 
cLoops2. For data backup/sharing purposes, the directory can be saved as a
.tar.gz file through tar. If the directory is changed or moved, run 
***cLoops2 update -d*** to update the recorded paths.

Examples:
    1. keep high quality PETs of chromosome chr21
        cLoops2 pre -f trac_rep1.bedpe.gz,trac_rep2.bedpe.gz -o trac -c chr21

    2. keep all cis PETs (no MAPQ filtering) that have distance >= 1kb
        cLoops2 pre -f trac_rep1.bedpe.gz,trac_rep2.bedpe.gz -o trac -mapq 0 \
                    -cut 1000

    

optional arguments:
  -h, --help            show this help message and exit
  -d PREDIR             Assign data directory generated by cLoops2 pre to carry out analysis. 
  -o FNOUT              Output data directory / file name prefix, default is cLoops2_output.
  -p CPU                CPUs used to run the job, default is 1, set -1 to use all CPUs
                        available. Too many CPU could cause out-of-memory problem if there are
                        too many PETs.
  -cut CUT              Distance cutoff to filter cis PETs, only keep PETs with distance
                        >=cut. Default is 0, no filtering.
  -mcut MCUT            Keep the PETs with distance <=mcut. Default is -1, no filtering.
  -v                    Show cLoops2 version number and exit.
  ---                   Following are sub-commands specific options. This option just show
                        version of cLoops2.
  -f FNIN               Input BEDPE or PAIR file(s), .bedpe and .bedpe.gz are both suitable.
                        Replicates or multiple samples can be assigned as -f A.bedpe.gz,
                        B.bedpe.gz,C.bedpe.gz to get merged PETs.
  -c CHROMS             Argument to process limited set of chromosomes, specify it as chr1,
                        chr2,chr3. Use this option to filter reads from such as
                        chr22_KI270876v1. The default setting is to use the entire set of
                        chromosomes from the data.
  -trans                Whether to parse trans- (inter-chromosomal) PETs. The default is to
                        ignore trans-PETs. Set this flag to pre-process all PETs.
  -mapq MAPQ            MAPQ cutoff to filter raw PETs, default is >=10. This option is not
                        valid when input is .pairs file.
  -format {bedpe,pairs}
                        cLoops2 currently supports BEDPE and PAIRs file format. Default is bedpe.
```
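
The filters described in the help text above (MAPQ cutoff, chromosome whitelist, cis/trans, distance cutoffs) can be pictured with a small Python sketch. This is only an illustration, not the cLoops2 implementation: the function name `keep_pet` is hypothetical, the mapping quality is assumed to sit in the BEDPE score column (column 8), and the PET distance is assumed to be measured between the midpoints of the two ends.

```python
def keep_pet(line, mapq_cutoff=10, chroms=None, keep_trans=False, cut=0, mcut=-1):
    """Return True if one BEDPE line passes the filters (hypothetical helper).

    Assumptions: 10-column BEDPE, MAPQ stored in the score column (column 8),
    distance measured between the midpoints of the two ends.
    """
    fields = line.rstrip("\n").split("\t")
    chrom_a, start_a, end_a = fields[0], int(fields[1]), int(fields[2])
    chrom_b, start_b, end_b = fields[3], int(fields[4]), int(fields[5])
    mapq = int(fields[7])

    # -mapq: drop PETs below the mapping quality cutoff
    if mapq < mapq_cutoff:
        return False
    # -c: keep only PETs whose both ends are on the requested chromosomes
    if chroms is not None and (chrom_a not in chroms or chrom_b not in chroms):
        return False
    # -trans: inter-chromosomal PETs are ignored unless requested
    if chrom_a != chrom_b:
        return keep_trans
    # -cut / -mcut: distance window for cis PETs (mcut < 0 means no upper limit)
    distance = abs((start_b + end_b) // 2 - (start_a + end_a) // 2)
    if distance < cut:
        return False
    if mcut >= 0 and distance > mcut:
        return False
    return True


if __name__ == "__main__":
    pet = "chr21\t100\t150\tchr21\t5100\t5150\tpet1\t30\t+\t-"
    print(keep_pet(pet), keep_pet(pet, cut=10000))
```

The defaults mirror the documented ones (`-mapq` of 10, `-cut` of 0, `-mcut` of -1, trans PETs ignored).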

------
### 3. Update cLoops2 data directory
Run **cLoops2 update -h** to see details. 
```
Update cLoops2 data files generated by **cLoops2 pre**.

In the **cLoops2 pre** output directory, there is a .json file annotated with 
the .ixy **absolute paths** and other information. So if the directory is 
moved, or some .ixy files are removed or changed, this command is needed to 
update the paths, otherwise the other analysis modules will not work.

Example:
    cLoops2 update -d ./Trac
    

optional arguments:
  -h, --help  show this help message and exit
  -d PREDIR   Assign data directory generated by cLoops2 pre to carry out analysis. 
  -o FNOUT    Output data directory / file name prefix, default is cLoops2_output.
  -p CPU      CPUs used to run the job, default is 1, set -1 to use all CPUs
              available. Too many CPU could cause out-of-memory problem if there are
              too many PETs.
  -cut CUT    Distance cutoff to filter cis PETs, only keep PETs with distance
              >=cut. Default is 0, no filtering.
  -mcut MCUT  Keep the PETs with distance <=mcut. Default is -1, no filtering.
  -v          Show cLoops2 version number and exit.
  ---         Following are sub-commands specific options. This option just show
              version of cLoops2.
```

------
### 4. Convert cLoops2 data to others    
Run **cLoops2 dump -h** to see details.   
```
Convert cLoops2 data files to other types. Currently supports BED file, BEDPE 
file, HIC file, washU long-range track, bedGraph file and matrix txt file. 

Converting cLoops2 data to .hic file needs "juicer_tools pre" in the command
line environment. 
Converting cLoops2 data to legacy washU browser long-range track needs bgzip
and tabix. Format reference: http://wiki.wubrowse.org/Long-range. 
Converting cLoops2 data to UCSC bigInteract track needs bedToBigBed. Format 
reference: https://genome.ucsc.edu/goldenPath/help/interact.html.
Converting cLoops2 data to bedGraph track will normalize value as RPM 
(reads per million). Run with -bdg_pe flag for 1D data such as ChIC-seq,
ChIP-seq and ATAC-seq. 
Converting cLoops2 data to matrix txt file will need specific resolution. 
The output txt file can be loaded in TreeView for visualization or further
analysis. 

Examples:
    1. convert cLoops2 data to single-end .bed file for use with BEDtools or 
       MACS2 for peak-calling with close PETs
        cLoops2 dump -d trac -o trac -bed -mcut 1000

    2. convert cLoops2 data to .bedpe file for usage of BEDtools, only keep 
       PETs distance >1kb and < 1Mb
        cLoops2 dump -d trac -o trac -bedpe -bedpe_ext -cut 1000 -mcut 1000000 

    3. convert cLoops2 data to .hic file to load in juicebox
        cLoops2 dump -d trac -o trac -hic -hic_org hg38 \
                    -hic_res 200000,20000,5000
    
    4. convert cLoops2 data to washU long-range track file, only keep PETs 
       distance > 1kb 
        cLoops2 dump -d trac -o trac -washU -washU_ext 50 -cut 1000
    
    5. convert cLoops2 data to UCSC bigInteract track file 
        cLoops2 dump -d trac -o trac -ucsc -ucsc_cs ./hg38.chrom.sizes 

    6. convert interacting cLoops2 data to bedGraph file with all PETs
        cLoops2 dump -d trac -o trac -bdg -bdg_ext 100

    7. convert 1D cLoops2 data (such as ChIC-seq/ChIP-seq/ATAC-seq) to bedGraph 
       file 
        cLoops2 dump -d trac -o trac -bdg -bdg_pe 

    8. convert 3D cLoops2 data (such as Trac-looping) to bedGraph file for peaks
        cLoops2 dump -d trac -o trac -bdg -mcut 1000

    9. convert one region in chr21 to contact matrix correlation matrix txt file 
        cLoops2 dump -d test -mat -o test -mat_res 10000 \
                    -mat_chrom chr21-chr21 -mat_start 36000000 \
                    -mat_end 40000000 -log -corr
    

optional arguments:
  -h, --help            show this help message and exit
  -d PREDIR             Assign data directory generated by cLoops2 pre to carry out analysis. 
  -o FNOUT              Output data directory / file name prefix, default is cLoops2_output.
  -p CPU                CPUs used to run the job, default is 1, set -1 to use all CPUs
                        available. Too many CPU could cause out-of-memory problem if there are
                        too many PETs.
  -cut CUT              Distance cutoff to filter cis PETs, only keep PETs with distance
                        >=cut. Default is 0, no filtering.
  -mcut MCUT            Keep the PETs with distance <=mcut. Default is -1, no filtering.
  -v                    Show cLoops2 version number and exit.
  ---                   Following are sub-commands specific options. This option just show
                        version of cLoops2.
  -bed                  Convert data to single-end BED file.
  -bed_ext BED_EXT      Extension from the center of the read to both ends for BED file.
                        Default is 50.
  -bedpe                Convert data to BEDPE file.
  -bedpe_ext BEDPE_EXT  Extension from the center of the PET to both ends for BEDPE file.
                        Default is 50.
  -hic                  Convert data to .hic file.
  -hic_org HIC_ORG      Organism required to generate .hic file, default is hg38. If the
                        organism is not available, assign a chrom.size file.
  -hic_res HIC_RES      Resolutions used to generate .hic file. Default is 1000,5000,25000,
                        50000,100000,200000.
  -washU                Convert data to legacy washU browser long-range track.
  -washU_ext WASHU_EXT  Extension from the center of the PET to both ends for washU track.
                        Default is 50.
  -ucsc                 Convert data to UCSC bigInteract file track.
  -ucsc_ext UCSC_EXT    Extension from the center of the PET to both ends for ucsc
                        track. Default is 50.
  -ucsc_cs UCSC_CS      A chrom.sizes file. Can be obtained through fetchChromSizes.
                        Required for -ucsc option.
  -bdg                  Convert data to 1D bedGraph track file.
  -bdg_ext BDG_EXT      Extension from the center of the PET to both ends for
                        bedGraph track. Default is 50.
  -bdg_pe               When converting to bedGraph, argument determines whether to treat PETs
                        as ChIP-seq, ChIC-seq or ATAC-seq paired-end libraries. Default is not.
                        PETs are treated as single-end library for interacting data.
  -mat                  Convert data to matrix txt file with required resolution.
  -mat_res MAT_RES      Bin size/matrix resolution (bp) to generate the contact matrix. 
                        Default is 5000 bp. 
  -mat_chrom CHROM      The chrom-chrom set will be processed. Specify it as chr1-chr1.
  -mat_start START      Start genomic coordinate for the target region. Default will be the
                        smallest coordinate from specified chrom-chrom set.
  -mat_end END          End genomic coordinate for the target region. Default will be the
                        largest coordinate from specified chrom-chrom set.
  -log                  Whether to log transform the matrix. Default is not.
  -m {obs,obs/exp}      The type of matrix, observed matrix or observed/expected matrix, 
                        expected matrix will be generated by shuffling PETs. Default is
                        observed.
  -corr                 Whether to get the correlation matrix. Default is not. 
  -norm                 Whether to normalize the matrix with z-score. Default is not.

```
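
The RPM (reads per million) normalization mentioned for the bedGraph output can be illustrated as follows. This is a minimal sketch under stated assumptions, not the code cLoops2 runs: the helper name `rpm_coverage` is hypothetical, each read is assumed to be extended by `ext` bp on both sides of its center (as the `-bdg_ext` description suggests), and coverage is kept per base in a dictionary for clarity rather than efficiency.

```python
from collections import defaultdict


def rpm_coverage(centers, ext=50):
    """Per-base coverage scaled to reads per million (hypothetical helper).

    centers: read center coordinates on one chromosome.
    ext: extension (bp) from the center to both ends.
    """
    total = len(centers)
    if total == 0:
        return {}
    scale = 1_000_000 / total  # one read contributes this much signal
    coverage = defaultdict(int)
    for center in centers:
        for pos in range(max(0, center - ext), center + ext):
            coverage[pos] += 1
    # sort by position so the result can be written out in bedGraph order
    return {pos: depth * scale for pos, depth in sorted(coverage.items())}


if __name__ == "__main__":
    signal = rpm_coverage([100, 100, 120, 1000], ext=10)
    print(signal[105], signal[110])
```

Because the signal is divided by the library size, tracks from libraries with different sequencing depths can be compared on the same scale.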


------
### 5. Estimate eps
Run **cLoops2 estEps -h** to see details. 
```
Estimate key parameter eps. 

Two methods are implemented: 1) unsupervised Gaussian mixture model (gmm), and 
2) k-distance plot (k-dis,-k needed). Gmm is based on the assumption that PETs 
can be classified into self-ligation (peaks) and inter-ligation (loops). K-dis
is based on the k-nearest neighbors distance distribution to find the "knee", 
which is where the distance (eps) between neighbors has a sharp increase along
the k-distance curve. K-dis is the traditional approach in the literature, but
it is much more time-consuming than gmm and may only suit small datasets. If 
neither method gives a clear plot, fall back on empirical parameters, such as 
100,200 for ChIP-seq-like data or 5000,1000 for Hi-C.

Examples: 
    1. estimate eps with Gaussian mixture model    
        cLoops2 estEps -d trac -o trac_estEps_gmm -p 10 -method gmm

    2. estimate eps with k-nearest neighbors distance distribution
        cLoops2 estEps -d trac -o trac_estEps_kdis -p 10 -method k-dis -k 5
    

optional arguments:
  -h, --help           show this help message and exit
  -d PREDIR            Assign data directory generated by cLoops2 pre to carry out analysis. 
  -o FNOUT             Output data directory / file name prefix, default is cLoops2_output.
  -p CPU               CPUs used to run the job, default is 1, set -1 to use all CPUs
                       available. Too many CPU could cause out-of-memory problem if there are
                       too many PETs.
  -cut CUT             Distance cutoff to filter cis PETs, only keep PETs with distance
                       >=cut. Default is 0, no filtering.
  -mcut MCUT           Keep the PETs with distance <=mcut. Default is -1, no filtering.
  -v                   Show cLoops2 version number and exit.
  ---                  Following are sub-commands specific options. This option just show
                       version of cLoops2.
  -fixy FIXY           Assign the .ixy file to estimate eps instead of the whole directory
                       generated by cLoops2 pre. For very large data, especially Hi-C, this
                       option is recommended for chr1 (or the smaller one) to save time.
  -k KNN               The k-nearest neighbors used to draw the k-distance plot. Default is 0
                       (not running), set this when -method k-dis. Suggested 5 for
                       ChIA-PET/Trac-looping data, 20 or 30 for Hi-C like data.
  -method {gmm,k-dis}  Two methods can be chosen to estimate eps. Default is Gmm. See above
                       for difference of the methods.

```
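
To make the k-distance idea concrete, here is a small Python sketch. It is an illustration only: the helper names are hypothetical, Euclidean distance between PET coordinates (x, y) is an assumption, the brute-force search is quadratic and only suitable for tiny inputs, and the "knee" is picked with a deliberately simple largest-jump heuristic that may differ from what cLoops2 does.

```python
import math


def k_distances(points, k):
    """Sorted distance from each point to its k-th nearest neighbour."""
    if not 0 < k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)


def knee_eps(sorted_kdist):
    """Value just before the largest jump in a sorted k-distance curve.

    A simple stand-in for visually locating the knee on the plot.
    """
    jumps = [b - a for a, b in zip(sorted_kdist, sorted_kdist[1:])]
    return sorted_kdist[jumps.index(max(jumps))]


if __name__ == "__main__":
    pets = [(0, 0), (0, 1), (1, 0), (1, 1), (100, 100), (200, 200)]
    print(k_distances(pets, k=2))
```

Points inside a dense cluster have small k-distances, while isolated points have large ones, so the sharp rise in the sorted curve separates the two groups.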

------
### 6. Estimate reasonable contact matrix resolution 
Run **cLoops2 estRes -h** to see details. 
```
Estimate reasonable genome-wide contact matrix resolution based on signal 
enrichment. 

PETs will be assigned to contact matrix bins according to input resolution. A 
bin is marked as [nx,ny], and a PET is assigned to a bin by nx = int((x-s)/bs),
ny = int((y-s)/bs), where s is the minimal coordinate for all PETs and bs is 
the bin size. Self-interaction bins (nx=ny) will be ignored. The bins only 
containing singleton PETs are assumed as noise. 

The output is a PDF plot. For each resolution, a line is separated into two 
parts: 1) a dashed line indicating the linear increase of singleton PETs/bins; 
2) a thicker solid line indicating the non-linear increase of potential signal
PETs/bins. The higher the ratio of signal PETs/bins, the easier it is to
find loops at that resolution. The closer to the random line, the higher the 
possibility to observe evenly distributed signals.  

We recommend the highest resolution at which >=50% of PETs are not singletons.

Example:
    cLoops2 estRes -d trac -o trac -bs 10000,5000,1000 -p 20

optional arguments:
  -h, --help   show this help message and exit
  -d PREDIR    Assign data directory generated by cLoops2 pre to carry out analysis. 
  -o FNOUT     Output data directory / file name prefix, default is cLoops2_output.
  -p CPU       CPUs used to run the job, default is 1, set -1 to use all CPUs
               available. Too many CPU could cause out-of-memory problem if there are
               too many PETs.
  -cut CUT     Distance cutoff to filter cis PETs, only keep PETs with distance
               >=cut. Default is 0, no filtering.
  -mcut MCUT   Keep the PETs with distance <=mcut. Default is -1, no filtering.
  -v           Show cLoops2 version number and exit.
  ---          Following are sub-commands specific options. This option just show
               version of cLoops2.
  -bs BINSIZE  Candidate contact matrix resolution (bin size) to estimate signal
               enrichment. A series of comma-separated values or a single value can
               be used as input. For example,-bs 1000,5000,10000. Default is 5000.

```
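
The binning rule quoted in the help text, nx = int((x-s)/bs) and ny = int((y-s)/bs), translates directly into a few lines of Python. The sketch below is illustrative rather than the cLoops2 source: the function name `signal_fraction` is hypothetical, PETs are given as (x, y) coordinate pairs on one chromosome, and the returned ratio (PETs in non-singleton bins over all PETs in non-self bins) is one assumed way to summarize the enrichment.

```python
from collections import Counter


def signal_fraction(pets, bin_size):
    """Fraction of PETs that fall in bins holding more than one PET.

    pets: iterable of (x, y) coordinates for cis PETs.
    bin_size: candidate contact matrix resolution (bs) in bp.
    """
    pets = list(pets)
    if not pets:
        return 0.0
    s = min(min(x, y) for x, y in pets)  # minimal coordinate of all PETs
    bins = Counter()
    for x, y in pets:
        nx = int((x - s) / bin_size)
        ny = int((y - s) / bin_size)
        if nx == ny:  # self-interaction bins are ignored
            continue
        bins[(nx, ny)] += 1
    total = sum(bins.values())
    if total == 0:
        return 0.0
    signal = sum(count for count in bins.values() if count > 1)
    return signal / total


if __name__ == "__main__":
    pets = [(0, 5000), (100, 5200), (200, 9000), (300, 350)]
    for bs in (1000, 100):
        print(bs, signal_fraction(pets, bs))
```

Running the function over several bin sizes shows the trade-off the plot captures: smaller bins give finer resolution but push more PETs into singleton bins.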

------
### 7. Estimate significant interaction distance range
Run **cLoops2 estDis -h** to see details. 
```
Estimate the significant interaction distance limitation by getting the observed
and expected random background of the genomic distance vs interaction frequency.

Example:
    cLoops2 estDis -d trac -o trac -bs 5000 -p 20 -plot
    

optional arguments:
  -h, --help   show this help message and exit
  -d PREDIR    Assign data directory generated by cLoops2 pre to carry out analysis. 
  -o FNOUT     Output data directory / file name prefix, default is cLoops2_output.
  -p CPU       CPUs used to run the job, default is 1, set -1 to use all CPUs
               available. Too many CPU could cause out-of-memory problem if there are
               too many PETs.
  -cut CUT     Distance cutoff to filter cis PETs, only keep PETs with distance
               >=cut. Default is 0, no filtering.
  -mcut MCUT   Keep the PETs with distance <=mcut. Default is -1, no filtering.
  -v           Show cLoops2 version number and exit.
  ---          Following are sub-commands specific options. This option just show
               version of cLoops2.
  -c CHROMS    Whether to process limited chroms, specify it as chr1,chr2,chr3, 
               default is not. Use this to save time for quite big data.
  -bs BINSIZE  Bin size / contact matrix resolution (bp) to generate the contact
               matrix for estimation, default is 5000 bp.
  -r REPEATS   The number of times to shuffle PETs to get the mean expected background,
               default is 10.
  -plot        Set to plot the result.
```
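
The comparison of observed and expected distance frequencies can be sketched in Python as below. This is only a rough illustration of the idea: the function names are hypothetical, and the shuffling scheme (randomly re-pairing the two ends of all PETs) is an assumption about how a random background could be built, not a description of the cLoops2 code.

```python
import random
from collections import Counter


def distance_histogram(pets, bin_size):
    """Count PETs per distance bin (distance // bin_size)."""
    return Counter(abs(y - x) // bin_size for x, y in pets)


def expected_histogram(pets, bin_size, repeats=10, seed=0):
    """Mean distance histogram after randomly re-pairing PET ends."""
    rng = random.Random(seed)
    xs = [x for x, _ in pets]
    ys = [y for _, y in pets]
    total = Counter()
    for _ in range(repeats):
        rng.shuffle(ys)  # break the original pairing of ends
        total.update(distance_histogram(zip(xs, ys), bin_size))
    return {b: count / repeats for b, count in total.items()}


if __name__ == "__main__":
    pets = [(0, 1000), (0, 1500), (0, 6000), (2000, 2500)]
    print(distance_histogram(pets, 1000))
    print(expected_histogram(pets, 1000, repeats=10))
```

Distance bins where the observed count clearly exceeds the shuffled mean are candidates for the significant interaction range.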

------
### 8. Filter PETs    
Run **cLoops2 filterPETs -h** to see details 
```
Filter PETs according to peaks/domains/loops/singletons/KNNs. 

If either end of a PET overlaps with features such as peaks or loops, the PET 
will be kept. Filtering can be done before or after peak/loop-calling. Input 
can be peaks or loops, but they should not be mixed. The -singleton mode is 
based on a specified contact matrix resolution: PETs that are alone in their 
bin (singletons) are filtered out. The -knn mode is based on the noise-removal
step of blockDBSCAN. 

Examples:
    1. keep PETs overlapping with peaks
        cLoops2 filterPETs -d trac -peaks peaks.bed -o trac_filtered

    2. keep PETs that do not overlap with any blacklist regions
        cLoops2 filterPETs -d trac -peaks bg.bed -o trac_filtered -iv

    3. keep PETs that overlap with loop anchors
        cLoops2 filterPETs -d trac -loops test_loops.txt -o trac_filtered

    4. keep PETs that both ends overlap with loop anchors
        cLoops2 filterPETs -d trac -loops test_loops.txt -o trac_filtered -both

    5. keep non-singleton PETs based on 1kb contact matrix
        cLoops2 filterPETs -d trac -o trac_filtered -singleton -bs 1000

    6. filter PETs based on blockDBSCAN knn noise removing
        cLoops2 filterPETs -d trac -o trac_filtered -knn -eps 1000 -minPts 5

optional arguments:
  -h, --help      show this help message and exit
  -d PREDIR       Assign data directory generated by cLoops2 pre to carry out analysis. 
  -o FNOUT        Output data directory / file name prefix, default is cLoops2_output.
  -p CPU          CPUs used to run the job, default is 1, set -1 to use all CPUs
                  available. Too many CPU could cause out-of-memory problem if there are
                  too many PETs.
  -cut CUT        Distance cutoff to filter cis PETs, only keep PETs with distance
                  >=cut. Default is 0, no filtering.
  -mcut MCUT      Keep the PETs with distance <=mcut. Default is -1, no filtering.
  -v              Show cLoops2 version number and exit.
  ---             Following are sub-commands specific options. This option just show
                  version of cLoops2.
  -peaks FBED     BED file of genomic features (such as promoters, enhancers, ChIP-seq,
                  ATAC-seq peaks,TADs) to filter PETs.
  -loops FLOOP    The loop.txt file generated by cLoops2, can be loops or domains, to
                  filter PETs.
  -gap GAP        If the distance between two genomic features is <=gap, the two regions
                  will be combined. Default is 1. Set to >=1.
  -singleton      Whether to use singleton mode to filter PETs. Contact matrix
                  resolution with -bs is required. Singleton PETs in contact matrix bins
                  will be filtered.
  -bs BINSIZE     The contact matrix bin size for -singleton mode filtering. Default is
                  5000.
  -knn            Whether to use noise removing method in blockDBSCAN to filter PETs,
                  -eps and -minPts are required.
  -eps EPS        Same to callPeaks and callLoops, only used to filter PETs for -knn
                  mode. Default is 1000. Only one value is supported.
  -minPts MINPTS  Same to callPeaks and callLoops, only used to filter PETs for -knn
                  mode. Default is 5. Only one value is supported.
  -iv             Whether to only keep PETs not in the assigned regions, behaves like
                  grep -v.
  -both           Whether to only keep PETs that both ends overlap with loop anchors.
                  Default is not.
```
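
The overlap rules behind the default, `-both` and `-iv` modes can be summarized in a short Python sketch. It is a simplified illustration, not the cLoops2 implementation: the function names are hypothetical, everything is restricted to one chromosome, regions are half-open (start, end) intervals, and `-iv` is assumed to invert the result of the overlap test.

```python
def overlaps_any(pos, regions):
    """True if a coordinate falls inside any (start, end) region."""
    return any(start <= pos < end for start, end in regions)


def filter_pets(pets, regions, both=False, invert=False):
    """Keep PETs by overlap of their ends with regions.

    both=False: keep a PET if either end overlaps (default mode).
    both=True:  keep a PET only if both ends overlap (-both).
    invert=True: keep the PETs that would otherwise be dropped (-iv).
    """
    kept = []
    for x, y in pets:
        hit_x = overlaps_any(x, regions)
        hit_y = overlaps_any(y, regions)
        hit = (hit_x and hit_y) if both else (hit_x or hit_y)
        if hit != invert:
            kept.append((x, y))
    return kept


if __name__ == "__main__":
    regions = [(100, 200), (1000, 1100)]
    pets = [(150, 1050), (150, 5000), (300, 400)]
    print(filter_pets(pets, regions))
    print(filter_pets(pets, regions, both=True))
    print(filter_pets(pets, regions, invert=True))
```

For real data, a linear scan over regions per PET would be slow; an interval index or sorted sweep would be the natural replacement.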

------
### 9. Sampling PETs     
Run **cLoops2 samplePETs -h** to see details.
```
Sampling PETs to target total size. 

If there are multiple sample libraries and the total sequencing depths vary a 
lot, and you want to compare the data fairly, it's better to sample them to 
similar total PETs (either down-sampling or up-sampling), then call peaks/loops
with the same parameters. 

Example:
    cLoops2 samplePETs -d trac -o trac_sampled -tot 5000000 -p 10
    

optional arguments:
  -h, --help  show this help message and exit
  -d PREDIR   Assign data directory generated by cLoops2 pre to carry out analysis. 
  -o FNOUT    Output data directory / file name prefix, default is cLoops2_output.
  -p CPU      CPUs used to run the job, default is 1, set -1 to use all CPUs
              available. Too many CPU could cause out-of-memory problem if there are
              too many PETs.
  -cut CUT    Distance cutoff to filter cis PETs, only keep PETs with distance
              >=cut. Default is 0, no filtering.
  -mcut MCUT  Keep the PETs with distance <=mcut. Default is -1, no filtering.
  -v          Show cLoops2 version number and exit.
  ---         Following are sub-commands specific options. This option just show
              version of cLoops2.
  -tot TOT    Target total number of PETs.
```
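
Down-sampling and up-sampling to a target size can be pictured with the following Python sketch. It is illustrative only: the function name `sample_pets` is hypothetical, and the choices made here (sampling without replacement when shrinking, with replacement when growing) are assumptions rather than a description of what cLoops2 does internally.

```python
import random


def sample_pets(pets, target, seed=None):
    """Sample PETs to a target total (hypothetical helper).

    Down-sampling draws without replacement; up-sampling draws with
    replacement, so some PETs are duplicated.
    """
    rng = random.Random(seed)
    pets = list(pets)
    if not pets or target <= 0:
        return []
    if target <= len(pets):
        return rng.sample(pets, target)
    return rng.choices(pets, k=target)


if __name__ == "__main__":
    pets = [(i, i + 1000) for i in range(100)]
    print(len(sample_pets(pets, 10, seed=0)), len(sample_pets(pets, 250, seed=0)))
```

Fixing the seed makes the sampled set reproducible, which helps when the same sampled data are used for several downstream calls.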

------
### 10. Call peaks for 1D or 3D data
Run **cLoops2 callPeaks -h** to see details.
```
Call peaks based on clustering. 

Well tested for ChIP-seq, ChIC-seq, ATAC-seq and CUT&RUN-like data, as well as
3D genomic data such as Hi-TrAC/Trac-looping, ChIA-PET and HiChIP.

There are three steps in the algorithm: 1) cluster the PETs to find 
self-ligation clusters, which are candidate peaks; 2) estimate the significance
of candidate peaks with local background; 3) if given control data, further 
compare candidate peaks to control data. If running multiple clusterings with
separated parameters, the clusters will be combined and callPeaks will output 
the most significant one based on overlaps. 

The key parameters are -eps and -minPts, both parameters of the clustering
algorithm blockDBSCAN. Eps indicates the distance that defines two points (PETs) 
as neighbors, while minPts indicates the minimal number of points required 
for a cluster to form.  For sharp-peak like data (ATAC-seq, TF ChIC-seq), set
-eps small such as 100 or 150. For broad-peak like data, such as H3K27me3 
ChIP-seq and ChIC-seq, set -eps large such as 500 or 1000. 

Eps affects sensitivity more than minPts does.

Examples:
    1. call peaks for Trac-looping  
        cLoops2 callPeaks -d trac -eps 100 -minPts 10 -o trac -p 10

    2. call peaks for sharp-peak like ChIC-seq without control data
        cLoops2 callPeaks -d ctcf_chic -o ctcf_chic -p 10

    3. call peaks for broad-peak like ChIC-seq with IgG as control
        cLoops2 callPeaks -d H3K27me3 -bgd IgG -eps 500,1000 -minPts 10 \
                          -o H3K27me3 

    4. call peaks for sharp-peak ChIC-seq with linear fitting scaled control 
       data
        cLoops2 callPeaks -d ctcf -bgd IgG -eps 150 -minPts 10 -o ctcf -p 10 \
                          -bgm lf

    5. call peaks with sensitive mode to get comprehensive peaks for CUT&Tag
        cLoops2 callPeaks -d H3K27ac -bgd IgG -sen -p 10

    6. filter PETs first and then call peaks for H3K27ac HiChIP, resulting in
       much more accurate peaks
        cLoops2 filterPETs -d h3k27ac_hichip -o h3k27ac_hichip_filtered -knn \
                           -eps 500 -minPts 5
        cLoops2 callPeaks -d h3k27ac_hichip_filtered -eps 200,500 -minPts 10 \
                          -p 10

    7. call peaks for interaction data as single-end data 
        cLoops2 callPeaks -d h3k27ac -o h3k27ac -split -eps 200,500 -minPts 10 \
                          -p 10

    8. call differential peaks between WT and KO condition
        cLoops2 callPeaks -d MLL4_WT -bgd MLL4_KO -o MLL4_WTvsKO -p 10
        cLoops2 callPeaks -d MLL4_KO -bgd MLL4_WT -o MLL4_KOvsWT -p 10
    

optional arguments:
  -h, --help          show this help message and exit
  -d PREDIR           Assign data directory generated by cLoops2 pre to carry out analysis. 
  -o FNOUT            Output data directory / file name prefix, default is cLoops2_output.
  -p CPU              CPUs used to run the job, default is 1, set -1 to use all CPUs
                      available. Too many CPU could cause out-of-memory problem if there are
                      too many PETs.
  -cut CUT            Distance cutoff to filter cis PETs, only keep PETs with distance
                      >=cut. Default is 0, no filtering.
  -mcut MCUT          Keep the PETs with distance <=mcut. Default is -1, no filtering.
  -v                  Show cLoops2 version number and exit.
  ---                 The following are sub-command-specific options. This option just shows
                      the version of cLoops2.
  -eps EPS            Distance that defines two points (PETs) being neighbors, eps in
                      blockDBSCAN as key parameter, multiple eps can be assigned such as
                      100,200,300 to run multiple clusterings, the results will be combined.
                      For callPeaks, the default is 100,200. If the data show much more broad
                      feature such as H3K27me3 and H3K4me1, increase it to 500,1000 or larger.
                      If expecting both narrow and broad peaks in the data, set -eps 100,200,
                      500,1000.
  -minPts MINPTS      Points required in a cluster, minPts in blockDBSCAN, key parameter,
                      multiple minPts can be assigned such as 3,5 to run multiple
                      clusterings, the results will be combined. For callPeaks, the default
                      is 5. If the data have many reads, increase minPts, such as 10,20.
  -pcut PCUT          Bonferroni corrected poisson p-value cutoff to determine significant
                      peaks. Default is 1e-2.
  -bgd BGD            Assign control data (IgG, Input) directory generated by cLoops2 pre to
                      carry out analysis. Default is no background.
  -bgm {ratio,lf}     How to scale the target data with control data. Available options are
                      'ratio' and 'lf'. 'ratio' is based on library size and 'lf' means
                      linear fitting for control and target candidate peaks nearby regions.
                      Default is 'lf'. The scaling factor estimated by lf usually is a little
                      larger than ratio. In other words, the higher the scaling factor, the
                      less sensitive the results.
  -pseudo PSEUDO      Pseudo counts for local background or control data to estimate the
                      significance of peaks if no PETs/reads in the background. Default is
                      1. Set it larger for noisy data, 0 is recommended for very clean data
                      such as well prepared CUT&Tag.
  -sen                Whether to use sensitive mode to call peaks. Default is not. If only a
                      few peaks were called, while a lot more can be observed
                      from visualization, try this option. Adjust -pcut or filter by
                      yourself to select significant ones.
  -split              Whether to split paired-end as single end data to call peaks. Sometimes
                      works well for Trac-looping and HiChIP.
  -splitExt SPLITEXT  When run with -split, the extension to upstream and downstream, 
                      default is 50.
```
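
To make the roles of `-eps` and `-minPts` concrete, here is a small textbook DBSCAN in pure Python. It is a sketch only: cLoops2 uses its own blockDBSCAN, which is not reproduced here, and the Manhattan distance as well as the representation of each PET as a 2D point `(x, y)` are assumptions made for illustration.

```python
def dbscan(points, eps, min_pts):
    """Naive O(n^2) DBSCAN. Returns one label per point; -1 means noise."""

    def neighbors(i):
        xi, yi = points[i]
        # Assumed metric: Manhattan distance between the two PET coordinates.
        return [j for j, (xj, yj) in enumerate(points)
                if abs(xi - xj) + abs(yi - yj) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)  # the point itself is counted as a neighbor
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:  # only core points expand the cluster
                queue.extend(nb)
    return labels


# Five tightly packed PETs plus two isolated ones (made-up coordinates).
pets = [(1000, 1100), (1010, 1120), (1020, 1090), (1040, 1130),
        (1050, 1110), (5000, 5200), (9000, 9300)]
print(dbscan(pets, eps=100, min_pts=5))  # [0, 0, 0, 0, 0, -1, -1]
print(dbscan(pets, eps=100, min_pts=6))  # [-1, -1, -1, -1, -1, -1, -1]
```

With `min_pts=5` the five nearby PETs form one candidate cluster; raising it to 6 removes the cluster entirely, which is why a larger minPts is suggested for deeply sequenced data.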


------
### 11. Call loops
Run **cLoops2 callLoops -h** to see details.
```
Call loops based on clustering. 

Well tested for Hi-TrAC/Trac-looping, HiChIP, ChIA-PET and Hi-C.

Similar to calling peaks, there are three main steps in the algorithm: 1) cluster 
the PETs to find inter-ligation clusters, which are candidate loops; 2) 
estimate the significance of candidate loops with permuted local background; 
3) if the -hic option is not selected, the loop anchors will be checked for 
peak-like features, and only peak-like anchors are kept. If running multiple 
clusterings, the clusters will be combined and callLoops will output the most 
significant one based on overlaps. 

Similar to callPeaks, the key parameters are -eps and -minPts. For sharp-peak like 
interaction data, set -eps small such as 500,1000. For broad-peak like data, 
such as H3K27ac HiChIP, set -eps big such as 1000,2000. For Hi-C and HiChIP data, 
bigger -minPts is also needed, such as 20,50. 

Please note that the blockDBSCAN implementation in cLoops2 is much more 
sensitive than cDBSCAN in cLoops, so the same parameters can generate quite 
different results. With -hic option, cDBSCAN will be used. 

Examples:
    1. call loops for Hi-TrAC/Trac-looping
        cLoops2 callLoops -d trac -o trac -eps 200,500,1000,2000 -minPts 5 -w -j

    2. call loops for Hi-TrAC/Trac-looping with filtering short distance PETs 
       and using maximal estimated distance cutoff
        cLoops2 callLoops -d trac -o trac -eps 200,500,1000,2000 -minPts 5 \
                          -cut 1000 -max_cut -w -j

    3. call loops for Hi-TrAC/Trac-looping and get the PETs with any end 
       overlapping loop anchors
        cLoops2 callLoops -d trac -o trac -eps 200,500,1000,2000 -minPts 5 -w \
                          -j -filter

    4. call loops for high-resolution Hi-C like data 
        cLoops2 callLoops -d hic -o hic -eps 2000,5000,10000 -minPts 20,50 -w -j
    
    5. call inter-chromosomal loops (for most data, there will be no significant 
       inter-chromosomal loops)
        cLoops2 callLoops -d HiC -eps 5000 -minPts 10,20,50,100,200 -w -j -trans \
                          -o HiC_trans
    

optional arguments:
  -h, --help      show this help message and exit
  -d PREDIR       Assign data directory generated by cLoops2 pre to carry out analysis. 
  -o FNOUT        Output data directory / file name prefix, default is cLoops2_output.
  -p CPU          CPUs used to run the job, default is 1, set -1 to use all CPUs
                  available. Too many CPU could cause out-of-memory problem if there are
                  too many PETs.
  -cut CUT        Distance cutoff to filter cis PETs, only keep PETs with distance
                  >=cut. Default is 0, no filtering.
  -mcut MCUT      Keep the PETs with distance <=mcut. Default is -1, no filtering.
  -v              Show cLoops2 version number and exit.
  ---             The following are sub-command-specific options. This option just shows
                  the version of cLoops2.
  -eps EPS        Distance that defines two points (PETs) being neighbors, eps in
                  blockDBSCAN as key parameter, multiple eps can be assigned such as
                  200,500,1000,2000 to run multiple clusterings, the results will be
                  combined. No default value, please give the input.
  -minPts MINPTS  Points required in a cluster. minPts in blockDBSCAN is a key parameter.
                  Empirically 5 is good for TFs and histone modification ChIA-PET data
                  and Trac-looping. For data like HiChIP and Hi-C, set it larger, like
                  >=20. The input can be a series, and the final loops will have the
                  PETs>= max(minPts). 
  -plot           Whether to plot estimated inter-ligation and self-ligation PETs
                  distance distribution. Default is not to generate a plot.
  -i              Whether to convert loops to UCSC Interact track to visualize in UCSC.
                  Default is not, set this flag to save.
  -j              Whether to convert loops to 2D feature annotations to visualize in
                  Juicebox. Default is not, set this flag to save.
  -w              Whether to save tracks of loops to visualize in legacy and new washU.
                  Default is not, set this flag to save two files.
  -max_cut        When running cLoops2 with multiple eps or minPts, multiple distance
                  cutoffs for self-ligation and inter-ligation PETs will be estimated
                  based on the overlaps of anchors. By default the minimal one
                  will be used to filter PETs for candidate loop significance test.
                  Set this flag to use the maximal one, which speeds up the significance test.
  -hic            Whether to use statistical cutoffs for Hi-C to output significant loops.
                  Default is not, set this option to enable. Additionally, with the -hic
                  option, anchors are not checked for looking like peaks.
  -filter         Whether to filter raw PETs according to called loops. The filtered
                  PETs can show clear view of interactions or be used to call loops again.
  -trans          Whether to call trans- (inter-chromosomal) loops. Default is not, set
                  this flag to call. Not recommended for most cases, only for
                  data with obvious visible trans loops.
  -emPair         By default eps and minPts combinations will be used to run clustering.
                  With this option, for example eps=500,1000 and minPts=5,10, only (500,5)
                  and (1000,10) as parameters of clustering will be run. Input number of
                  eps and minPts should be same.

```
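
The difference between the default parameter combinations and `-emPair` can be illustrated as follows. This is a minimal sketch, assuming the default is a full cross product of the given values; the helper name `clustering_params` and the ordering of the returned pairs are hypothetical and not taken from the cLoops2 source.

```python
from itertools import product


def clustering_params(eps, min_pts, em_pair=False):
    """Return the (eps, minPts) pairs that would each drive one clustering run."""
    if em_pair:
        # Paired mode: the i-th eps goes only with the i-th minPts.
        if len(eps) != len(min_pts):
            raise ValueError("with em_pair, eps and minPts need the same length")
        return list(zip(eps, min_pts))
    # Default mode (assumed): every eps combined with every minPts.
    return list(product(eps, min_pts))


print(clustering_params([500, 1000], [5, 10]))
# [(500, 5), (500, 10), (1000, 5), (1000, 10)]
print(clustering_params([500, 1000], [5, 10], em_pair=True))
# [(500, 5), (1000, 10)]
```

Pairing halves the number of clustering runs in this example, which matters when several eps and minPts values are given.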

------
### 12. Call differentially enriched intra-chromosomal loops
Run **cLoops2 callDiffLoops -h** to see details.
```
Call differentially enriched intra-chromosomal loops between two conditions.

Similar to calling peaks with control data, calling differentially enriched 
loops is based on scaled PETs and the Poisson test. There are four main steps 
in the algorithm: 1) merge the overlapped loops, quantify them and their 
permuted local background regions; 2) fit the linear transformation of 
background target interaction density to control background data based on 
MANorm2; therefore, if there are more than two samples, others can be 
scaled to the reference sample for quantitative comparison; 3) estimate the 
fold change (M) cutoff and average (A) cutoff using the background data with 
the control of FDR, assuming there should be no differentially significant 
interactions called from the background data; or use the assigned cutoffs; 4) 
estimate the significance based on the Poisson test for transformed data, both 
for the loop and loop anchors. For example, if transformed PETs for target is 
5, PETs for control is 3 while control nearby permuted background median is 
4, then for the Poisson test, lambda=4-1 is used to test the observed 5 to call
p-value.

Example:
    1. classical usage 
        cLoops2 callDiffLoops -tloop target_loop.txt -cloop control_loop.txt \
                          -td ./target -cd ./control -o target_diff

    2. customize MA cutoffs 
        cLoops2 callDiffLoops -tloop target_loop.txt -cloop control_loop.txt \
                          -td ./target -cd ./control -o target_diff -customize \
                          -cacut 5 -cmcut 0.5
    

optional arguments:
  -h, --help            show this help message and exit
  -d PREDIR             Assign data directory generated by cLoops2 pre to carry out analysis. 
  -o FNOUT              Output data directory / file name prefix, default is cLoops2_output.
  -p CPU                CPUs used to run the job, default is 1, set -1 to use all CPUs
                        available. Too many CPU could cause out-of-memory problem if there are
                        too many PETs.
  -cut CUT              Distance cutoff to filter cis PETs, only keep PETs with distance
                        >=cut. Default is 0, no filtering.
  -mcut MCUT            Keep the PETs with distance <=mcut. Default is -1, no filtering.
  -v                    Show cLoops2 version number and exit.
  ---                   The following are sub-command-specific options. This option just shows
                        the version of cLoops2.
  -tloop TLOOP          The target loops in _loop.txt file called by cLoops2.
  -cloop CLOOP          The control loops in _loop.txt file called by cLoops2.
  -td TPRED             The data directory generated by cLoops2 for target data.
  -cd CPRED             The data directory generated by cLoops2 for control data.
  -pcut PCUT            Poisson p-value cutoff to determine significant differentially
                        enriched loops after Bonferroni correction, default is 1e-2.
  -igp                  Ignore Poisson p-value cutoff and only use FDR to control MA plot
                        cutoffs.
  -noPCorr              Do not perform Bonferroni correction of Poisson p-values. Will get
                        more loops. Default is always performing.
  -fdr FDR              FDR cutoff for estimating fold change (M) and average value (A) after
                        normalization with background data. Default is 0.1.
  -j                    Whether to convert loops to 2D feature annotations to visualize in
                        Juicebox. Default is not, set this flag to save.
  -w                    Whether to save tracks of loops to visualize in legacy and new washU.
                        Default is not, set this flag to save two files.
  -customize            Whether to use customized cutoffs of MA plot. Default is not. If enabled,
                        -cacut and -cmcut are needed.
  -cacut CACUT          Average cutoff for MA plot of normalized PETs of loops. Assign when
                        -customize option used.
  -cmcut CMCUT          Fold change cutoff for MA plot of normalized PETs of loops. Assign when
                        -customize option used.
  -vmin VMIN            The minimum value shown in the heatmap and colorbar.
  -vmax VMAX            The maximum value shown in the heatmap and colorbar.
  -cmap {summer,red,div,cool}
                        The heatmap style. Default is summer.


```
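
The fold change (M) and average (A) values behind the MA cutoffs can be sketched as below. This follows the standard MA-plot definition and is not taken from the cLoops2 source: the pseudo count of 1, the use of `abs(M)`, and the function names are assumptions for illustration, and `acut`/`mcut` simply stand in for the values passed to `-cacut` and `-cmcut`.

```python
import math


def ma_values(target, control, pseudo=1.0):
    """Return (M, A) for one loop from its PET counts in two conditions."""
    t = math.log2(target + pseudo)
    c = math.log2(control + pseudo)
    m = t - c        # M: log2 fold change, target over control
    a = (t + c) / 2  # A: average log2 abundance
    return m, a


def passes_cutoffs(target, control, acut, mcut):
    """Assumed rule: keep loops that are abundant enough and change enough."""
    m, a = ma_values(target, control)
    return a >= acut and abs(m) >= mcut


print(ma_values(31, 7))                         # (2.0, 4.0)
print(passes_cutoffs(31, 7, acut=3, mcut=1))    # True
print(passes_cutoffs(31, 7, acut=5, mcut=0.5))  # False
```

In the second call the loop changes four-fold but its average abundance is below the A cutoff, so it would not be reported under these assumed rules.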

------
### 13. Call domains
Run **cLoops2 callDomains -h** to see details.
```
Call domains for the 3D genomic data based on correlation matrix and local 
segregation score.

Well tested for Hi-TrAC/Trac-looping data.

Examples:
    1. call Hi-C like TADs
        cLoops2 callDomains -d trac -o trac -bs 5000,10000 -ws 500000 -p 20

    2. call Hi-TrAC/Trac-looping specific small domains
        cLoops2 callDomains -d trac -o trac -bs 1000 -ws 100000 -p 20 

    3. call domains for Hi-C
        cLoops2 callDomains -d hic -o hic -bs 10000 -ws 500000 -hic 

optional arguments:
  -h, --help   show this help message and exit
  -d PREDIR    Assign data directory generated by cLoops2 pre to carry out analysis. 
  -o FNOUT     Output data directory / file name prefix, default is cLoops2_output.
  -p CPU       CPUs used to run the job, default is 1, set -1 to use all CPUs
               available. Too many CPU could cause out-of-memory problem if there are
               too many PETs.
  -cut CUT     Distance cutoff to filter cis PETs, only keep PETs with distance
               >=cut. Default is 0, no filtering.
  -mcut MCUT   Keep the PETs with distance <=mcut. Default is -1, no filtering.
  -v           Show cLoops2 version number and exit.
  ---          The following are sub-command-specific options. This option just shows
               the version of cLoops2.
  -bs BINSIZE  Candidate contact matrix resolution (bin size) to call domains. A
               series of values or a single value can be used as input. Default is
               10000. If given multiple values, callDomains will try to call nested
               domains. Small values may lead to smaller domains.
  -ws WINSIZE  Half of the sliding window size used to calculate local correlation.
               Default is 500000 (500kb). Larger values may lead to larger domains.
  -hic         Whether to use cutoffs for Hi-C to output significant domains.
               Default is not. Set this option to enable, cutoffs will be more loose.
```

------
### 14. Plot the interaction as heatmap/scatter/arches, 1D signals, peaks, loops and domains
Run **cLoops2 plot -h** to see details.
```
Plot the interaction data as a heatmap (or arches/scatter), optionally with a 
virtual 4C view point, 1D tracks (bigWig files), 1D annotations (peaks, genes) 
and 2D annotations (domains). If -f is not assigned, only profiles from bigWig 
or bed files will be plotted.

Examples:
    1. plot the simple square heatmap for a specific region with 1kb resolution 
       with genes 
        cLoops2 plot -f test/chr21-chr21.ixy -o test -bs 1000 -start 34840000 \
                     -end 34895000 -log -gtf test.gtf

    2. plot the upper triangle heatmap with domains such as TAD and CTCF bigWig
       track
        cLoops2 plot -f test/chr21-chr21.ixy -o test_domain -bs 10000 \
                     -start 34600000 -end 35500000 -domains HiC_TAD.bed -log \
                     -triu -bws GM12878_CTCF_chr21.bw

    3. plot the heatmap as upper triangle with 1D signal track and filter the 
       PETs shorter than 1kb
        cLoops2 plot -f test/chr21-chr21.ixy -o test -bs 500 -start 34840000 \
                     -end 34895000 -log -triu -1D -cut 1000

    4. plot the observation/expectation interaction heatmap with 1D signal 
        cLoops2 plot -f test/chr21-chr21.ixy -o test -m obs/exp -1D -triu \
                     -bs 500 -start 34840000 -end 34895000

    5. plot the chromosome-wide correlation heatmap 
        cLoops2 plot -f test/chr21-chr21.ixy -o test -corr 

    6. plot upper triangle interaction heatmap together with genes, bigWig 
       files, peaks, loops, domains, control the heatmap scale
        cLoops2 plot -f test/chr21-chr21.ixy -o test -bs 500 -start 34840000 \
                     -end 34895000 -triu -bws ATAC.bw,CTCF.bw -1D \
                     -loops test_loops.txt -beds Enh.bed,Tss.bed \
                     -domains tad.bed -m obs -log -vmin 0.2 -vmax 2 -gtf genes.gtf
    
    7. plot small regions interacting PETs as arches 
        cLoops2 plot -f test/chr21-chr21.ixy -o test -start 46228500 \
                     -end 46290000 -1D -loops gm_loops.txt -arch -aw 0.05

    8. plot small regions interacting PETs as scatter plot
        cLoops2 plot -f test/chr21-chr21.ixy -o test -start 46228500 \
                     -end 46290000 -1D -loops gm_loops.txt -scatter

    9. plot Hi-C compartments and eigenvector  
        cLoops2 plot -f test/chr21-chr21.ixy -o test -bs 100000 -log -corr -eig  

optional arguments:
  -h, --help            show this help message and exit
  -d PREDIR             Assign data directory generated by cLoops2 pre to carry out analysis. 
  -o FNOUT              Output data directory / file name prefix, default is cLoops2_output.
  -p CPU                CPUs used to run the job, default is 1, set -1 to use all CPUs
                        available. Too many CPU could cause out-of-memory problem if there are
                        too many PETs.
  -cut CUT              Distance cutoff to filter cis PETs, only keep PETs with distance
                        >=cut. Default is 0, no filtering.
  -mcut MCUT            Keep the PETs with distance <=mcut. Default is -1, no filtering.
  -v                    Show cLoops2 version number and exit.
  ---                   The following are sub-command-specific options. This option just shows
                        the version of cLoops2.
  -f FIXY               Input .ixy file generated by cLoops2 pre. If not assigned, no heatmaps
                        or arches will be shown and -chrom is needed to generate plots similar
                        to IGV or other browser.
  -bs BINSIZE           Bin size/matrix resolution (bp) to generate the contact matrix for
                        plotting, default is 5000 bp.
  -chrom CHROM          Chromosome for the target region if -f is not assigned.
  -start START          Start genomic coordinate for the target region. Default is 0.
  -end END              End genomic coordinate for the target region. Default is to infer
                        from the data.
  -loops FLOOP          The _loop.txt file generated by cLoops2, will be used to plot loops as
                        arches.
  -loopCut LOOPCUT      Only show loops with more than loopCut PETs. Default is 0.
  -domains FDOMAIN      The domains need to annotated in the heatmap such as TADs, should be
                        .bed file.
  -beds BEDS            BED tracks of genomic features to plot above the heatmap, such as
                        promoters and enhancers, track name will be inferred from file name,
                        for example enhancer.bed,promoter.bed.
  -gtf GTF              GTF track of genes to plot above the heatmap.
  -bws BWS              BigWig tracks to plot above the heatmap, track name will be inferred
                        from file name, for example a.bw,b.bw,c.bw. 
  -bwvs BWVS            BigWig tracks y-axis limitations. Default is auto-determined. Assign
                        as 'vmin,vmax;vmin,vmax;vmin,vmax'. For example, '0,1;;0,1' for three
                        bigWig tracks, with the second track kept auto-determined. Due to
                        argparse limitation for parsing minus value, also can be assigned as
                        vmax,vmin.
  -bwcs BWCS            BigWig tracks colors. Default is auto-determined. Assign as 
                        0,1,2 for three bigWig tracks. Values separated by comma.
  -log                  Whether to log transform the matrix.
  -m {obs,obs/exp}      The type of matrix to plot, observed matrix or observed/expected
                        matrix, expected matrix will be generated by shuffling PETs, default
                        is observed.
  -corr                 Whether to plot the correlation matrix. Default is not. Correlation
                        heatmap will use dark mode color map, used together with obs method.
  -norm                 Whether to normalize the matrix with z-score.
  -triu                 Whether to rotate the heatmap only show upper triangle, default is
                        False.
  -vmin VMIN            The minimum value shown in the heatmap and colorbar.
  -vmax VMAX            The maximum value shown in the heatmap and colorbar.
  -1D                   Whether to plot the pileup 1D signal for the region. Default is not.
                        Please note, the 1D signal is aggregated from the visualization region.
                        If you want to check the signal at each position of the whole
                        genome/chromosome, use cLoops2 dump -bdg to get the bigWig file.
  -1Dv ONEDV            1D profile y-axis limitations. Default is auto-determined. Assign as
                        vmin,vmax, for example 0,1.
  -virtual4C            Whether to plot the virtual 4C view point 1D signal. Default is not.
                        If assigned, -view_start and -view_end are needed.
  -view_start VIEWSTART
                        Start genomic coordinate for the view point start region, only valid
                        when -virtual4C is set, should be >=start and <=end.
  -view_end VIEWEND     End genomic coordinate for the view point end region, only valid
                        when -virtual4C is set, should be >=start and <=end.
  -4Cv VIEWV            Virtual 4C profile y-axis limitations. Default is auto-determined.
                        Assign as vmin,vmax, for example 0,1.
  -arch                 Whether to plot interacting PETs as arches. Default is not. If
                        set, each original PET will be shown as one arch. Useful to check
                        small region for raw data, especially when heatmap is not clear.
  -aw AW                Line width for each PET in arches plot. Default is 1. Try to
                        change it if too many or few PETs.
  -ac AC                Line color for each PET in arches plot. Default is 4. Try to
                        change it to see how many colors are supported by cLoops2.
  -aa AA                Alpha to control arch color saturation. Default is 1.
  -scatter              Whether to plot interacting PETs as scatter dots. Default is not.
                        If set, each original PET will be shown as one dot. Useful to check
                        raw data, especially when heatmap is not clear because -vmax is too small.
  -ss SS                Dot size for each PET in scatter plot. Default is 1. Try to
                        change it to optimize the plot.
  -sc SC                Dot color for each PET in scatter plot. Default is 0. Try to
                        change it to see how many colors are supported by cLoops2.
  -sa SA                Alpha to control dot color saturation. Default is 1.
  -eig                  Whether to plot the PC1 of correlation matrix to show compartments.
                        Default is not. Only works well for big regions such as resolution
                        of 100k.
  -eig_r                Whether to flip the PC1 values of -eig. It should depend on
                        inactive or active histone markers, as the PCA values actually do
                        not have directions, especially when comparing different samples.
  -figWidth {4,8}       Figure width. 4 is good to show the plot as half of an A4 figure
                        width and 8 is good to show it wider. Default is 4.


```
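
How `-bs`, `-start` and `-end` relate to the plotted heatmap can be seen from a small binning sketch. It is only an illustration, not the cLoops2 plotting code: the function name `contact_matrix`, the `(x, y)` representation of a PET and the handling of PETs outside the region are assumptions.

```python
def contact_matrix(pets, start, end, bs):
    """Bin PETs inside [start, end) into a symmetric contact matrix."""
    n = (end - start + bs - 1) // bs  # number of bins, rounded up
    mat = [[0] * n for _ in range(n)]
    for x, y in pets:
        # Assumed behavior: skip PETs with any end outside the region.
        if not (start <= x < end and start <= y < end):
            continue
        i = (x - start) // bs
        j = (y - start) // bs
        mat[i][j] += 1
        if i != j:
            mat[j][i] += 1  # mirror off-diagonal contacts
    return mat


# Made-up PET coordinates; the last one lies outside the plotted region.
pets = [(34840100, 34840900), (34840200, 34843500),
        (34841500, 34843900), (34900000, 34950000)]
mat = contact_matrix(pets, start=34840000, end=34845000, bs=1000)
for row in mat:
    print(row)
# [1, 0, 0, 1, 0]
# [0, 0, 0, 1, 0]
# [0, 0, 0, 0, 0]
# [1, 1, 0, 0, 0]
# [0, 0, 0, 0, 0]
```

A smaller bin size gives a larger, sparser matrix for the same region, so the resolution has to be balanced against the number of PETs in the plotted window.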

------
### 15. Montage analysis for regions of interactions
Run **cLoops2 montage -h** to see details.
```
Montage analysis of specific regions, producing Westworld Season 3 -like 
Rehoboam plot. 

Examples: 
    1. showing all PETs for a gene's promoter and enhancers
        cLoops2 montage -f test/chr21-chr21.ixy -bed test.bed -o test 

    2. showing simplified PETs for a gene's promoter and enhancers
        cLoops2 montage -f test/chr21-chr21.ixy -bed test.bed -o test -simple
    
    3. adjust interacting link width 
        cLoops2 montage -f test/chr21-chr21.ixy -bed test.bed -o test -simple \
                        -ppmw 10
    
    4. showing all PETs for a region, if in the bed file only contains one region
        cLoops2 montage -f test/chr21-chr21.ixy -bed test.bed -o test -ext 0
    

optional arguments:
  -h, --help     show this help message and exit
  -d PREDIR      Assign data directory generated by cLoops2 pre to carry out analysis. 
  -o FNOUT       Output data directory / file name prefix, default is cLoops2_output.
  -p CPU         CPUs used to run the job, default is 1, set -1 to use all CPUs
                 available. Too many CPU could cause out-of-memory problem if there are
                 too many PETs.
  -cut CUT       Distance cutoff to filter cis PETs, only keep PETs with distance
                 >=cut. Default is 0, no filtering.
  -mcut MCUT     Keep the PETs with distance <=mcut. Default is -1, no filtering.
  -v             Show cLoops2 version number and exit.
  ---            The following are sub-command-specific options. This option just shows
                 the version of cLoops2.
  -f FIXY        Input .ixy file generated by cLoops2 pre.
  -bed BED       Input .bed file for target regions, 4th columns should be id/name for
                 the region.
  -ext EXT       Up-stream and down-stream extension of target region length. Default is
                 2. If the input bed already includes up/down-stream regions, assign as 0.
  -simple        Whether to only draw the representative interactions between two target
                 regions as one arch, and not include the interactions in extended
                 regions. Default is not, all interactions will be shown as arches.
  -vp VIEWPOINT  Only show interactions with specific regions from all other regions.
                 Name/id (4th column in .bed file) is needed. Default is to show all
                 related interactions. Multiple names/ids can be assigned, separated
                 by commas.
  -vmin VMIN     The minimal scale for 1D pileup data. Default will be inferred from the
                 data.
  -vmax VMAX     The maximal scale for 1D pileup data. Default will be inferred from the
                 data.
  -ppmw PPMW     Link line width indicator, short for 1 PETs per Million PETs line
                 width, default is 10. Adjust this value when -simple is used. Decrease
                 it if links are too bold and increase it when links are too thin.
  -aw AW         Line width for each PET if -simple is not selected. Default is 1.
  -no1D          Whether to not plot 1D profiles. Default is plot. Set this for Hi-C
                 like data.
```

------
### 16. Aggregation analysis for peaks, loops and domains
Run **cLoops2 agg -h** to see details.
```
Do the aggregation analysis for peaks, loops, view points and domains.

The output figures can be used directly, and the data to generate the plot are 
also saved for further customized analysis. 

For the aggregated peaks analysis, input is a .bed file annotated with the 
coordinates for the target regions/peaks/anchors. Output is a .pdf file 
containing a mean density plot and heatmap and a .txt file for the data. The 
data in the .txt file and plot were normalized to RPM (reads per million).

For the aggregated view points analysis, input is a .bed file annotated with 
coordinates for the target regions/peaks/anchors as view point. Output is a 
.pdf file containing a mean density plot and heatmap and a .txt file for the 
data. The data in the .txt file and plot were normalized to 
log2( RPM (reads per million)+1).

For the aggregated loops analysis, input is a _loops.txt file annotated with 
the coordinates for target loops, similar to the format of BEDPE. Output is a 
.pdf file for mean heatmap and .npz file generated through numpy.savez for all 
loops and nearby regions matrix. The enrichment score (ES) in the plot is 
calculated as: ES = mean( (PETs in loop)/(mean PETs of nearby regions) ). Other 
files except _loops.txt can be used as input, as long as the file contains key 
information in the first columns separated by tabs:
loopId	chrA	startA	endA	chrB	startB	endB	distance
loop-1	chr21	1000	2000	chr21	8000	9000	7000

There is another option for loops analysis, termed two anchors. The input file 
is the same as for the aggregated loops analysis. The whole region between the 
two anchors, with the assigned extension, will be aggregated, and the 1D profile 
can show the two anchors. The analysis could be useful to study/compare 
different classes of anchors and combinations, for example, considering CTCF 
motif directions, where the CTCF motifs are on the positive strand for all left 
anchors and on the negative strand for all right anchors. It could also be 
interesting for loops in which one anchor is bound only by transcription factor 
A and the other anchor only by transcription factor B. 

For the aggregated domains analysis, input is a .bed file annotated with the
coordinates for the domains, such as TADs. Output are a .pdf file for the upper 
triangular heatmap and .npz file generated through numpy.savez for all domains 
and nearby region matrix. The enrichment score (ES) in the plot is calculated 
as mean( (two ends both within domain PETs number)/( only one end in domain 
PETs number) ).

Examples:
    1. show aggregated peaks heatmap and profile 
        cLoops2 agg -d test -peaks peaks.bed -o test -peak_ext 2500 \
                    -peak_bins 200 -peak_norm -skipZeros

    2. show aggregated view points and aggregated bigWig signal
        cLoops2 agg -d test -o test -viewPoints test_peaks.bed -bws CTCF.bw 

    3. show aggregated loops heatmap, 1D profile and aggregated bigWig signal
        cLoops2 agg -d test -o test -loops test_loops.txt -bws CTCF.bw -1D \
                    -loop_norm
    
    4. show aggregated loops heatmap, 1D profile and aggregated bigWig signal
       in two anchors mode
        cLoops2 agg -d test -o test -twoAnchors test_loops.txt -bws CTCF.bw -1D \
                    -loop_norm

    5. show aggregated domains heatmap, 1D profile and aggregated bigWig signal
        cLoops2 agg -d test -o test -domains TAD.bed -bws CTCF.bw -1D 
    

optional arguments:
  -h, --help            show this help message and exit
  -d PREDIR             Assign data directory generated by cLoops2 pre to carry out analysis. 
  -o FNOUT              Output data directory / file name prefix, default is cLoops2_output.
  -p CPU                CPUs used to run the job, default is 1, set -1 to use all CPUs
                        available. Too many CPU could cause out-of-memory problem if there are
                        too many PETs.
  -cut CUT              Distance cutoff to filter cis PETs, only keep PETs with distance
                        >=cut. Default is 0, no filtering.
  -mcut MCUT            Keep the PETs with distance <=mcut. Default is -1, no filtering.
  -v                    Show cLoops2 version number and exit.
  ---                   Following are sub-commands specific options. This option just show
                        version of cLoops2.
  -peaks PEAKF          The .bed file for peaks-centric aggregation analysis.
  -peak_ext PEAK_EXT    The nearby upstream and downstream regions (bp) from the peak center.
                        Default is 5000.
  -peak_bins PEAK_BINS  The bin size for the profile array of peaks. Default is 100.
  -peak_norm            Whether to normalize the data in the peaks profile plot and
                        heatmap with row-wise z-score. Default is not.
  -viewPoints VIEWPOINTF
                        The .bed file for view points -centric aggregation analysis.
  -viewPointUp VIEWPOINTUP
                        The upstream regions included for the aggregated view points analysis.
                        Default is 100000 bp.
  -viewPointDown VIEWPOINTDOWN
                        The downstream regions included for the aggregated view points analysis.
                        Default is 100000 bp.
  -viewPointBs VIEWPOINTBS
                        Contact matrix bin size for view points heatmap. Default is 1000 bp. 
  -viewPoint_norm       Whether to normalize the sub-matrix for each view point by dividing
                        by the mean PETs of the matrix. Default is not.
  -loops LOOPF          The _loops.txt file generated by cLoops2 for loops-centric
                        aggregation analysis. The first 8 columns of the file are required.
  -loop_ext LOOP_EXT    The nearby regions included to plot in the heatmap and calculation of
                        enrichment for aggregation loop analysis, default is 10, should be
                        even number.
  -loop_cut LOOP_CUT    Distance cutoff for loops to filter. Default is 0.
  -loop_norm            Whether to normalize the sub-matrix for each loop by dividing by the
                        mean PETs of the matrix (except the loop region). Default is not.
  -twoAnchors TWOANCHORSF
                        The file, similar to _loops.txt generated by cLoops2, for two anchors
                        aggregation analysis. The first 8 columns of the file are required.
  -twoAnchor_ext TWOANCHOR_EXT
                        The nearby regions of fold included to plot in heatmap.
                        Default is 0.1.
  -twoAnchor_vmin TWOANCHOR_VMIN
                        The minimum value shown in the two anchors heatmap and colorbar.
  -twoAnchor_vmax TWOANCHOR_VMAX
                        The maximum value shown in the two anchors heatmap and colorbar.
  -domains DOMAINF      The .bed file annotated the domains such as TADs for aggregated
                        domains-centric analysis.
  -domain_ext DOMAIN_EXT
                        The nearby regions of fold included to plot in heatmap and
                        calculation of enrichment, default is 0.5.
  -domain_vmin DOMAIN_VMIN
                        The minimum value shown in the domain heatmap and colorbar.
  -domain_vmax DOMAIN_VMAX
                        The maximum value shown in the domain heatmap and colorbar.
  -1D                   Whether to plot the pileup 1D signal for aggregated loops, 
                        aggregated view points or aggregated domains. Default is not.
  -bws BWS              BigWig tracks to plot above the aggregated loops heatmap (or under
                        the aggregated domains heatmap), track name will be inferred from file
                        name, for example a.bw,b.bw,c.bw. 
  -skipZeros            Whether to remove all 0 records. Default is not.

```
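
The loop enrichment score described in the help text, ES = mean( (PETs in loop)/(mean PETs of nearby regions) ), can be illustrated with a small sketch. This is not the cLoops2 implementation: it assumes each loop is represented by an odd-sized square sub-matrix with the loop in the central bin, and the function names `loop_enrichment` and `aggregated_es` are hypothetical.

```python
from statistics import mean


def loop_enrichment(sub_matrix):
    """ES for one loop: central-bin PETs divided by the mean PETs of all other bins.

    Assumption: sub_matrix is an odd-sized square list of lists and the loop
    itself sits in the central bin; every other bin counts as a nearby region.
    """
    n = len(sub_matrix)
    c = n // 2
    nearby = [sub_matrix[i][j] for i in range(n) for j in range(n) if (i, j) != (c, c)]
    background = mean(nearby)
    if background == 0:
        return None  # undefined when no PETs fall in the nearby regions
    return sub_matrix[c][c] / background


def aggregated_es(sub_matrices):
    """Mean of the per-loop enrichment scores, skipping undefined ones."""
    scores = [s for s in (loop_enrichment(m) for m in sub_matrices) if s is not None]
    return mean(scores)


# Two toy 3x3 sub-matrices: per-loop ES of 8 and 2.
m1 = [[1, 1, 1], [1, 8, 1], [1, 1, 1]]
m2 = [[2, 2, 2], [2, 4, 2], [2, 2, 2]]
print(aggregated_es([m1, m2]))  # 5.0
```

Averaging per-loop ratios, rather than dividing the summed loop PETs by the summed background, keeps a few very deep loops from dominating the score.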

------
### 17. Quantification of peaks, loops and domains
Run **cLoops2 quant -h** to see details.
```
Quantify the peaks, loops and domains.  The output file will be the same as
outputs of callPeaks, callLoops and callDomains.

Examples:
    1. quantify peaks 
        cLoops2 quant -d test -peaks peaks.bed -o test 

    2. quantify loops 
        cLoops2 quant -d test -loops test_loops.txt -o test
    
    3. quantify domains 
        cLoops2 quant -d test -domains test_domains.txt -o test

optional arguments:
  -h, --help            show this help message and exit
  -d PREDIR             Assign data directory generated by cLoops2 pre to carry out analysis. 
  -o FNOUT              Output data directory / file name prefix, default is cLoops2_output.
  -p CPU                CPUs used to run the job, default is 1, set -1 to use all CPUs
                        available. Too many CPU could cause out-of-memory problem if there are
                        too many PETs.
  -cut CUT              Distance cutoff to filter cis PETs, only keep PETs with distance
                        >=cut. Default is 0, no filtering.
  -mcut MCUT            Keep the PETs with distance <=mcut. Default is -1, no filtering.
  -v                    Show cLoops2 version number and exit.
  ---                   Following are sub-commands specific options. This option just show
                        version of cLoops2.
  -peaks PEAKF          The .bed file for peaks-centric quantification.
  -loops LOOPF          The _loops.txt file generated by cLoops2 for loops-centric
                        quantification. Only the first 8 columns are required.
  -domains DOMAINF      The _domains.txt file generated by cLoops2 for domains-centric
                        quantification. Only the first 3 columns are required.
  -domain_bs DOMAINBINSIZE
                        Candidate contact matrix resolution (bin size) to quantify domains, 
                        default is 10000. Only one integer is supported.
  -domain_ws DOMAINWINSIZE
                        The half window size used to calculate local correlation to quantify
                        domains. Default is 500000 (500kb).
  -domain_bdg           Whether to save the segregation score as a bedGraph file. Default
                        is not.
```
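
Because loop quantification only needs the first 8 columns of a loops file, a custom loop list can be prepared by hand. The sketch below is an illustration, not part of cLoops2: the helper name `loop_row` is hypothetical, the header uses the column names shown in the aggregation help text, and the distance is taken as the gap between anchor centers, which is only meaningful for intra-chromosomal loops.

```python
HEADER = "loopId\tchrA\tstartA\tendA\tchrB\tstartB\tendB\tdistance"


def loop_row(loop_id, chrom_a, start_a, end_a, chrom_b, start_b, end_b):
    """Format one loop as a tab-separated row with the 8 required columns."""
    center_a = (start_a + end_a) // 2
    center_b = (start_b + end_b) // 2
    distance = abs(center_b - center_a)  # distance between the anchor centers
    fields = [loop_id, chrom_a, start_a, end_a, chrom_b, start_b, end_b, distance]
    return "\t".join(str(f) for f in fields)


row = loop_row("loop-1", "chr21", 1000, 2000, "chr21", 8000, 9000)
print(HEADER)
print(row)
```

The row produced here matches the example record given in the aggregation help text (distance 7000).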

------
### 18. Annotation of loops to genes 
Run **cLoops2 anaLoops -h** to see details.
```
Annotating loops:
- find the closest TSS for each loop anchor
- merge the loop anchors and classify them as enhancers or promoters based on 
  distance to nearest TSS
- build the interaction networks for merged anchors 
- find all interacting enhancers/promoters for each promoter  

Basic mode 1: with -gtf, loops will be annotated as enhancer or promoter based 
on distance to nearest gene. If an anchor overlaps with two or more promoters
(often seen for close head-to-head genes), all will be reported. If no TSS 
overlaps, then the nearest one will be assigned.  

Basic mode 2: with -gtf -net, overlapped anchors will be merged and annotated as 
enhancer or promoter considering distance to genes. For each promoter, all 
linked enhancers and promoters will be shown. If there are more than 3 direct or 
indirect enhancers for a promoter, the HITS algorithm will be used to identify 
one hub for direct enhancers and one hub for indirect enhancers. 

Examples:
    1. annotate loops for target gene, basic mode 1
        cLoops2 anaLoops -loops test_loops.txt -gtf genecode.gtf
    
    2. annotate loops for target transcripts (alternative TSS), basic mode 1
        cLoops2 anaLoops -loops test_loops.txt -gtf genecode.gtf -tid
    
    3. find a gene's all linked enhancer or promoter, basic mode 2
        cLoops2 anaLoops -loops test_loops.txt -gtf genecode.gtf -net

optional arguments:
  -h, --help    show this help message and exit
  -d PREDIR     Assign data directory generated by cLoops2 pre to carry out analysis. 
  -o FNOUT      Output data directory / file name prefix, default is cLoops2_output.
  -p CPU        CPUs used to run the job, default is 1, set -1 to use all CPUs
                available. Too many CPU could cause out-of-memory problem if there are
                too many PETs.
  -cut CUT      Distance cutoff to filter cis PETs, only keep PETs with distance
                >=cut. Default is 0, no filtering.
  -mcut MCUT    Keep the PETs with distance <=mcut. Default is -1, no filtering.
  -v            Show cLoops2 version number and exit.
  ---           Following are sub-commands specific options. This option just show
                version of cLoops2.
  -loops FLOOP  The _loop.txt file generated by cLoops2 callLoops or callDiffLoops.
  -gtf GTF      GTF file annotation for genes.
  -tid          Whether to use transcript id instead of gene id for annotation. Default
                is not.
  -pdis PDIS    Distance limitation for anchor to nearest gene/transcript TSS to define
                as promoter. Default is 2000 bp.
  -net          Whether to use network method to find all enhancer/promoter links based
                on loops. Default is not. In this mode, overlapped anchors will be
                merged and annotated as enhancer/promoter, then for a gene, all linked
                node will be output.
  -gap GAP      When -net is set, the distance for close anchors to merge. Default is 1.

```
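
The promoter/enhancer call controlled by -pdis can be pictured with the following sketch. It is a simplified illustration under stated assumptions, not the cLoops2 code: the distance is measured from the anchor edge (0 when the TSS lies inside the anchor), an anchor within `pdis` of a TSS is called a promoter, and the function names are hypothetical.

```python
def distance_to_tss(start, end, tss):
    """0 if the TSS lies inside [start, end], else the gap to the nearest anchor edge."""
    if start <= tss <= end:
        return 0
    return min(abs(tss - start), abs(tss - end))


def annotate_anchor(start, end, tss_positions, pdis=2000):
    """Return (type, nearest TSS, distance) for one anchor on a single chromosome."""
    nearest = min(tss_positions, key=lambda t: distance_to_tss(start, end, t))
    d = distance_to_tss(start, end, nearest)
    kind = "Promoter" if d <= pdis else "Enhancer"  # the <= comparison is an assumption
    return kind, nearest, d


print(annotate_anchor(10000, 11000, [12500, 50000]))  # ('Promoter', 12500, 1500)
print(annotate_anchor(10000, 11000, [20000]))         # ('Enhancer', 20000, 9000)
```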

------
### 19. Find target genes of genomic regions with cLoops2 anaLoops output
Run **cLoops2 findTargets -h** to see details.
```
Find target genes of genomic regions (peaks, SNPs) through enhancer-promoter 
networks. Output from cLoops2 anaLoops with suffix of _ep_net.sif and
_targets.txt are needed.

Examples:
    1. find target genes of peaks/SNPs
        cLoops2 findTargets -net test_ep_net.sif -tg test_targets.txt \
                            -bed GWAS.bed -o test 

optional arguments:
  -h, --help  show this help message and exit
  -d PREDIR   Assign data directory generated by cLoops2 pre to carry out analysis. 
  -o FNOUT    Output data directory / file name prefix, default is cLoops2_output.
  -p CPU      CPUs used to run the job, default is 1, set -1 to use all CPUs
              available. Too many CPU could cause out-of-memory problem if there are
              too many PETs.
  -cut CUT    Distance cutoff to filter cis PETs, only keep PETs with distance
              >=cut. Default is 0, no filtering.
  -mcut MCUT  Keep the PETs with distance <=mcut. Default is -1, no filtering.
  -v          Show cLoops2 version number and exit.
  ---         Following are sub-commands specific options. This option just show
              version of cLoops2.
  -net FNET   The _ep_net.sif file generated by cLoops2 anaLoops.
  -tg FTG     The _targets.txt file generated by cLoops2 anaLoops.
  -bed FBED   Find target genes for regions, such as anchors, SNPs or peaks.

```
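
To see how a region such as a SNP could be matched to network nodes, the sketch below parses merged-anchor ids of the form chrom:start-end|type (the naming rule documented for the anaLoops output) and tests them for overlap with a BED-style region. The helper names are hypothetical, half-open intervals are assumed, and this is not how findTargets is necessarily implemented.

```python
def parse_anchor_id(anchor_id):
    """Split 'chrom:start-end|type' into (chrom, start, end, type)."""
    loc, kind = anchor_id.split("|")
    chrom, span = loc.split(":")
    start, end = span.split("-")
    return chrom, int(start), int(end), kind


def overlaps(region, anchor):
    """True if a (chrom, start, end) region overlaps a parsed anchor."""
    chrom, start, end = region
    a_chrom, a_start, a_end, _ = anchor
    return chrom == a_chrom and start < a_end and a_start < end


anchors = [
    parse_anchor_id("chr21:35075636-35077527|Enhancer"),
    parse_anchor_id("chr21:35043062-35051895|Promoter"),
]
snp = ("chr21", 35076000, 35076001)  # made-up position for illustration
hits = [a for a in anchors if overlaps(snp, a)]
print(hits)  # [('chr21', 35075636, 35077527, 'Enhancer')]
```

Once a region is mapped to an anchor, its target genes can be looked up through the promoters linked to that anchor in the network.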

------
------
## Extended Analysis Application Scripts
The following analysis application scripts are available once cLoops2 is installed. Most of them can be run independently, and the -h option shows example usages and parameter details. Scripts that prove well tested and frequently used will be integrated into cLoops2 sub-programmes, and more will be added. 

### File Format Conversion
- [hicpro2bedpe.py](https://github.com/YaqiangCao/cLoops2/blob/master/scripts/hicpro2bedpe.py) : convert HiC-Pro output allValidPairs file to BEDPE file as input of cLoops2.   
- [juicerLong2bedpe.py](https://github.com/YaqiangCao/cLoops2/blob/master/scripts/juicerLong2bedpe.py): convert Juicer output long format interaction file to BEDPE file as input of cLoops2.   
- [getBedpeFBed.py](https://github.com/YaqiangCao/cLoops2/blob/master/scripts/getBedpeFBed.py): convert single-end reads in BED format to paired-end reads in BEDPE format with expected fragment size as input of cLoops2 to call peaks.    

---
### Analysis without plot
- [getDI.py](https://github.com/YaqiangCao/cLoops2/blob/master/scripts/getDI.py): calculate the [Directionality Index](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3356448/) as <img align="center" src="https://latex.codecogs.com/svg.latex?\Large&space;DI_{x}=\frac{(B-A)}{|B+A|}*\frac{(A-E)^2+(B-E)^2}{E},E=\frac{A+B}{2}"/>, where **x** is the bin and **A** is the interaction reads within the region from specific upstream to bin **x**, and **B** is the downstream reads.  

- [getFRiF.py](https://github.com/YaqiangCao/cLoops2/blob/master/scripts/getFRiF.py): calculate the **F**raction of **R**eads **i**n **F**eatures (FRiF). The features can be domains or peaks annotated with a .bed file, or domains/stripes/loops given as a .txt file such as the \_loops.txt file.

- [getIS.py](https://github.com/YaqiangCao/cLoops2/blob/master/scripts/getIS.py): calculate the [insulation score](https://www.nature.com/articles/nature20158) with a slight modification and write the result to a bedGraph file. The formula used is <img align="center" src="https://latex.codecogs.com/svg.latex?\Large&space;IS_{x}=-log2(\frac{I(x-s,x+s)-I(x,x+s)-I(x-s,x)}{I(x-s,x+s)})" />, where ***x*** is the genomic location, which can be a bin or an exact base pair, ***I(x-s,x+s)*** is the number of interactions/PETs observed in the region from ***x-s*** to ***x+s***, and ***s*** should be set fairly large, such as 100kb, to obtain a good fit between the insulation score and TAD boundaries.  

- [getLocalIDS.py](https://github.com/YaqiangCao/cLoops2/blob/master/scripts/getcLocalIDS.py): calculate the local interaction density score and write the result to a bedGraph file. The formula used is <img align="center" src="https://latex.codecogs.com/svg.latex?\Large&space;IDS_{x}=\sum_{i=-5}^{5}{\frac{I(x,x_{i})}{N}},i\neq0" />, where ***x*** is the genomic location of the target bin, ***N*** is the total number of PETs in the target chromosome, and ***I(x,x_i)*** is the number of observed PETs linking bin ***x*** and the ith nearby bin of the same size. 

- [getPETsAno.py](https://github.com/YaqiangCao/cLoops2/blob/master/scripts/getPETsAno.py): get the PETs ratio of enhancer-promoter, enhancer-enhancer, promoter-promoter, enhancer-none, promoter-none, none-none interactions.

- [tracPre.py](https://github.com/YaqiangCao/cLoops2/blob/master/scripts/tracPre.py): pre-process the raw FASTQ reads of Trac-looping data by mapping them to the reference genome, and obtain the unique PETs with quality control results.

- [tracPre2.py](https://github.com/YaqiangCao/cLoops2/blob/master/scripts/tracPre2.py): pre-process the raw FASTQ reads of Hi-TrAC data by mapping them to the reference genome, and obtain the unique PETs with quality control results.
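
The three scores listed above (DI, IS and IDS) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code of getDI.py, getIS.py or getLocalIDS.py: PETs are given as (x, y) positions on one chromosome, intervals are half-open, the DI follows the formula exactly as written above, and the insulation-score numerator is read as the PETs spanning x, that is I(x-s,x+s) - I(x,x+s) - I(x-s,x). All function names are hypothetical.

```python
import math


def count_within(pets, lo, hi):
    """I(lo, hi): number of PETs with both ends inside [lo, hi)."""
    return sum(1 for x, y in pets if lo <= x < hi and lo <= y < hi)


def directionality_index(a, b):
    """DI from upstream reads a and downstream reads b, with E = (a + b) / 2."""
    if a + b == 0:
        return 0.0
    e = (a + b) / 2
    return (b - a) / abs(b + a) * ((a - e) ** 2 + (b - e) ** 2) / e


def insulation_score(pets, x, s):
    """-log2 of the fraction of window PETs that span position x."""
    total = count_within(pets, x - s, x + s)
    crossing = total - count_within(pets, x, x + s) - count_within(pets, x - s, x)
    if total == 0 or crossing == 0:
        return None  # log2 undefined; left for the caller to handle
    return -math.log2(crossing / total)


def local_ids(pets, bin_start, bin_size, n_total, flank=5):
    """Sum over the flanking bins of PETs linking the target bin to bin i, over N."""
    def in_bin(pos, start):
        return start <= pos < start + bin_size

    score = 0.0
    for i in range(-flank, flank + 1):
        if i == 0:
            continue
        other = bin_start + i * bin_size
        linked = sum(
            1 for x, y in pets
            if (in_bin(x, bin_start) and in_bin(y, other))
            or (in_bin(y, bin_start) and in_bin(x, other))
        )
        score += linked / n_total
    return score


pets = [(10, 20), (60, 70), (40, 60), (45, 55)]  # toy data
print(directionality_index(10, 30))       # 5.0
print(insulation_score(pets, 50, 50))     # 1.0
print(local_ids(pets, 40, 10, len(pets))) # 0.5
```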

-----
------
## Input, Intermediate, Output Files
- [.bedpe](#.bedpe)
- [.ixy](#.ixy)
- [_peaks.txt](#_peaks.txt)
- [_loops.txt](#_loops.txt)
- [_dloops.txt](#_dloops.txt)
- [_domains.txt](#_domains.txt)
- [_loopsGtfAno.txt](#_loopsGtfAno.txt)
- [_mergedAnchors.txt](#_mergedAnchors.txt)
- [_loop2anchors.txt](#_loop2anchors.txt)
- [_targets.txt](#_targets.txt)

----
<a name=".bedpe"></a>
### Input .bedpe file 
Mapped PETs in [BEDPE format](http://bedtools.readthedocs.io/en/latest/content/general-usage.html); files compressed with gzip are also accepted. The following columns are required: chrom1 (1st), start1 (2nd), end1 (3rd), chrom2 (4th), start2 (5th), end2 (6th), strand1 (9th), strand2 (10th). For the name and score columns, "." is accepted. Columns are separated by "\t".
For example:
```
chr1	9945	10095	chr1	248946216	248946366	.	.	+	+
chr1	10034	10184	chr1	180987	181137	.	.	+	-
chr1	10286	10436	chr1	181103	181253	.	.	+	-
chr1	10286	10436	chr11	181103	181253	.	.	+	-
chr11	10286	10436	chr1	181103	181253	.	.	+	-
...
```
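
A minimal reader for this input format might look like the following. It is a sketch rather than the parser used by cLoops2: the function names and the returned dictionary layout are hypothetical, and it only picks the eight required columns, opening the file through gzip when the name ends in .gz.

```python
import gzip


def parse_bedpe_line(line):
    """Extract the required BEDPE columns (1-6, 9 and 10) from one line."""
    f = line.rstrip("\n").split("\t")
    return {
        "chrom1": f[0], "start1": int(f[1]), "end1": int(f[2]),
        "chrom2": f[3], "start2": int(f[4]), "end2": int(f[5]),
        "strand1": f[8], "strand2": f[9],
        "cis": f[0] == f[3],  # both ends on the same chromosome
    }


def read_bedpe(path):
    """Yield parsed PETs from a plain or gzip-compressed BEDPE file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.strip():
                yield parse_bedpe_line(line)


pet = parse_bedpe_line("chr1\t10286\t10436\tchr11\t181103\t181253\t.\t.\t+\t-\n")
print(pet["cis"])  # False, an inter-chromosomal PET
```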

------
<a name=".ixy"></a>
### Intermediate .ixy file
A numpy.array of (x,y) saved with [joblib.dump](https://joblib.readthedocs.io/en/latest/generated/joblib.dump.html) for fast access to the interaction PETs and to the contact matrix at any resolution. Nearly all cLoops2 analyses are based on this file type.
```
10099025	10099048
39943889	39943890
18391007	18391853
35502951	35502951
10061555	10061557
...
```
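
Storing raw (x,y) pairs is what makes any resolution available on demand: a contact matrix is just the pairs binned by integer division. The sketch below uses plain Python and a sparse counter instead of numpy to stay dependency-free. The function name `contact_matrix` is hypothetical, and it assumes each row holds the two genomic positions of one PET.

```python
from collections import Counter


def contact_matrix(xy_pairs, resolution):
    """Sparse upper-triangular contact matrix: {(bin_i, bin_j): PET count}."""
    counts = Counter()
    for x, y in xy_pairs:
        i, j = x // resolution, y // resolution
        counts[(min(i, j), max(i, j))] += 1
    return counts


# The five example rows shown above.
xy = [
    (10099025, 10099048),
    (39943889, 39943890),
    (18391007, 18391853),
    (35502951, 35502951),
    (10061555, 10061557),
]
coarse = contact_matrix(xy, 1_000_000)  # 1 Mb bins
fine = contact_matrix(xy, 10_000)       # 10 kb bins
print(coarse[(10, 10)])  # 2, two PETs share the same 1 Mb bin
print(len(fine))         # 5, every PET falls in its own 10 kb bin
```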

------
<a name="_peaks.txt"></a>
### Output \_peaks.txt file 
column | name | explanation
------ | ---- | ------------
0th | peakId | id for a peak, for example peak\_chr1-chr1-1
1st | chrom | chromosome of the peak 
2nd | start | genomic coordinate of the start site
3rd | end | genomic coordinate of the end site 
4th | summit | genomic coordinate of peak summit
5th | length | length of the peak
6th | counts | observed reads number in the peak 
7th | RPKM | RPKM for the reads density in the peak
8th | enrichmentScore | enrichment score for the peak, calculated by observed PETs number divided by the mean PETs number of nearby 10 fold and 20 fold regions
9th | poissonPvalue | Poisson test p-value for the peak after Bonferroni correction
10th | controlCounts| if control data such as input/IgG is assigned, the observed reads number in peak region for control data
11th | controlRPKM | if control data assigned, RPKM for the reads density in the peak region for control data
12th | controlScaledCount | if control data assigned, the scaled expected counts used for Poisson test/enrichment score against control data
13th | enrichmentScoreVsControl | if control data assigned, enrichment score of target vs. control
14th | poissonPvalueVsControl | if control data assigned, Poisson test p-value of target vs. control after Bonferroni correction
15th | significant | 1 or 0, 1 means we think the peak is significant compared to local background and control (if assigned)

------
<a name="_loops.txt"></a>
### Output \_loops.txt file 
column | name | explanation
------ | ---- | ------------
0th | loopId | id for a loop, for example loop\_chr1-chr1-1
1st | chromA | chromosome of the first anchor of the loop
2nd | startA | genomic coordinate of the start site for the first anchor
3rd | endA | genomic coordinate of the end site for the first anchor
4th | chromB | chromosome of the second anchor of the loop
5th | startB | genomic coordinate of the start site for the second anchor
6th | endB | genomic coordinate of the end site for the second anchor
7th | distance | distance (bp) between the centers of the anchors for the loop
8th | centerA | genomic coordinate of the center site for the first anchor
9th | centerB | genomic coordinate of the center site for the second anchor
10th | readsA | observed PETs number for the first anchor
11th | readsB | observed PETs number for the second anchor
12th | cis | whether the loop is an intra-chromosomal loop (cis)
13th | PETs | observed PETs number linking the two anchors
14th | density | similarly to that of RPKM (reads per kilobase per million):<img align="center" src="https://latex.codecogs.com/svg.latex?\Large&space;density=\frac{r}{N\times(anchorLengthA+anchorLengthB)}\times10^9" />
15th | enrichmentScore | enrichment score for the loop, calculated by observed PETs number divided by the mean PETs number of nearby permuted regions
16th | P2LL | peak to the lower left, calculated similar to that of Juicer
17th | FDR | false discovery rate for the loop, calculated as the number of permuted regions that have more observed PETs than the loop region  
18th | binomalPvalue | binomial test p-value for the loop, updated calculation, different from cLoops
19th | hypergeometricPvalue | hypergeometric test p-value for the loop
20th | poissonPvalue | Poisson test p-value for the loop
21st | xPeakpoissonPvalue | Poisson test p-value for the potential peak at the left anchor
22nd | yPeakpoissonPvalue | Poisson test p-value for the potential peak at the right anchor
23rd | significant | 1 or 0, 1 means we think the loop is significant compared to permuted regions. In cLoops2, only significant loops are written to the file. 
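
As a worked example of the density column, the sketch below evaluates the formula density = r / (N × (anchorLengthA + anchorLengthB)) × 10^9. It assumes r is the number of PETs linking the two anchors and N is the total number of PETs in the library, neither of which is spelled out in the table. The function name is hypothetical.

```python
def loop_density(pets_in_loop, total_pets, start_a, end_a, start_b, end_b):
    """RPKM-like density of a loop, normalized by library size and anchor lengths."""
    anchor_length = (end_a - start_a) + (end_b - start_b)
    return pets_in_loop / (total_pets * anchor_length) * 1e9


# 20 PETs linking two 1 kb anchors in a library of 1 million PETs:
# 20 / (1e6 * 2000) * 1e9, which is about 10.
d = loop_density(20, 1_000_000, 1000, 2000, 8000, 9000)
print(round(d, 6))
```

The 10^9 factor combines the per-kilobase (10^3) and per-million (10^6) scaling of RPKM.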

------
<a name="_dloops.txt"></a>
### Output \_dloops.txt file 
column | name | explanation
------ | ---- | ------------
0th | loopId | id for a loop, for example loop\_chr1-chr1-1
1st | chromA | chromosome of the first anchor of the loop
2nd | startA | genomic coordinate of the start site for the first anchor
3rd | endA | genomic coordinate of the end site for the first anchor
4th | chromB | chromosome of the second anchor of the loop
5th | startB | genomic coordinate of the start site for the second anchor
6th | endB | genomic coordinate of the end site for the second anchor
7th | distance | distance (bp) between the centers of the anchors for the loop
8th | centerA | genomic coordinate of the center site for the first anchor
9th | centerB | genomic coordinate of the center site for the second anchor
10th | rawTargetAnchorAReads | observed PETs number for the first anchor in target sample 
11th | rawTargetAnchorBReads | observed PETs number for the second anchor in target sample 
12th | rawControlAnchorAReads | observed PETs number for the first anchor in control sample 
13th | rawControlAnchorBReads | observed PETs number for the second anchor in control sample 
14th | scaledTargetAnchorAReads | scaled PETs number for the first anchor in target sample 
15th | scaledTargetAnchorBReads | scaled PETs number for the second anchor in target sample 
16th | rawTargetCounts | raw PETs number for the loop in target sample 
17th | scaledTargetCounts | scaled PETs number for the loop in target sample, fitting to control sample
18th | rawControlCounts | raw PETs number for the loop in control sample 
19th | rawTargetNearbyMedianCounts | raw median PETs number for the loop nearby permutation regions in target sample
20th | scaledTargetNearbyMedianCounts | scaled median PETs number for the loop nearby permutation regions in target sample, fitting to control sample
21st | rawControlNearbyMedianCounts | raw median PETs number for the loop nearby permutation regions in control sample 
22nd | rawTargetES | target sample rawTargetCounts/rawTargetNearbyMedianCounts 
23rd | rawControlES | control sample rawControlCounts/rawControlNearbyMedianCounts 
24th | targetDensity | raw interaction density in target sample, RPKM
25th | controlDensity | raw interaction density in control sample, RPKM
26th | rawFc | raw fold change of the interaction density, log2(target/control)
27th | scaledFc | scaled fold change of PETs, log2( scaledTargetCounts/rawControlCounts )
28th | poissonPvalue | Poisson p-value for the significance test after Bonferroni correction
29th | significant | 1 or 0, 1 means we think the loop is significantly differentially enriched

------
<a name="_domains.txt"></a>
### Output \_domains.txt file 
column | name | explanation
------ | ---- | ------------
0th | domainId | id for a domain, for example domain\_0
1st | chrom | chromosome of the domain
2nd | start | genomic coordinate of the start site for the domain
3rd | end | genomic coordinate of the end site for the domain 
4th | length | length of the domain
5th | binSize | bin size used for the matrix to call the domain  
6th | winSize | window size used for the matrix to call the domain  
7th | segregationScore | mean segregation score for all bins within the domain  
8th | totalPETs | number of total PETs in the domain
9th | withinDomainPETs | number of PETs only interacting within the domain
10th | enrichmentScore | (withinDomainPETs) / (totalPETs-withinDomainPETs)
11th | density | similarly to that of RPKM (reads per kilobase per million):<img align="center" src="https://latex.codecogs.com/svg.latex?\Large&space;density=\frac{withinDomainPETs}{(libraryTotalPETs)\times(domainLength)}\times10^9" />
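
The last two columns can be reproduced from the counts in the same row. The following is a small sketch of the two formulas as given in the table. The function name `domain_scores` is hypothetical, and the library total is an input that the table itself does not provide.

```python
def domain_scores(within_pets, total_pets, library_total_pets, start, end):
    """Return (enrichmentScore, density) for one domain."""
    one_end_only = total_pets - within_pets  # PETs with only one end in the domain
    es = within_pets / one_end_only if one_end_only else None
    density = within_pets / (library_total_pets * (end - start)) * 1e9
    return es, density


# A 100 kb domain with 400 PETs, 300 of them fully inside, in a library of 1 million PETs.
es, density = domain_scores(300, 400, 1_000_000, 1_000_000, 1_100_000)
print(es)                 # 3.0
print(round(density, 6))  # about 3
```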

------
<a name="_loopsGtfAno.txt"></a>
### Output \_loopsGtfAno.txt file 
column | name | explanation
------ | ---- | ------------
0th | loopId  | loopId from input file
1st | typeAnchorA  | annotated type of anchor a (left anchor), enhancer or promoter
2nd | typeAnchorB  | annotated type of anchor b (right anchor)
3rd | nearestDistanceToGeneAnchorA  | distance of anchor a to nearest TSS 
4th | nearestDistanceToGeneAnchorB  | distance of anchor b to nearest TSS 
5th | nearestTargetGeneAnchorA  | anchor a nearest TSS gene, for example chr21:34836286-34884882\|+\|AP000331.1 (named by rules of chrom:start-end\|strand\|geneName). If a promoter overlaps two head-to-head genes, all genes will be reported, separated by commas.
6th | nearestTargetGeneAnchorB  | anchor b nearest TSS gene

------
<a name="_mergedAnchors.txt"></a>
### Output \_mergedAnchors.txt file 
column | name | explanation
------ | ---- | ------------
0th | anchorId  | id for merged anchors. For example, chr21:14025126-14026192\|Promoter (named by the rule of: chrom:start-end\|type)
1st | chrom  | chromosome
2nd | start  | start
3rd | end  | end
4th | type  | annotated type for the anchor, enhancer or promoter
5th | nearestDistanceToTSS  | distance of the anchor to nearest TSS
6th | nearestGene  | nearest gene name. If a promoter overlaps two head-to-head genes, all genes will be reported, separated by commas.    
7th | nearestGeneLoc | nearest gene information. For example, chr21:34787801-35049344\|-\|RUNX1 (named by the rule of: chrom:start-end\|strand\|name). If a promoter overlaps two head-to-head genes, all genes will be reported, separated by commas.    

------
<a name="_loop2anchors.txt"></a>
### Output \_loop2anchors.txt file 
column | name | explanation
------ | ---- | ------------
0th | loopId  | loopId from input file
1st | mergedAnchorA  | original anchor a (left anchor) to new merged anchor id
2nd | mergedAnchorB  | original anchor b (right anchor) to new merged anchor id

------
<a name="_targets.txt"></a>
### Output \_targets.txt file 
column | name | explanation
------ | ---- | ------------
0th | promoter  | annotated anchors that overlap or are very close to a gene's transcription start site. For example, chr21:35043062-35051895\|Promoter (named by the rule of: chrom:start-end\|Promoter).
1st | PromoterTarget  | promoter target genes. If a promoter is shared by multiple genes, all genes will be reported, separated by commas. For example, chr21:34787801-35049344\|-\|RUNX1 (named by the rule of: chrom:start-end\|strand\|name).
2nd | directEnhancer  | enhancers that directly loop with the target promoter. Multiple enhancers will be reported, separated by commas. For example, chr21:35075636-35077527\|Enhancer,chr21:35026356-35028520\|Enhancer,chr21:34801302-34805056\|Enhancer.
3rd | indirectEnhancer  | enhancers that indirectly loop with the target promoter, by enhancer-enhancer-promoter or enhancer-promoter-promoter. Multiple enhancers will be reported, separated by commas.
4th | directPromoter  | other promoters directly looping with target promoter. 
5th | indirectPromoter | other promoters indirectly looping with target promoter, by promoter-enhancer-promoter or promoter-promoter-promoter. 
6th | directEnhancerHub | hub of direct enhancers. If there are more than 2 direct enhancers, the HITS algorithm is used to find and report the most linked one. 
7th | indirectEnhancerHub | hub of indirect enhancers. If there are more than 2 indirect enhancers, the HITS algorithm is used to find and report the most linked one. 
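
To give an intuition for how a hub enhancer can be picked, here is a pure-Python sketch of HITS-style power iteration on an undirected anchor network. It is not the cLoops2 implementation, and which HITS variant and score cLoops2 actually uses is not stated here. The function name `hits_hub` and the node names are hypothetical.

```python
def hits_hub(edges, candidates, iterations=50):
    """Return the candidate node with the highest HITS hub score.

    edges: iterable of (node_a, node_b) undirected links between anchors.
    candidates: the nodes to choose the hub from, e.g. the direct enhancers.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    hub = {n: 1.0 for n in neighbors}
    for _ in range(iterations):
        # Authority update, then hub update, then normalization of the hub scores.
        auth = {n: sum(hub[m] for m in neighbors[n]) for n in neighbors}
        hub = {n: sum(auth[m] for m in neighbors[n]) for n in neighbors}
        norm = sum(hub.values()) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return max(candidates, key=lambda n: hub.get(n, 0.0))


# Promoter P loops with three enhancers; E1 also loops with E2 and E3.
edges = [("P", "E1"), ("P", "E2"), ("P", "E3"), ("E1", "E2"), ("E1", "E3")]
print(hits_hub(edges, ["E1", "E2", "E3"]))  # E1
```

On an undirected network the hub and authority scores coincide, so the result is simply the best-connected candidate in the eigenvector sense.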


--------
--------
## cLoops2 citations

--------
--------
## cLoops2 updates
            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/YaqiangCao/cLoops2",
    "name": "cLoops2",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3",
    "maintainer_email": "",
    "keywords": "peak-calling loop-calling Hi-Trac interaction visualization",
    "author": "Yaqiang Cao",
    "author_email": "caoyaqiang0410@gmail.com",
    "download_url": "",
    "platform": null,
    "description": "## cLoops2: full stack analysis tool for chromatin interactions\n<p align=\"center\">\n<img align=\"center\" src=\"https://github.com/YaqiangCao/cLoops2/blob/master/pngs/FlowChart.png\">\n</p>   \n\n\n-------\n-------\n## Introduction\ncLoops2 is an extension of our previous work, [cLoops](https://github.com/YaqiangCao/cLoops). From loop-calling based on assumption-free clustering to a full suite of analysis tools for 3D genomic interaction data, cLoops2 has been adapted specifically for data such as Hi-TrAC/Trac-looping, for which interactions are enriched over the genome through experimental steps. cLoops2 still supports Hi-C -like data, of which the interaction signals are evenly distributed at enzyme cutting sites.  The changes from cLoops to cLoops2 are designed to address challenges around aiming for higher resolutions with the next-generation of genome architecture mapping technologies. \n\ncLoops2 is designed with respect reference to [bedtools](https://bedtools.readthedocs.io/en/latest/) and [Samtools](http://www.htslib.org/) for command-line style programming. If you have experience with them, you will find cLoops2 easy and efficient to use and combine commands, integrate as steps in your processing pipeline. \n\nPlease refer to our [Hi-TrAC method manuscript]() or [cLoops2 manuscript](https://www.biorxiv.org/content/10.1101/2021.07.20.453068v1) for what cLoops2 can do and show. \n\nIf you use cLoops2 in your research (the idea, the algorithm, the analysis scripts or the supplemental data), please give us a star on the GitHub repo page and cite our paper as follows:    \n\nPreprint bioRxiv: [Yaqiang Cao et al. \"cLoops2: a full-stack comprehensive analytical tool for chromatin interactions\"](https://www.biorxiv.org/content/10.1101/2021.07.20.453068v1)\n\n\n-------\n-------\n## Install\n#### 1. Easy way through pip for stable version\nPython3 is requried.  \n```\npip install cLoops2\n```\n\n-------\n#### 2. 
Install from source with test data for latest version\ncLoops2 is written purely in Python3 (cLoops was written in Python2). If you are familiar with [conda](https://docs.conda.io/en/latest/), cLoops2 can be installed easily with the following Linux shell commands (also tested on the Windows 10 Ubuntu subsystem and macOS). \n```\n# for the most up-to-date code, or download the release version \ngit clone --depth=1 https://github.com/YaqiangCao/cLoops2\ncd cLoops2\nconda env create --name cLoops2 --file cLoops2_env.yaml\nconda activate cLoops2 \npython3 setup.py install\n```\n\nNecessary Python3 third-party packages are listed below, all of which can be installed through conda. If you would like to install cLoops2 the old-school way with ***python setup.py install***, please install the third-party dependencies first. \n```\ntqdm\nnumpy \nscipy \npandas\nsklearn\nseaborn\npyBigWig\nmatplotlib\njoblib\nnetworkx\n```\n\nAfter installation, whenever you want to run cLoops2, just activate the environment with conda: **conda activate cLoops2**. \nHappy peak/loop-calling and have fun exploring all the other kinds of analyses.     \n\n\n------\n------\n## cLoops2 Main Functions\nRunning ***cLoops2*** or ***cLoops2 -h*** shows the main functions of cLoops2 with short descriptions and examples.     
\n```\nAn enhanced, accurate and flexible peak/domain/loop-calling and analysis tool \nfor 3D genomic interaction data.\n\nUse cLoops2 sub-command -h to see detail options and examples for sub-commands.\nAvailable sub-commands are: \n    qc: quality control of BEDPE files before analysis.\n    pre: preprocess input BEDPE files into cLoops2 data.\n    update: update cLoops2 data files locations.\n    combine: combine multiple cLooops2 data directories.\n    dump: convert cLoops2 data files to others (BEDPE, HIC, washU, bedGraph and\n          contact matrix)\n    estEps: estimate eps using Gaussian mixture models or k-distance plot.\n    estRes: estimate reasonable contact matrix resolution based on signal \n            enrichment.\n    estDis: estimate significant interactions distance range.\n    estSat: estimate sequencing saturation based on contact matrix.\n    estSim: estimate similarities among samples based on contact matrix.\n    filterPETs: filter PETs based on peaks, loops, singleton mode or knn mode. \n    samplePETs: sample PETs according to specific target size.\n    callPeaks: call peaks for ChIP-seq, ATAC-seq, ChIC-seq and CUT&Tag or the \n               3D genomic data such as Trac-looping, Hi-TrAC, HiChIP and more.\n    callLoops: call loops for 3D genomic data.\n    callDiffLoops: call differentially enriched loops for two datasets. \n    callDomains: call domains for 3D genomic data. \n    plot: plot the interaction matrix, genes, view point plot, 1D tracks, \n          peaks, loops and domains for a specific region. \n    montage: analysis of specific regions, producing Westworld Season 3 -like \n             Rehoboam plot. 
\n    agg: aggregated feature analysis and plots, features can be peaks, view \n         points, loops and domains.\n    quant: quantify peaks, loops and domains.\n    anaLoops: anotate loops for target genes.\n    findTargets: find target genes of genomic regions through networks from \n                 anaLoops.\n\nExamples:\n    cLoops2 qc -f trac_rep1.bedpe.gz,trac_rep2.bedpe,trac_rep3.bedpe.gz \\\n               -o trac_stat -p 3\n    cLoops2 pre -f ../test_GM12878_chr21_trac.bedpe -o trac\n    cLoops2 update -d ./trac\n    cLoops2 combine -ds ./trac1,./trac2,./trac3 -o trac_combined -keep 1\n    cLoops2 dump -d ./trac -o trac -hic\n    cLoops2 estEps -d trac -o trac_estEps_gmm -p 10 -method gmm\n    cLoops2 estRes -d trac -o trac_estRes -p 10 -bs 25000,5000,1000,200\n    cLoops2 estDis -d trac -o trac -plot -bs 1000 \n    cLoops2 estSim -ds Trac1,Trac2 -o trac_sim -p 10 -bs 2000 -m pcc -plot\n    cLoops2 filterPETs -d trac -peaks trac_peaks.bed -o trac_peaksFiltered -p 10\n    cLoops2 samplePETs -d trac -o trac_sampled -t 5000000 -p 10\n    cLoops2 callPeaks -d H3K4me3_ChIC -bgd IgG_ChIC -o H3K4me3_cLoops2 -eps 150 \\\n                      -minPts 10\n    cLoops2 callLoops -d Trac -eps 200,500,1000 -minPts 3 -filter -o Trac -w -j \\\n                      -cut 2000\n    cLoops2 callLoops -d HiC -eps 1000,5000,10000 -minPts 10,20,50,100 -w -j \\\n                      -trans -o HiC_trans \n    cLoops2 callDiffLoops -tloop target_loop.txt -cloop control_loop.txt \\\n                          -td ./target -cd ./control -o target_diff\n    cLoops2 callDomains -d trac -o trac -bs 10000 -ws 200000\n    cLoops2 plot -f test/chr21-chr21.ixy -o test -bs 500 -start 34840000 \\\n                 -end 34895000 -triu -1D -loop test_loops.txt -log \\\n                 -gtf hg38.gtf -bws ctcf.bw -beds enhancer.bed\n    cLoops2 montage -f test/chr21-chr21.ixy -o test -bed test.bed\n    cLoops2 agg -d trac -loops trac.loop -peaks trac_peaks.bed \\\n                -domains 
hic_domains.bed -bws CTCF.bw,ATAC.bw -p 20 -o trac \n    cLoops2 quant -d trac -peaks trac_peaks.bed -loops trac.loop \\\n                  -domains trac_domain.txt -p 20 -o trac\n    cLoops2 anaLoops -loops test_loop.txt -gtf gene.gtf -net -o test\n    cLoops2 findTargets -net test_ep_net.sif -tg test_targets.txt \\\n                        -bed GWAS.bed -o test \n    More usages and examples are shown when run with cLoops2 sub-command -h.\n    \n\noptional arguments:\n  -h, --help  show this help message and exit\n  -d PREDIR   Assign data directory generated by cLoops2 pre to carry out analysis. \n  -o FNOUT    Output data directory / file name prefix, default is cLoops2_output.\n  -p CPU      CPUs used to run the job, default is 1, set -1 to use all CPUs\n              available. Too many CPU could cause out-of-memory problem if there are\n              too many PETs.\n  -cut CUT    Distance cutoff to filter cis PETs, only keep PETs with distance\n              >=cut. Default is 0, no filtering.\n  -mcut MCUT  Keep the PETs with distance <=mcut. Default is -1, no filtering.\n  -v          Show cLoops2 verison number and exit.\n  ---         Following are sub-commands specific options. This option just show\n              version of cLoops2.\n\nBug reports are welcome and can be put as issue at github repo or sent to \ncaoyaqiang0410@gmail.com or yaqiang.cao@nih.gov. Thank you.\n```\n\n------\n### 1. Quality control for BEDPE files\nRun **cLoops2 qc -h** to see details. \n```\nGet the basic quality control statistical information from interaction BEDPE\nfiles.\n\nExample: \n    cLoops2 qc -f trac_rep1.bedpe.gz,trac_rep2.bedpe,trac_rep3.bedpe.gz -p 3 \\\n               -o trac_stat\n    \n\noptional arguments:\n  -h, --help  show this help message and exit\n  -d PREDIR   Assign data directory generated by cLoops2 pre to carry out analysis. 
\n  -o FNOUT    Output data directory / file name prefix, default is cLoops2_output.\n  -p CPU      CPUs used to run the job, default is 1, set -1 to use all CPUs\n              available. Too many CPU could cause out-of-memory problem if there are\n              too many PETs.\n  -cut CUT    Distance cutoff to filter cis PETs, only keep PETs with distance\n              >=cut. Default is 0, no filtering.\n  -mcut MCUT  Keep the PETs with distance <=mcut. Default is -1, no filtering.\n  -v          Show cLoops2 version number and exit.\n  ---         Following are sub-commands specific options. This option just show\n              version of cLoops2.\n  -f FNIN     Input BEDPE file(s), .bedpe and .bedpe.gz are both suitable. Multiple\n              samples can be assigned as -f A.bedpe.gz,B.bedpe.gz,C.bedpe.gz.\n```\n\n------\n### 2. Pre-process BEDPE into cLoops2 data\nRun **cLoops2 pre -h** to see details. \n```\nPreprocess mapped PETs into cLoops2 data files.\n\nSupported input file formats:\nBEDPE: https://bedtools.readthedocs.io/en/latest/content/general-usage.html \nPAIRS: https://pairtools.readthedocs.io/en/latest/formats.html#pairs\n\nThe output directory contains one .json file for the basic statistics of PETs \ninformation and .ixy files which are coordinates for every PET. The coordinate\nfiles will be used to call peaks, loops or any other analyses implemented in \ncLoops2. For data backup/sharing purposes, the directory can be saved as a \n.tar.gz file through tar. If the directory is changed or moved, run \n***cLoops2 update -d*** to update.\n\nExamples:\n    1. keep high quality PETs of chromosome chr21\n        cLoops2 pre -f trac_rep1.bedpe.gz,trac_rep2.bedpe.gz -o trac -c chr21\n\n    2. 
keep all cis PETs that have distance > 1kb\n        cLoops2 pre -f trac_rep1.bedpe.gz,trac_rep2.bedpe.gz -o trac -mapq 0\n\n    \n\noptional arguments:\n  -h, --help            show this help message and exit\n  -d PREDIR             Assign data directory generated by cLoops2 pre to carry out analysis. \n  -o FNOUT              Output data directory / file name prefix, default is cLoops2_output.\n  -p CPU                CPUs used to run the job, default is 1, set -1 to use all CPUs\n                        available. Too many CPU could cause out-of-memory problem if there are\n                        too many PETs.\n  -cut CUT              Distance cutoff to filter cis PETs, only keep PETs with distance\n                        >=cut. Default is 0, no filtering.\n  -mcut MCUT            Keep the PETs with distance <=mcut. Default is -1, no filtering.\n  -v                    Show cLoops2 verison number and exit.\n  ---                   Following are sub-commands specific options. This option just show\n                        version of cLoops2.\n  -f FNIN               Input BEDPE or PAIR file(s), .bedpe and .bedpe.gz are both suitable.\n                        Replicates or multiple samples can be assigned as -f A.bedpe.gz,\n                        B.bedpe.gz,C.bedpe.gz to get merged PETs.\n  -c CHROMS             Argument to process limited set of chromosomes, specify it as chr1,\n                        chr2,chr3. Use this option to filter reads from such as\n                        chr22_KI270876v1. The default setting is to use the entire set of\n                        chromosomes from the data.\n  -trans                Whether to parse trans- (inter-chromosomal) PETs. The default is to\n                        ignore trans-PETs. Set this flag to pre-process all PETs.\n  -mapq MAPQ            MAPQ cutoff to filter raw PETs, default is >=10. 
This option is not\n                        valid when input is .pairs file.\n  -format {bedpe,pairs}\n                        cLoops2 currently supports BEDPE and PAIRs file format. Default is bedpe.\n```\n\n------\n### 3. Update cLoops2 data directory\nRun **cLoops2 update -h** to see details. \n```\nUpdate cLoops2 data files generated by **cLoops2 pre**.\n\nIn the **cLoops2 pre** output directory, there is a .json file annotated with \nthe .ixy **absolute paths** and other information. So if the directory is \nmoved, or some .ixy files are removed or changed, this command is needed to \nupdate the paths, otherwise the other analysis modules will not work.\n\nExample:\n    cLoops2 update -d ./Trac\n    \n\noptional arguments:\n  -h, --help  show this help message and exit\n  -d PREDIR   Assign data directory generated by cLoops2 pre to carry out analysis. \n  -o FNOUT    Output data directory / file name prefix, default is cLoops2_output.\n  -p CPU      CPUs used to run the job, default is 1, set -1 to use all CPUs\n              available. Too many CPU could cause out-of-memory problem if there are\n              too many PETs.\n  -cut CUT    Distance cutoff to filter cis PETs, only keep PETs with distance\n              >=cut. Default is 0, no filtering.\n  -mcut MCUT  Keep the PETs with distance <=mcut. Default is -1, no filtering.\n  -v          Show cLoops2 verison number and exit.\n  ---         Following are sub-commands specific options. This option just show\n              version of cLoops2.\n```\n\n------\n### 4. Convert cLoops2 data to others    \nRun **cLoops2 dump -h** to see details.   \n```\nConvert cLoops2 data files to other types. Currently supports BED file,BEDPE \nfile, HIC file, washU long-range track, bedGraph file and matrix txt file. \n\nConverting cLoops2 data to .hic file needs \"juicer_tools pre\" in the command\nline enviroment. \nConverting cLoops2 data to legacy washU browser long-range track needs bgzip\nand tabix. 
Format reference: http://wiki.wubrowse.org/Long-range. \nConverting cLoops2 data to UCSC bigInteract track needs bedToBigBed. Format \nreference: https://genome.ucsc.edu/goldenPath/help/interact.html.\nConverting cLoops2 data to bedGraph track will normalize value as RPM \n(reads per million). Run with -bdg_pe flag for 1D data such as ChIC-seq,\nChIP-seq and ATAC-seq. \nConverting cLoops2 data to matrix txt file will need specific resolution. \nThe output txt file can be loaded in TreeView for visualization or further\nanalysis. \n\nExamples:\n    1. convert cLoops2 data to single-end .bed file for usage of BEDtools or \n       MACS2 for peak-calling with close PETs\n        cLoops2 dump -d trac -o trac -bed -mcut 1000\n\n    2. convert cLoops2 data to .bedpe file for usage of BEDtools, only keep \n       PETs distance >1kb and < 1Mb\n        cLoops2 dump -d trac -o trac -bedpe -bedpe_ext -cut 1000 -mcut 1000000 \n\n    3. convert cLoops2 data to .hic file to load in juicebox\n        cLoops2 dump -d trac -o trac -hic -hic_org hg38 \\\n                    -hic_res 200000,20000,5000\n    \n    4. convert cLoops2 data to washU long-range track file, only keep PETs \n       distance > 1kb \n        cLoops2 dump -d trac -o trac -washU -washU_ext 50 -cut 1000\n    \n    5. convert cLoops2 data to UCSC bigInteract track file \n        cLoops2 dump -d trac -o trac -ucsc -ucsc_cs ./hg38.chrom.sizes \n\n    6. convert interacting cLoops2 data to bedGraph file with all PETs\n        cLoops2 dump -d trac -o trac -bdg -bdg_ext 100\n\n    7. convert 1D cLoops2 data (such as ChIC-seq/ChIP-seq/ATAC-seq) to bedGraph \n       file \n        cLoops2 dump -d trac -o trac -bdg -bdg_pe \n\n    8. convert 3D cLoops2 data (such as Trac-looping) to bedGraph file for peaks\n        cLoops2 dump -d trac -o trac -bdg -mcut 1000\n\n    9. 
convert one region in chr21 to contact matrix correlation matrix txt file \n        cLoops2 dump -d test -mat -o test -mat_res 10000 \\\n                    -mat_chrom chr21-chr21 -mat_start 36000000 \\\n                    -mat_end 40000000 -log -corr\n    \n\noptional arguments:\n  -h, --help            show this help message and exit\n  -d PREDIR             Assign data directory generated by cLoops2 pre to carry out analysis. \n  -o FNOUT              Output data directory / file name prefix, default is cLoops2_output.\n  -p CPU                CPUs used to run the job, default is 1, set -1 to use all CPUs\n                        available. Too many CPU could cause out-of-memory problem if there are\n                        too many PETs.\n  -cut CUT              Distance cutoff to filter cis PETs, only keep PETs with distance\n                        >=cut. Default is 0, no filtering.\n  -mcut MCUT            Keep the PETs with distance <=mcut. Default is -1, no filtering.\n  -v                    Show cLoops2 verison number and exit.\n  ---                   Following are sub-commands specific options. This option just show\n                        version of cLoops2.\n  -bed                  Convert data to single-end BED file.\n  -bed_ext BED_EXT      Extension from the center of the read to both ends for BED file.\n                        Default is 50.\n  -bedpe                Convert data to BEDPE file.\n  -bedpe_ext BEDPE_EXT  Extension from the center of the PET to both ends for BEDPE file.\n                        Default is 50.\n  -hic                  Convert data to .hic file.\n  -hic_org HIC_ORG      Organism required to generate .hic file,default is hg38. If the\n                        organism is not available, assign a chrom.size file.\n  -hic_res HIC_RES      Resolutions used to generate .hic file. 
Default is 1000,5000,25000,\n                        50000,100000,200000.\n  -washU                Convert data to legacy washU browser long-range track.\n  -washU_ext WASHU_EXT  Extension from the center of the PET to both ends for washU track.\n                        Default is 50.\n  -ucsc                 Convert data to UCSC bigInteract file track.\n  -ucsc_ext UCSC_EXT    Extension from the center of the PET to both ends for ucsc\n                        track. Default is 50.\n  -ucsc_cs UCSC_CS      A chrom.sizes file. Can be obtained through fetchChromSizese.\n                        Required for -ucsc option.\n  -bdg                  Convert data to 1D bedGraph track file.\n  -bdg_ext BDG_EXT      Extension from the center of the PET to both ends for\n                        bedGraph track. Default is 50.\n  -bdg_pe               When converting to bedGraph, argument determines whether to treat PETs\n                        as ChIP-seq, ChIC-seq or ATAC-seq paired-end libraries. Default is not.\n                        PETs are treated as single-end library for interacting data.\n  -mat                  Convert data to matrix txt file with required resolution.\n  -mat_res MAT_RES      Bin size/matrix resolution (bp) to generate the contact matrix. \n                        Default is 5000 bp. \n  -mat_chrom CHROM      The chrom-chrom set will be processed. Specify it as chr1-chr1.\n  -mat_start START      Start genomic coordinate for the target region. Default will be the\n                        smallest coordinate from specified chrom-chrom set.\n  -mat_end END          End genomic coordinate for the target region. Default will be the\n                        largest coordinate from specified chrom-chrom set.\n  -log                  Whether to log transform the matrix. Default is not.\n  -m {obs,obs/exp}      The type of matrix, observed matrix or observed/expected matrix, \n                        expected matrix will be generated by shuffling PETs. 
Default is\n                        observed.\n  -corr                 Whether to get the correlation matrix. Default is not. \n  -norm                 Whether to normalize the matrix with z-score. Default is not.\n\n```\n\n\n------\n### 5. Estimate eps\nRun **cLoops2 estEps -h** to see details. \n```\nEstimate key parameter eps. \n\nTwo methods are implemented: 1) unsupervised Gaussian mixture model (gmm), and \n2) k-distance plot (k-dis,-k needed). Gmm is based on the assumption that PETs \ncan be classified into self-ligation (peaks) and inter-ligation (loops). K-dis\nis based on the k-nearest neighbors distance distribution to find the \"knee\", \nwhich is where the distance (eps) between neighbors has a sharp increase along\nthe k-distance curve. K-dis is the traditional approach in the literature, but\nit is much more time consuming than gmm, and may only suit small cases. If both\nmethods do not give nice plots, please turn to the empirical parameters you \nlike, such as 100,200 for ChIP-seq -like data, 5000,1000 for Hi-C, etc.\n\nExamples: \n    1. estimate eps with Gaussian mixture model    \n        cLoops2 estEps -d trac -o trac_estEps_gmm -p 10 -method gmm\n\n    2. estimate eps with k-nearest neighbors distance distribution\n        cLoops2 estEps -d trac -o trac_estEps_kdis -p 10 -method k-dis -k 5\n    \n\noptional arguments:\n  -h, --help           show this help message and exit\n  -d PREDIR            Assign data directory generated by cLoops2 pre to carry out analysis. \n  -o FNOUT             Output data directory / file name prefix, default is cLoops2_output.\n  -p CPU               CPUs used to run the job, default is 1, set -1 to use all CPUs\n                       available. Too many CPU could cause out-of-memory problem if there are\n                       too many PETs.\n  -cut CUT             Distance cutoff to filter cis PETs, only keep PETs with distance\n                       >=cut. 
Default is 0, no filtering.\n  -mcut MCUT           Keep the PETs with distance <=mcut. Default is -1, no filtering.\n  -v                   Show cLoops2 version number and exit.\n  ---                  Following are sub-commands specific options. This option just show\n                       version of cLoops2.\n  -fixy FIXY           Assign the .ixy file to estimate eps instead of the whole directory\n                       generated by cLoops2 pre. For very large data, especially Hi-C, this\n                       option is recommended for chr1 (or the smaller one) to save time.\n  -k KNN               The k-nearest neighbors used to draw the k-distance plot. Default is 0\n                       (not running), set this when -method k-dis. Suggested 5 for\n                       ChIA-PET/Trac-looping data, 20 or 30 for Hi-C like data.\n  -method {gmm,k-dis}  Two methods can be chosen to estimate eps. Default is Gmm. See above\n                       for the differences between the methods.\n\n```\n\n------\n### 6. Estimate reasonable contact matrix resolution \nRun **cLoops2 estRes -h** to see details. \n```\nEstimate reasonable genome-wide contact matrix resolution based on signal \nenrichment. \n\nPETs will be assigned to contact matrix bins according to input resolution. A \nbin is marked as [nx,ny], and a PET is assigned to a bin by nx = int((x-s)/bs),\nny = int((y-s)/bs), where s is the minimal coordinate for all PETs and bs is \nthe bin size. Self-interaction bins (nx=ny) will be ignored. The bins only \ncontaining singleton PETs are assumed as noise. \n\nThe output is a PDF plot. For each resolution, a line is separated into two \nparts: 1) a dashed line indicating the linear increase of singleton PETs/bins;\n2) a thicker solid line indicating the non-linear increase of higher potential\nsignal PETs/bins. The higher the ratio of signal PETs/bins, the easier it is to\nfind loops at that resolution. 
The closer to the random line, the higher the \npossibility to observe evenly distributed signals.  \n\nWe expect the highest resolution with >=50% PETs are not singletons.\n\nExample:\n    cLoops2 estRes -d trac -o trac -bs 10000,5000,1000 -p 20\n\noptional arguments:\n  -h, --help   show this help message and exit\n  -d PREDIR    Assign data directory generated by cLoops2 pre to carry out analysis. \n  -o FNOUT     Output data directory / file name prefix, default is cLoops2_output.\n  -p CPU       CPUs used to run the job, default is 1, set -1 to use all CPUs\n               available. Too many CPU could cause out-of-memory problem if there are\n               too many PETs.\n  -cut CUT     Distance cutoff to filter cis PETs, only keep PETs with distance\n               >=cut. Default is 0, no filtering.\n  -mcut MCUT   Keep the PETs with distance <=mcut. Default is -1, no filtering.\n  -v           Show cLoops2 verison number and exit.\n  ---          Following are sub-commands specific options. This option just show\n               version of cLoops2.\n  -bs BINSIZE  Candidate contact matrix resolution (bin size) to estimate signal\n               enrichment. A series of comma-separated values or a single value can\n               be used as input. For example,-bs 1000,5000,10000. Default is 5000.\n\n```\n\n------\n### 7. Estimate significant interaction distance range\nRun **cLoops2 estDis -h** to see details. \n```\nEstimate the significant interaction distance limitation by getting the observed\nand expected random background of the genomic distance vs interaction frequency.\n\nExample:\n    cLoops2 estDis -d trac -o trac -bs 5000 -p 20 -plot\n    \n\noptional arguments:\n  -h, --help   show this help message and exit\n  -d PREDIR    Assign data directory generated by cLoops2 pre to carry out analysis. 
\n  -o FNOUT     Output data directory / file name prefix, default is cLoops2_output.\n  -p CPU       CPUs used to run the job, default is 1, set -1 to use all CPUs\n               available. Too many CPU could cause out-of-memory problem if there are\n               too many PETs.\n  -cut CUT     Distance cutoff to filter cis PETs, only keep PETs with distance\n               >=cut. Default is 0, no filtering.\n  -mcut MCUT   Keep the PETs with distance <=mcut. Default is -1, no filtering.\n  -v           Show cLoops2 version number and exit.\n  ---          Following are sub-commands specific options. This option just show\n               version of cLoops2.\n  -c CHROMS    Whether to process limited chroms, specify it as chr1,chr2,chr3, \n               default is not. Use this to save time for quite big data.\n  -bs BINSIZE  Bin size / contact matrix resolution (bp) to generate the contact\n               matrix for estimation, default is 5000 bp.\n  -r REPEATS   The repeat times to shuffle PETs to get the mean expected background,\n               default is 10.\n  -plot        Set to plot the result.\n```\n\n------\n### 8. Filter PETs    \nRun **cLoops2 filterPETs -h** to see details. \n```\nFilter PETs according to peaks/domains/loops/singletons/KNNs. \n\nIf either end of a PET overlaps with features such as peaks or loops, the PET \nwill be kept. Filtering can be done before or after peak/loop-calling. Input \ncan be peaks or loops, but should not be mixed. The -singleton mode is based\non a specified contact matrix resolution: if there is only one PET in a bin, \nthat singleton PET will be filtered out. The -knn mode is based on the \nnoise-removing step of blockDBSCAN. \n\nExamples:\n    1. keep PETs overlapping with peaks\n        cLoops2 filterPETs -d trac -peaks peaks.bed -o trac_filtered\n\n    2. keep PETs that do not overlap with any blacklist regions\n        cLoops2 filterPETs -d trac -peaks bg.bed -o trac_filtered -iv\n\n    3. 
keep PETs that overlap with loop anchors\n        cLoops2 filterPETs -d trac -loops test_loops.txt -o trac_filtered\n\n    4. keep PETs that both ends overlap with loop anchors\n        cLoops2 filterPETs -d trac -loops test_loops.txt -o trac_filtered -both\n\n    5. keep non-singleton PETs based on 1kb contact matrix\n        cLoops2 filterPETs -d trac -o trac_filtered -singleton -bs 1000\n\n    6. filter PETs based on blockDBSCAN knn noise removing\n        cLoops2 filterPETs -d trac -o trac_filtered -knn -eps 1000 -minPts 5\n\noptional arguments:\n  -h, --help      show this help message and exit\n  -d PREDIR       Assign data directory generated by cLoops2 pre to carry out analysis. \n  -o FNOUT        Output data directory / file name prefix, default is cLoops2_output.\n  -p CPU          CPUs used to run the job, default is 1, set -1 to use all CPUs\n                  available. Too many CPU could cause out-of-memory problem if there are\n                  too many PETs.\n  -cut CUT        Distance cutoff to filter cis PETs, only keep PETs with distance\n                  >=cut. Default is 0, no filtering.\n  -mcut MCUT      Keep the PETs with distance <=mcut. Default is -1, no filtering.\n  -v              Show cLoops2 verison number and exit.\n  ---             Following are sub-commands specific options. This option just show\n                  version of cLoops2.\n  -peaks FBED     BED file of genomic features (such as promoters, enhancers, ChIP-seq,\n                  ATAC-seq peaks,TADs) to filter PETs.\n  -loops FLOOP    The loop.txt file generated by cLoops2, can be loops or domains, to\n                  filter PETs.\n  -gap GAP        If the distance between two genomic features is <=gap, the two regions\n                  will be combined. Default is 1. Set to >=1.\n  -singleton      Whether to use singleton mode to filter PETs. Contact matrix\n                  resolution with -bs is required. 
Singleton PETs in contact matrix bins\n                  will be filtered.\n  -bs BINSIZE     The contact matrix bin size for -singleton mode filtering. Default is\n                  5000.\n  -knn            Whether to use noise removing method in blockDBSCAN to filter PETs,\n                  -eps and -minPts are required.\n  -eps EPS        Same to callPeaks and callLoops, only used to filter PETs for -knn\n                  mode. Default is 1000. Only one value is supported.\n  -minPts MINPTS  Same to callPeaks and callLoops, only used to filter PETs for -knn\n                  mode. Default is 5. Only one value is supported.\n  -iv             Whether to only keep PETs not in the assigned regions, behaves like\n                  grep -v.\n  -both           Whether to only keep PETs that both ends overlap with loop anchors.\n                  Default is not.\n```\n\n------\n### 9. Sampling PETs     \nRun **cLoops2 samplePETs -h** to see details.\n```\nSampling PETs to target total size. \n\nIf there are multiple sample libraries and the total sequencing depths vary a \nlot, and you want to compare the data fairly, it's better to sample them to \nsimilar total PETs (either down-sampling or up-sampling), then call peaks/loops\nwith the same parameters. \n\nExample:\n    cLoops2 samplePETs -d trac -o trac_sampled -tot 5000000 -p 10\n    \n\noptional arguments:\n  -h, --help  show this help message and exit\n  -d PREDIR   Assign data directory generated by cLoops2 pre to carry out analysis. \n  -o FNOUT    Output data directory / file name prefix, default is cLoops2_output.\n  -p CPU      CPUs used to run the job, default is 1, set -1 to use all CPUs\n              available. Too many CPU could cause out-of-memory problem if there are\n              too many PETs.\n  -cut CUT    Distance cutoff to filter cis PETs, only keep PETs with distance\n              >=cut. Default is 0, no filtering.\n  -mcut MCUT  Keep the PETs with distance <=mcut. 
Default is -1, no filtering.\n  -v          Show cLoops2 version number and exit.\n  ---         Following are sub-commands specific options. This option just show\n              version of cLoops2.\n  -tot TOT    Target total number of PETs.\n```\n\n------\n### 10. Call peaks for 1D or 3D data\nRun **cLoops2 callPeaks -h** to see details.\n```\nCall peaks based on clustering. \n\nWell tested for ChIP-seq, ChIC-seq, ATAC-seq, CUT&RUN -like or the 3D\ngenomic data such as Hi-TrAC/Trac-looping, ChIA-PET and HiChIP.\n\nThere are three steps in the algorithm: 1) cluster the PETs to find \nself-ligation clusters, which are candidate peaks; 2) estimate the significance\nof candidate peaks with local background; 3) if given control data, further \ncompare candidate peaks to control data. If running multiple clusterings with\nseparate parameters, the clusters will be combined and callPeaks will output \nthe most significant one based on overlaps. \n\nKey parameters are -eps and -minPts, both are key parameters in the clustering\nalgorithm blockDBSCAN. Eps indicates the distance that defines two points (PETs) \nbeing neighbors, while minPts indicates the minimal number of points required \nfor a cluster to form.  For sharp-peak like data (ATAC-seq, TF ChIC-seq), set\n-eps small such as 100 or 150. For broad-peak like data, such as H3K27me3 \nChIP-seq and ChIC-seq, set -eps large, such as 500 or 1000. \n\nEps affects sensitivity more than minPts does.\n\nExamples:\n    1. call peaks for Trac-looping  \n        cLoops2 callPeaks -d trac -eps 100 -minPts 10 -o trac -p 10\n\n    2. call peaks for sharp-peak like ChIC-seq without control data\n        cLoops2 callPeaks -d ctcf_chic -o ctcf_chic -p 10\n\n    3. call peaks for broad-peak like ChIC-seq with IgG as control\n        cLoops2 callPeaks -d H3K27me3 -bgd IgG -eps 500,1000 -minPts 10 \\\n                          -o H3K27me3 \n\n    4. 
call peaks for sharp-peak ChIC-seq with linear fitting scaled control \n       data\n        cLoops2 callPeaks -d ctcf -bgd IgG -eps 150 -minPts 10 -o ctcf -p 10 \\\n                          -bgm lf\n\n    5. call peaks with sensitive mode to get comprehensive peaks for CUT&Tag\n        cLoops2 callPeaks -d H3K27ac -bgd IgG -sen -p 10\n\n    6. filter PETs first and then call peaks for H3K27ac HiChIP, resulting in\n       much more accurate peaks\n        cLoops2 filterPETs -d h3k27ac_hichip -o h3k27ac_hichip_filtered -knn \\\n                           -eps 500 -minPts 5\n        cLoops2 callPeaks -d h3k27ac_hichip_filtered -eps 200,500 -minPts 10 \\\n                          -p 10\n\n    7. call peaks for interaction data as single-end data \n        cLoops2 callPeaks -d h3k27ac -o h3k27ac -split -eps 200,500 -minPts 10 \\\n                          -p 10\n\n    8. call differential peaks between WT and KO conditions\n        cLoops2 callPeaks -d MLL4_WT -bgd MLL4_KO -o MLL4_WTvsKO -p 10\n        cLoops2 callPeaks -d MLL4_KO -bgd MLL4_WT -o MLL4_KOvsWT -p 10\n    \n\noptional arguments:\n  -h, --help          show this help message and exit\n  -d PREDIR           Assign data directory generated by cLoops2 pre to carry out analysis. \n  -o FNOUT            Output data directory / file name prefix, default is cLoops2_output.\n  -p CPU              CPUs used to run the job, default is 1, set -1 to use all CPUs\n                      available. Too many CPU could cause out-of-memory problem if there are\n                      too many PETs.\n  -cut CUT            Distance cutoff to filter cis PETs, only keep PETs with distance\n                      >=cut. Default is 0, no filtering.\n  -mcut MCUT          Keep the PETs with distance <=mcut. Default is -1, no filtering.\n  -v                  Show cLoops2 version number and exit.\n  ---                 Following are sub-commands specific options. 
This option just shows\n                      version of cLoops2.\n  -eps EPS            Distance that defines two points (PETs) being neighbors, eps in\n                      blockDBSCAN as key parameter, multiple eps can be assigned such as\n                      100,200,300 to run multiple clusterings, the results will be combined.\n                      For callPeaks, the default is 100,200. If the data show much broader\n                      features such as H3K27me3 and H3K4me1, increase it to 500,1000 or larger.\n                      If expecting both narrow and broad peaks in the data, set -eps 100,200,\n                      500,1000.\n  -minPts MINPTS      Points required in a cluster, minPts in blockDBSCAN, key parameter,\n                      multiple minPts can be assigned such as 3,5 to run multiple\n                      clusterings, the results will be combined. For callPeaks, the default\n                      is 5. If the data have many reads, increase minPts, such as 10,20.\n  -pcut PCUT          Bonferroni corrected Poisson p-value cutoff to determine significant\n                      peaks. Default is 1e-2.\n  -bgd BGD            Assign control data (IgG, Input) directory generated by cLoops2 pre to\n                      carry out analysis. Default is no background.\n  -bgm {ratio,lf}     How to scale the target data with control data. Available options are\n                      'ratio' and 'lf'. 'ratio' is based on library size and 'lf' means\n                      linear fitting for control and target candidate peaks nearby regions.\n                      Default is 'lf'. The scaling factor estimated by lf usually is a little\n                      larger than ratio. In other words, the higher the scaling factor, the\n                      less sensitive the results.\n  -pseudo PSEUDO      Pseudo counts for local background or control data to estimate the\n                      significance of peaks if no PETs/reads in the background. 
Default is\n                      1. Set it larger for noisy data, 0 is recommended for very clean data\n                      such as well prepared CUT&Tag.\n  -sen                Whether to use sensitive mode to call peaks. Default is not. If only a\n                      few peaks were called, while a lot more can be observed\n                      from visualization, try this option. Adjust -pcut or filter by\n                      yourself to select significant ones.\n  -split              Whether to split paired-end as single end data to call peaks. Sometimes\n                      works well for Trac-looping and HiChIP.\n  -splitExt SPLITEXT  When run with -split, the extension to upstream and downstream, \n                      default is 50.\n```\n\n\n------\n### 11. Call loops\nRun **cLoops2 callLoops -h** to see details.\n```\nCall loops based on clustering. \n\nWell tested for Hi-TrAC/Trac-looping, HiChIP, ChIA-PET and Hi-C.\n\nSimilar to calling peaks, there are three main steps in the algorithm: 1) cluster \nthe PETs to find inter-ligation clusters, which are candidate loops; 2) \nestimate the significance of candidate loops with permutated local background; \n3) if the -hic option is not selected, the loop anchors will be checked for peak-like \nfeatures, and only peak-like anchors are kept. If running multiple clusterings, \nthe clusters will be combined and callLoops will output the most significant \none based on overlaps. \n\nSimilar to callPeaks, the key parameters are -eps and -minPts. For sharp-peak like \ninteraction data, set -eps small such as 500,1000. For broad-peak like data, \nsuch as H3K27ac HiChIP, set -eps large, such as 1000,2000. For Hi-C and HiChIP data, \nbigger -minPts is also needed, such as 20,50. \n\nPlease note that the blockDBSCAN implementation in cLoops2 is much more \nsensitive than cDBSCAN in cLoops, so the same parameters can generate quite \ndifferent results. With -hic option, cDBSCAN will be used. \n\nExamples:\n    1. 
call loops for Hi-TrAC/Trac-looping\n        cLoops2 callLoops -d trac -o trac -eps 200,500,1000,2000 -minPts 5 -w -j\n\n    2. call loops for Hi-TrAC/Trac-looping with filtering short distance PETs \n       and using maximal estimated distance cutoff\n        cLoops2 callLoops -d trac -o trac -eps 200,500,1000,2000 -minPts 5 \\\n                          -cut 1000 -max_cut -w -j\n\n    3. call loops for Hi-TrAC/Trac-looping and get the PETs with any end \n       overlapping loop anchors\n        cLoops2 callLoops -d trac -o trac -eps 200,500,1000,2000 -minPts 5 -w \\\n                          -j -filter\n\n    4. call loops for high-resolution Hi-C like data \n        cLoops2 callLoops -d hic -o hic -eps 2000,5000,10000 -minPts 20,50 -w -j\n    \n    5. call inter-chromosomal loops (for most data, there will be no significant \n       inter-chromosomal loops)\n        cLoops2 callLoops -d HiC -eps 5000 -minPts 10,20,50,100,200 -w -j -trans \\\n                          -o HiC_trans\n    \n\noptional arguments:\n  -h, --help      show this help message and exit\n  -d PREDIR       Assign data directory generated by cLoops2 pre to carry out analysis. \n  -o FNOUT        Output data directory / file name prefix, default is cLoops2_output.\n  -p CPU          CPUs used to run the job, default is 1, set -1 to use all CPUs\n                  available. Too many CPUs could cause out-of-memory problems if there are\n                  too many PETs.\n  -cut CUT        Distance cutoff to filter cis PETs, only keep PETs with distance\n                  >=cut. Default is 0, no filtering.\n  -mcut MCUT      Keep the PETs with distance <=mcut. Default is -1, no filtering.\n  -v              Show cLoops2 version number and exit.\n  ---             Following are sub-commands specific options. 
This option just shows\n                  version of cLoops2.\n  -eps EPS        Distance that defines two points (PETs) being neighbors, eps in\n                  blockDBSCAN as key parameter, multiple eps can be assigned such as\n                  200,500,1000,2000 to run multiple clusterings, the results will be\n                  combined. No default value, please give the input.\n  -minPts MINPTS  Points required in a cluster. minPts in blockDBSCAN is a key parameter.\n                  Empirically 5 is good for TFs and histone modification ChIA-PET data\n                  and Trac-looping. For data like HiChIP and Hi-C, set it larger, like\n                  >=20. The input can be a series, and the final loops will have the\n                  PETs>= max(minPts). \n  -plot           Whether to plot estimated inter-ligation and self-ligation PETs\n                  distance distribution. Default is not to generate a plot.\n  -i              Whether to convert loops to UCSC Interact track to visualize in UCSC.\n                  Default is not, set this flag to save.\n  -j              Whether to convert loops to 2D feature annotations to visualize in\n                  Juicebox. Default is not, set this flag to save.\n  -w              Whether to save tracks of loops to visualize in legacy and new washU.\n                  Default is not, set this flag to save two files.\n  -max_cut        When running cLoops with multiple eps or minPts, multiple distance\n                  cutoffs for self-ligation and inter-ligation PETs will be estimated\n                  based on the overlaps of anchors. By default, the minimal one\n                  is used to filter PETs for the candidate loop significance test.\n                  Set this flag to use the maximal one, which speeds up the significance test.\n  -hic            Whether to use statistical cutoffs for Hi-C to output significant loops.\n                  Default is not, set this option to enable. 
Additionally, with the -hic\n                  option, the anchors are not required to look like peaks.\n  -filter         Whether to filter raw PETs according to called loops. The filtered\n                  PETs can show clear view of interactions or be used to call loops again.\n  -trans          Whether to call trans- (inter-chromosomal) loops. Default is not, set\n                  this flag to call. For most common cases, not recommended, only for\n                  data with obvious visible trans loops.\n  -emPair         By default eps and minPts combinations will be used to run clustering.\n                  With this option, for example eps=500,1000 and minPts=5,10, only (500,5)\n                  and (1000,10) as parameters of clustering will be run. Input number of\n                  eps and minPts should be the same.\n\n```\n\n------\n### 12. Call differentially enriched intra-chromosomal loops\nRun **cLoops2 callDiffLoops -h** to see details.\n```\nCall differentially enriched intra-chromosomal loops between two conditions.\n\nSimilar to calling peaks with control data, calling differentially enriched \nloops is based on scaled PETs and the Poisson test. There are four main steps \nin the algorithm: 1) merge the overlapped loops, quantify them and their \npermutated local background regions; 2) fit the linear transformation of \nbackground target interaction density to control background data based on \nMANorm2; therefore, if there are more than two samples, others can be \nscaled to the reference sample for quantitative comparison; 3) estimate the \nfold change (M) cutoff and average (A) cutoff using the background data with \nthe control of FDR, assuming there should be no differentially significant \ninteractions called from the background data; or using the assigned cutoffs; 4) \nestimate the significance based on the Poisson test for transformed data, both \nfor the loop and loop anchors. 
For example, if transformed PETs for target is \n5, PETs for control is 3 while control nearby permutated background median is \n4, then for the Poisson test, lambda=4-1 is used to test the observed 5 to compute the\np-value.\n\nExample:\n    1. classical usage \n        cLoops2 callDiffLoops -tloop target_loop.txt -cloop control_loop.txt \\\n                          -td ./target -cd ./control -o target_diff\n\n    2. customize MA cutoffs \n        cLoops2 callDiffLoops -tloop target_loop.txt -cloop control_loop.txt \\\n                          -td ./target -cd ./control -o target_diff -customize \\\n                          -cacut 5 -cmcut 0.5\n    \n\noptional arguments:\n  -h, --help            show this help message and exit\n  -d PREDIR             Assign data directory generated by cLoops2 pre to carry out analysis. \n  -o FNOUT              Output data directory / file name prefix, default is cLoops2_output.\n  -p CPU                CPUs used to run the job, default is 1, set -1 to use all CPUs\n                        available. Too many CPUs could cause out-of-memory problems if there are\n                        too many PETs.\n  -cut CUT              Distance cutoff to filter cis PETs, only keep PETs with distance\n                        >=cut. Default is 0, no filtering.\n  -mcut MCUT            Keep the PETs with distance <=mcut. Default is -1, no filtering.\n  -v                    Show cLoops2 version number and exit.\n  ---                   Following are sub-commands specific options. 
This option just shows\n                        version of cLoops2.\n  -tloop TLOOP          The target loops in _loop.txt file called by cLoops2.\n  -cloop CLOOP          The control loops in _loop.txt file called by cLoops2.\n  -td TPRED             The data directory generated by cLoops2 for target data.\n  -cd CPRED             The data directory generated by cLoops2 for control data.\n  -pcut PCUT            Poisson p-value cutoff to determine significant differentially\n                        enriched loops after Bonferroni correction, default is 1e-2.\n  -igp                  Ignore Poisson p-value cutoff and only use FDR to control MA plot\n                        cutoffs.\n  -noPCorr              Do not perform Bonferroni correction of Poisson p-values. Will get\n                        more loops. Default is always performing.\n  -fdr FDR              FDR cutoff for estimating fold change (M) and average value (A) after\n                        normalization with background data. Default is 0.1.\n  -j                    Whether to convert loops to 2D feature annotations to visualize in\n                        Juicebox. Default is not, set this flag to save.\n  -w                    Whether to save tracks of loops to visualize in legacy and new washU.\n                        Default is not, set this flag to save two files.\n  -customize            Whether to use customized cutoffs of MA plot. Default is not. If enabled,\n                        -cacut and -cmcut are needed.\n  -cacut CACUT          Average cutoff for MA plot of normalized PETs of loops. Assign when\n                        -customize option used.\n  -cmcut CMCUT          Fold change cutoff for MA plot of normalized PETs of loops. 
Assign when\n                        -customize option used.\n  -vmin VMIN            The minimum value shown in the heatmap and colorbar.\n  -vmax VMAX            The maximum value shown in the heatmap and colorbar.\n  -cmap {summer,red,div,cool}\n                        The heatmap style. Default is summer.\n\n\n```\n\n------\n### 13. Call domains\nRun **cLoops2 callDomains -h** to see details.\n```\nCall domains for the 3D genomic data based on correlation matrix and local \nsegregation score.\n\nWell tested for Hi-TrAC/Trac-looping data.\n\nExamples:\n    1. call Hi-C like TADs\n        cLoops2 callDomains -d trac -o trac -bs 5000,10000 -ws 500000 -p 20\n\n    2. call Hi-TrAC/Trac-looping specific small domains\n        cLoops2 callDomains -d trac -o trac -bs 1000 -ws 100000 -p 20 \n\n    3. call domains for Hi-C\n        cLoops2 callDomains -d hic -o hic -bs 10000 -ws 500000 -hic \n\noptional arguments:\n  -h, --help   show this help message and exit\n  -d PREDIR    Assign data directory generated by cLoops2 pre to carry out analysis. \n  -o FNOUT     Output data directory / file name prefix, default is cLoops2_output.\n  -p CPU       CPUs used to run the job, default is 1, set -1 to use all CPUs\n               available. Too many CPUs could cause out-of-memory problems if there are\n               too many PETs.\n  -cut CUT     Distance cutoff to filter cis PETs, only keep PETs with distance\n               >=cut. Default is 0, no filtering.\n  -mcut MCUT   Keep the PETs with distance <=mcut. Default is -1, no filtering.\n  -v           Show cLoops2 version number and exit.\n  ---          Following are sub-commands specific options. This option just shows\n               version of cLoops2.\n  -bs BINSIZE  Candidate contact matrix resolution (bin size) to call domains. A\n               series of values or a single value can be used as input. Default is\n               10000. 
If given multiple values, callDomains will try to call nested\n               domains. Small values may lead to smaller domains.\n  -ws WINSIZE  The half of the sliding window size used to calculate local correlation,\n               Default is 500000 (500kb). Larger values may lead to larger domains.\n  -hic         Whether to use cutoffs for Hi-C to output significant domains.\n               Default is not. Set this option to enable, cutoffs will be looser.\n```\n\n------\n### 14. Plot the interaction as heatmap/scatter/arches, 1D signals, peaks, loops and domains\nRun **cLoops2 plot -h** to see details.\n```\nPlot the interaction data as a heatmap (or arches/scatter) with the addition of a \nvirtual 4C view point, 1D tracks (bigWig files), 1D annotations (peaks, genes) \nand 2D annotations (domains). If -f is not assigned, will just plot profiles \nfrom bigWig file or bed files.\n\nExamples:\n    1. plot the simple square heatmap for a specific region with 1kb resolution \n       with genes \n        cLoops2 plot -f test/chr21-chr21.ixy -o test -bs 1000 -start 34840000 \\\n                     -end 34895000 -log -gtf test.gtf\n\n    2. plot the upper triangle heatmap with domains such as TAD and CTCF bigWig\n       track\n        cLoops2 plot -f test/chr21-chr21.ixy -o test_domain -bs 10000 \\\n                     -start 34600000 -end 35500000 -domains HiC_TAD.bed -log \\\n                     -triu -bws GM12878_CTCF_chr21.bw\n\n    3. plot the heatmap as upper triangle with 1D signal track and filter the \n       PETs shorter than 1kb\n        cLoops2 plot -f test/chr21-chr21.ixy -o test -bs 500 -start 34840000 \\\n                     -end 34895000 -log -triu -1D -cut 1000\n\n    4. plot the observation/expectation interaction heatmap with 1D signal \n        cLoops2 plot -f test/chr21-chr21.ixy -o test -m obs/exp -1D -triu \\\n                     -bs 500 -start 34840000 -end 34895000\n\n    5. 
plot the chromosome-wide correlation heatmap \n        cLoops2 plot -f test/chr21-chr21.ixy -o test -corr \n\n    6. plot upper triangle interaction heatmap together with genes, bigWig \n       files, peaks, loops, domains, control the heatmap scale\n        cLoops2 plot -f test/chr21-chr21.ixy -o test -bs 500 -start 34840000 \\\n                     -end 34895000 -triu -bws ATAC.bw,CTCF.bw -1D \\\n                     -loops test_loops.txt -beds Enh.bed,Tss.bed \\\n                     -domains tad.bed -m obs -log -vmin 0.2 -vmax 2 -gtf genes.gtf\n    \n    7. plot small regions interacting PETs as arches \n        cLoops2 plot -f test/chr21-chr21.ixy -o test -start 46228500 \\\n                     -end 46290000 -1D -loops gm_loops.txt -arch -aw 0.05\n\n    8. plot small regions interacting PETs as scatter plot\n        cLoops2 plot -f test/chr21-chr21.ixy -o test -start 46228500 \\\n                     -end 46290000 -1D -loops gm_loops.txt -scatter\n\n    9. plot Hi-C compartments and eigenvector  \n        cLoops2 plot -f test/chr21-chr21.ixy -o test -bs 100000 -log -corr -eig  \n\noptional arguments:\n  -h, --help            show this help message and exit\n  -d PREDIR             Assign data directory generated by cLoops2 pre to carry out analysis. \n  -o FNOUT              Output data directory / file name prefix, default is cLoops2_output.\n  -p CPU                CPUs used to run the job, default is 1, set -1 to use all CPUs\n                        available. Too many CPUs could cause out-of-memory problems if there are\n                        too many PETs.\n  -cut CUT              Distance cutoff to filter cis PETs, only keep PETs with distance\n                        >=cut. Default is 0, no filtering.\n  -mcut MCUT            Keep the PETs with distance <=mcut. Default is -1, no filtering.\n  -v                    Show cLoops2 version number and exit.\n  ---                   Following are sub-commands specific options. 
This option just shows\n                        version of cLoops2.\n  -f FIXY               Input .ixy file generated by cLoops2 pre. If not assigned, no heatmaps\n                        or arches will be shown and -chrom is needed to generate plots similar\n                        to IGV or other browser.\n  -bs BINSIZE           Bin size/matrix resolution (bp) to generate the contact matrix for\n                        plotting, default is 5000 bp.\n  -chrom CHROM          Chromosome for the target region if -f is not assigned.\n  -start START          Start genomic coordinate for the target region. Default is 0.\n  -end END              End genomic coordinate for the target region. Default is to infer\n                        from the data.\n  -loops FLOOP          The _loop.txt file generated by cLoops2, will be used to plot loops as\n                        arches.\n  -loopCut LOOPCUT      Only show loops with more than loopCut PETs. Default is 0.\n  -domains FDOMAIN      The domains to be annotated in the heatmap such as TADs, should be a\n                        .bed file.\n  -beds BEDS            BED tracks of genomic features to plot above the heatmap, such as\n                        promoters and enhancers, track name will be inferred from file name,\n                        for example enhancer.bed,promoter.bed.\n  -gtf GTF              GTF track of genes to plot above the heatmap.\n  -bws BWS              BigWig tracks to plot above the heatmap, track name will be inferred\n                        from file name, for example a.bw,b.bw,c.bw. \n  -bwvs BWVS            BigWig tracks y-axis limitations. Default is auto-determined. Assign\n                        as 'vmin,vmax;vmin,vmax;vmin,vmax'. For example, '0,1;;0,1' for three\n                        bigWig tracks, as the second track kept auto-determined. 
Due to the\n                        argparse limitation in parsing negative values, it can also be assigned as\n                        vmax,vmin.\n  -bwcs BWCS            BigWig tracks colors. Default is auto-determined. Assign as \n                        0,1,2 for three bigWig tracks. Values separated by comma.\n  -log                  Whether to log transform the matrix.\n  -m {obs,obs/exp}      The type of matrix to plot, observed matrix or observed/expected\n                        matrix, expected matrix will be generated by shuffling PETs, default\n                        is observed.\n  -corr                 Whether to plot the correlation matrix. Default is not. Correlation\n                        heatmap will use dark mode color map, used together with obs method.\n  -norm                 Whether to normalize the matrix with z-score.\n  -triu                 Whether to rotate the heatmap to only show the upper triangle, default is\n                        False.\n  -vmin VMIN            The minimum value shown in the heatmap and colorbar.\n  -vmax VMAX            The maximum value shown in the heatmap and colorbar.\n  -1D                   Whether to plot the pileup 1D signal for the region. Default is not.\n                        Please note, the 1D signal is aggregated from the visualization region.\n                        If you want to check the signal from each position of all genome/chromosome,\n                        use cLoops2 dump -bdg to get the bigWig file.\n  -1Dv ONEDV            1D profile y-axis limitations. Default is auto-determined. Assign as\n                        vmin,vmax, for example 0,1.\n  -virtual4C            Whether to plot the virtual 4C view point 1D signal. 
Default is not.\n                        If assigned, -view_start and -view_end are needed.\n  -view_start VIEWSTART\n                        Start genomic coordinate for the view point start region, only valid\n                        when -virtual4C is set, should be >=start and <=end.\n  -view_end VIEWEND     End genomic coordinate for the view point end region, only valid\n                        when -virtual4C is set, should be >=start and <=end.\n  -4Cv VIEWV            Virtual 4C profile y-axis limitations. Default is auto-determined.\n                        Assign as vmin,vmax, for example 0,1.\n  -arch                 Whether to plot interacting PETs as arches. Default is not. If\n                        set, only original one PET one arch will be shown. Useful to check\n                        small region for raw data, especially when heatmap is not clear.\n  -aw AW                Line width for each PET in arches plot. Default is 1. Try to\n                        change it if too many or few PETs.\n  -ac AC                Line color for each PET in arches plot. Default is 4. Try to\n                        change it to see how many colors are supported by cLoops2.\n  -aa AA                Alpha to control arch color saturation. Default is 1.\n  -scatter              Whether to plot interacting PETs as scatter dots. Default is not.\n                        If set, only original one PET one dot will be shown. Useful to check\n                        raw data, especially when heatmap is not clear that -vmax is too small.\n  -ss SS                Dot size for each PET in scatter plot. Default is 1. Try to\n                        change it to optimize the plot.\n  -sc SC                Dot color for each PET in scatter plot. Default is 0. Try to\n                        change it to see how many colors are supported by cLoops2.\n  -sa SA                Alpha to control dot color saturation. 
Default is 1.\n  -eig                  Whether to plot the PC1 of correlation matrix to show compartments.\n                        Default is not. Only works well for big regions, such as at a resolution\n                        of 100k.\n  -eig_r                Whether to flip the PC1 values of -eig. It should depend on\n                        inactive or active histone markers, as the PCA values do\n                        not have directions, especially when comparing different samples.\n  -figWidth {4,8}       Figure width. 4 is good to show the plot as half of an A4 figure\n                        width and 8 is good to show a wider plot. Default is 4.\n\n\n```\n\n------\n### 15. Montage analysis for regions of interactions\nRun **cLoops2 montage -h** to see details.\n```\nMontage analysis of specific regions, producing a Westworld Season 3 -like \nRehoboam plot. \n\nExamples: \n    1. showing all PETs for a gene's promoter and enhancers\n        cLoops2 montage -f test/chr21-chr21.ixy -bed test.bed -o test \n\n    2. showing simplified PETs for a gene's promoter and enhancers\n        cLoops2 montage -f test/chr21-chr21.ixy -bed test.bed -o test -simple\n    \n    3. adjust interacting link width \n        cLoops2 montage -f test/chr21-chr21.ixy -bed test.bed -o test -simple \\\n                        -ppmw 10\n    \n    4. showing all PETs for a region, if the bed file only contains one region\n        cLoops2 montage -f test/chr21-chr21.ixy -bed test.bed -o test -ext 0\n    \n\noptional arguments:\n  -h, --help     show this help message and exit\n  -d PREDIR      Assign data directory generated by cLoops2 pre to carry out analysis. \n  -o FNOUT       Output data directory / file name prefix, default is cLoops2_output.\n  -p CPU         CPUs used to run the job, default is 1, set -1 to use all CPUs\n                 available. 
Too many CPUs could cause out-of-memory problems if there are\n                 too many PETs.\n  -cut CUT       Distance cutoff to filter cis PETs, only keep PETs with distance\n                 >=cut. Default is 0, no filtering.\n  -mcut MCUT     Keep the PETs with distance <=mcut. Default is -1, no filtering.\n  -v             Show cLoops2 version number and exit.\n  ---            Following are sub-commands specific options. This option just shows\n                 version of cLoops2.\n  -f FIXY        Input .ixy file generated by cLoops2 pre.\n  -bed BED       Input .bed file for target regions, 4th column should be id/name for\n                 the region.\n  -ext EXT       Up-stream and down-stream extension of target region length. Default is\n                 2. If the input bed already includes up/down-stream regions, assign as 0.\n  -simple        Whether to only draw the representative interactions between two target\n                 regions as one arch, and not include the interactions in extended\n                 regions. Default is not, all interactions will be shown as arches.\n  -vp VIEWPOINT  Only show interactions with specific regions from all other regions.\n                 Name/id (4th column in .bed file) is needed. Default is to show all\n                 related interactions. Multiple names/ids can be assigned, separated\n                 by commas.\n  -vmin VMIN     The minimal scale for 1D pileup data. Default will be inferred from the\n                 data.\n  -vmax VMAX     The maximal scale for 1D pileup data. Default will be inferred from the\n                 data.\n  -ppmw PPMW     Link line width indicator, short for 1 PET per Million PETs line\n                 width, default is 10. Adjust this value when -simple is used. Decrease\n                 it if links are too bold and increase it when links are too thin.\n  -aw AW         Line width for each PET if -simple is not selected. 
Default is 1.\n  -no1D          Whether to not plot 1D profiles. Default is plot. Set this for Hi-C\n                 like data.\n```\n\n------\n### 16. Aggregation analysis for peaks, loops and domains\nRun **cLoops2 agg -h** to see details.\n```\nDo the aggregation analysis for peaks, loops, view points and domains.\n\nThe output figures can be used directly, and the data to generate the plot are \nalso saved for further customized analysis. \n\nFor the aggregated peaks analysis, input is a .bed file annotated with the \ncoordinates for the target regions/peaks/anchors. Output is a .pdf file \ncontaining a mean density plot and heatmap and a .txt file for the data. The \ndata in the .txt file and plot were normalized to RPM (reads per million).\n\nFor the aggregated view points analysis, input is a .bed file annotated with \ncoordinates for the target regions/peaks/anchors as view point. Output is a \n.pdf file containing a mean density plot and heatmap and a .txt file for the \ndata. The data in the .txt file and plot were normalized to \nlog2( RPM (reads per million)+1).\n\nFor the aggregated loops analysis, input is a _loops.txt file annotated with \nthe coordinates for target loops, similar to the format of BEDPE. Output is a \n.pdf file for mean heatmap and .npz file generated through numpy.savez for all \nloops and nearby regions matrix. The enrichment score (ES) in the plot is \ncalculated as: ES = mean( (PETs in loop)/(mean PETs of nearby regions) ). Other \nfiles except _loops.txt can be used as input, as long as the file contains key \ninformation in the first columns separated by tabs:\nloopId\tchrA\tstartA\tendA\tchrB\tstartB\tendB\tdistance\nloop-1\tchr21\t1000\t2000\tchr21\t8000\t9000\t7000\n\nThere is another option for loops analysis, termed as two anchors. Input file is \nthe same as in aggregated loops analysis. The whole region with assigned extension\nbetween two anchors will be aggregated and 1D profile can show two anchors. 
The \nanalysis could be useful to study/compare different classes of anchors and \ncombinations, for example, considering CTCF motif directions, all left anchors\nCTCF motifs are in positive strand and in negative strand for all right anchors. \nIt could be interesting for some loops one anchor only bound by transcription \nfactor a and another anchor only bound by transcription factor b. \n\nFor the aggregated domains analysis, input is a .bed file annotated with the\ncoordinates for the domains, such as TADs. Output are a .pdf file for the upper \ntriangular heatmap and .npz file generated through numpy.savez for all domains \nand nearby region matrix. The enrichment score (ES) in the plot is calculated \nas mean( (two ends both within domain PETs number)/( only one end in domain \nPETs number) ).\n\nExamples:\n    1. show aggregated peaks heatmap and profile \n        cLoops2 agg -d test -peaks peaks.bed -o test -peak_ext 2500 \\\n                    -peak_bins 200 -peak_norm -skipZeros\n\n    2. show aggregated view points and aggregated bigWig signal\n        cLoops2 agg -d test -o test -viewPoints test_peaks.bed -bws CTCF.bw \n\n    3. show aggregated loops heatmap, 1D profile and aggregated bigWig signal\n        cLoops2 agg -d test -o test -loops test_loops.txt -bws CTCF.bw -1D \\\n                    -loop_norm\n    \n    4. show aggregated loops heatmap, 1D profile and aggregated bigWig signal\n       in two anchors mode\n        cLoops2 agg -d test -o test -twoAnchors test_loops.txt -bws CTCF.bw -1D \\\n                    -loop_norm\n\n    5. show aggregated domains heatmap, 1D profile and aggregated bigWig signal\n        cLoops2 agg -d test -o test -domains TAD.bed -bws CTCF.bw -1D \n    \n\noptional arguments:\n  -h, --help            show this help message and exit\n  -d PREDIR             Assign data directory generated by cLoops2 pre to carry out analysis. 
\n  -o FNOUT              Output data directory / file name prefix, default is cLoops2_output.\n  -p CPU                CPUs used to run the job, default is 1, set -1 to use all CPUs\n                        available. Too many CPUs could cause out-of-memory problem if there are\n                        too many PETs.\n  -cut CUT              Distance cutoff to filter cis PETs, only keep PETs with distance\n                        >=cut. Default is 0, no filtering.\n  -mcut MCUT            Keep the PETs with distance <=mcut. Default is -1, no filtering.\n  -v                    Show cLoops2 version number and exit.\n  ---                   Following are sub-commands specific options. This option just show\n                        version of cLoops2.\n  -peaks PEAKF          The .bed file for peaks-centric aggregation analysis.\n  -peak_ext PEAK_EXT    The nearby upstream and downstream regions (bp) from the peak center.\n                        Default is 5000.\n  -peak_bins PEAK_BINS  The bin size for the profile array of peaks. Default is 100.\n  -peak_norm            Whether to normalize the data in the peaks profile plot and\n                        heatmap with row-wise z-score. Default is not.\n  -viewPoints VIEWPOINTF\n                        The .bed file for view points-centric aggregation analysis.\n  -viewPointUp VIEWPOINTUP\n                        The upstream regions included for the aggregated view points analysis.\n                        Default is 100000 bp.\n  -viewPointDown VIEWPOINTDOWN\n                        The downstream regions included for the aggregated view points analysis.\n                        Default is 100000 bp.\n  -viewPointBs VIEWPOINTBS\n                        Contact matrix bin size for view points heatmap. Default is 1000 bp. \n  -viewPoint_norm       Whether to normalize the sub-matrix for each loop by dividing by the mean\n                        PETs of the matrix. 
Default is not.\n  -loops LOOPF          The _loops.txt file generated by cLoops2 for loops-centric\n                        aggregation analysis. The first 8 columns of the file are required.\n  -loop_ext LOOP_EXT    The nearby regions included to plot in the heatmap and calculation of\n                        enrichment for aggregation loop analysis, default is 10, should be\n                        an even number.\n  -loop_cut LOOP_CUT    Distance cutoff for loops to filter. Default is 0.\n  -loop_norm            Whether to normalize the sub-matrix for each loop by dividing by the mean\n                        PETs of the matrix (except the loop region). Default is not.\n  -twoAnchors TWOANCHORSF\n                        The similar _loops.txt file generated by cLoops2 for two anchors\n                        aggregation analysis. The first 8 columns of the file are required.\n  -twoAnchor_ext TWOANCHOR_EXT\n                        The nearby regions of fold included to plot in heatmap.\n                        Default is 0.1.\n  -twoAnchor_vmin TWOANCHOR_VMIN\n                        The minimum value shown in the domain heatmap and colorbar.\n  -twoAnchor_vmax TWOANCHOR_VMAX\n                        The maximum value shown in the domain heatmap and colorbar.\n  -domains DOMAINF      The .bed file annotating the domains such as TADs for aggregated\n                        domains-centric analysis.\n  -domain_ext DOMAIN_EXT\n                        The nearby regions of fold included to plot in heatmap and\n                        calculation of enrichment, default is 0.5.\n  -domain_vmin DOMAIN_VMIN\n                        The minimum value shown in the domain heatmap and colorbar.\n  -domain_vmax DOMAIN_VMAX\n                        The maximum value shown in the domain heatmap and colorbar.\n  -1D                   Whether to plot the pileup 1D signal for aggregated loops, \n                        aggregated view points or aggregated domains. 
Default is not.\n  -bws BWS              BigWig tracks to plot above the aggregated loops heatmap (or under\n                        the aggregated domains heatmap), track name will be inferred from file\n                        name, for example a.bw,b.bw,c.bw. \n  -skipZeros            Whether to remove all 0 records. Default is not.\n\n```\n\n------\n### 17. Quantification of peaks, loops and domains\nRun **cLoops2 quant -h** to see details.\n```\nQuantify the peaks, loops and domains.  The output files have the same format as\nthe outputs of callPeaks, callLoops and callDomains.\n\nExamples:\n    1. quantify peaks \n        cLoops2 quant -d test -peaks peaks.bed -o test \n\n    2. quantify loops \n        cLoops2 quant -d test -loops test_loops.txt -o test\n    \n    3. quantify domains \n        cLoops2 quant -d test -domains test_domains.txt -o test\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -d PREDIR             Assign data directory generated by cLoops2 pre to carry out analysis. \n  -o FNOUT              Output data directory / file name prefix, default is cLoops2_output.\n  -p CPU                CPUs used to run the job, default is 1, set -1 to use all CPUs\n                        available. Too many CPUs could cause out-of-memory problem if there are\n                        too many PETs.\n  -cut CUT              Distance cutoff to filter cis PETs, only keep PETs with distance\n                        >=cut. Default is 0, no filtering.\n  -mcut MCUT            Keep the PETs with distance <=mcut. Default is -1, no filtering.\n  -v                    Show cLoops2 version number and exit.\n  ---                   Following are sub-commands specific options. 
This option just show\n                        version of cLoops2.\n  -peaks PEAKF          The .bed file for peaks-centric quantification.\n  -loops LOOPF          The _loops.txt file generated by cLoops2 for loops-centric\n                        quantification, as long as the first 8 columns are present.\n  -domains DOMAINF      The _domains.txt file generated by cLoops2 for domains-centric\n                        quantification, as long as the first 3 columns are present.\n  -domain_bs DOMAINBINSIZE\n                        Candidate contact matrix resolution (bin size) to quantify domains, \n                        default is 10000. Only one integer is supported.\n  -domain_ws DOMAINWINSIZE\n                        The half window size used to calculate local correlation to quantify\n                        domains. Default is 500000 (500kb).\n  -domain_bdg           Whether to save the segregation score as a bedGraph file, default\n                        is not.\n```\n\n------\n### 18. Annotation of loops to genes \nRun **cLoops2 anaLoops -h** to see details.\n```\nAnnotating loops:\n- find the closest TSS for each loop anchor\n- merge the loop anchors and classify them as enhancers or promoters based on \n  distance to nearest TSS\n- build the interaction networks for merged anchors \n- find all interacting enhancers/promoters for each promoter  \n\nBasic mode 1: with -gtf, loops will be annotated as enhancer or promoter based \non distance to nearest gene. If an anchor overlaps two/multiple promoters\n(often seen for close head-to-head genes), all will be reported. If no TSS \noverlaps, then the nearest one will be assigned.  \n\nBasic mode 2: with -gtf -net, overlapped anchors will be merged and annotated as \nenhancer or promoter considering distance to genes. For each promoter, all \nlinked enhancers and promoters will be shown. 
If there are more than 3 direct or \nindirect enhancers for a promoter, HITS algorithm will be used to identify one\nhub for direct enhancer and one hub for indirect enhancer. \n\nExamples:\n    1. annotate loops for target gene, basic mode 1\n        cLoops2 anaLoops -loops test_loops.txt -gtf genecode.gtf\n    \n    2. annotate loops for target transcripts (alternative TSS), basic mode 1\n        cLoops2 anaLoops -loops test_loops.txt -gtf genecode.gtf -tid\n    \n    3. find all enhancers or promoters linked to a gene, basic mode 2\n        cLoops2 anaLoops -loops test_loops.txt -gtf genecode.gtf -net\n\noptional arguments:\n  -h, --help    show this help message and exit\n  -d PREDIR     Assign data directory generated by cLoops2 pre to carry out analysis. \n  -o FNOUT      Output data directory / file name prefix, default is cLoops2_output.\n  -p CPU        CPUs used to run the job, default is 1, set -1 to use all CPUs\n                available. Too many CPUs could cause out-of-memory problem if there are\n                too many PETs.\n  -cut CUT      Distance cutoff to filter cis PETs, only keep PETs with distance\n                >=cut. Default is 0, no filtering.\n  -mcut MCUT    Keep the PETs with distance <=mcut. Default is -1, no filtering.\n  -v            Show cLoops2 version number and exit.\n  ---           Following are sub-commands specific options. This option just show\n                version of cLoops2.\n  -loops FLOOP  The _loops.txt file generated by cLoops2 callLoops or callDiffLoops.\n  -gtf GTF      GTF file annotation for genes.\n  -tid          Whether to use transcript id instead of gene id for annotation. Default\n                is not.\n  -pdis PDIS    Distance limitation for anchor to nearest gene/transcript TSS to define\n                as promoter. Default is 2000 bp.\n  -net          Whether to use network method to find all enhancer/promoter links based\n                on loops. Default is not. 
In this mode, overlapped anchors will be\n                merged and annotated as enhancer/promoter, then for a gene, all linked\n                nodes will be output.\n  -gap GAP      When -net is set, the distance for close anchors to merge. Default is 1.\n\n```\n\n------\n### 19. Find target genes of genomic regions with cLoops2 anaLoops output\nRun **cLoops2 findTargets -h** to see details.\n```\nFind target genes of genomic regions (peaks, SNPs) through enhancer-promoter \nnetworks. The output files from cLoops2 anaLoops with the suffixes _ep_net.sif and\n_targets.txt are needed.\n\nExamples:\n    1. find target genes of peaks/SNPs\n        cLoops2 findTargets -net test_ep_net.sif -tg test_targets.txt \\\n                            -bed GWAS.bed -o test \n\noptional arguments:\n  -h, --help  show this help message and exit\n  -d PREDIR   Assign data directory generated by cLoops2 pre to carry out analysis. \n  -o FNOUT    Output data directory / file name prefix, default is cLoops2_output.\n  -p CPU      CPUs used to run the job, default is 1, set -1 to use all CPUs\n              available. Too many CPUs could cause out-of-memory problem if there are\n              too many PETs.\n  -cut CUT    Distance cutoff to filter cis PETs, only keep PETs with distance\n              >=cut. Default is 0, no filtering.\n  -mcut MCUT  Keep the PETs with distance <=mcut. Default is -1, no filtering.\n  -v          Show cLoops2 version number and exit.\n  ---         Following are sub-commands specific options. This option just show\n              version of cLoops2.\n  -net FNET   The _ep_net.sif file generated by cLoops2 anaLoops.\n  -tg FTG     The _targets.txt file generated by cLoops2 anaLoops.\n  -bed FBED   Find target genes for regions, such as anchors, SNPs or peaks.\n\n```\n\n------\n------\n## Extended Analysis Application Scripts\nThe following analysis application scripts are available when cLoops2 is installed. Most of them can be run independently. 
The -h option shows example usage and parameter details for each script. Some of them will be integrated into cLoops2 sub-programmes if well tested and frequently used. More will be added. \n\n### File Format Conversion\n- [hicpro2bedpe.py](https://github.com/YaqiangCao/cLoops2/blob/master/scripts/hicpro2bedpe.py): convert HiC-Pro output allValidPairs file to BEDPE file as input of cLoops2.   \n- [juicerLong2bedpe.py](https://github.com/YaqiangCao/cLoops2/blob/master/scripts/juicerLong2bedpe.py): convert Juicer output long format interaction file to BEDPE file as input of cLoops2.   \n- [getBedpeFBed.py](https://github.com/YaqiangCao/cLoops2/blob/master/scripts/getBedpeFBed.py): convert single-end reads in BED format to paired-end reads in BEDPE format with expected fragment size as input of cLoops2 to call peaks.    \n\n---\n### Analysis without plot\n- [getDI.py](https://github.com/YaqiangCao/cLoops2/blob/master/scripts/getDI.py): calculate the [Directionality Index](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3356448/) as <img align=\"center\" src=\"https://latex.codecogs.com/svg.latex?\\Large&space;DI_{x}=\\frac{(B-A)}{|B+A|}*\\frac{(A-E)^2+(B-E)^2}{E},E=\\frac{A+B}{2}\"/>, where **x** is the bin, **A** is the number of interaction reads within the region from a specified upstream distance to bin **x**, and **B** is the corresponding number of downstream reads.  
\n\n- [getFRiF.py](https://github.com/YaqiangCao/cLoops2/blob/master/scripts/getFRiF.py): calculate the **F**raction of **R**eads **i**n **F**eatures (FRiF). The features can be domains and peaks annotated in a .bed file, or domains/stripes/loops in a .txt file such as the \\_loops.txt file.\n\n- [getIS.py](https://github.com/YaqiangCao/cLoops2/blob/master/scripts/getIS.py): calculate the [insulation score](https://www.nature.com/articles/nature20158) with a slight modification and output a bedGraph file. The formula used is <img align=\"center\" src=\"https://latex.codecogs.com/svg.latex?\\Large&space;IS_{x}=-log2(\\frac{I(x-s,x+s)-I(x,x+s)-I(x-s,x)}{I(x-s,x+s)})\" />, where ***x*** is the genomic location, which can be a bin or an exact base pair, ***I(x-s,x+s)*** is the number of interactions/PETs observed in the region from ***x-s*** to ***x+s***, and ***s*** should be set fairly large, such as 100kb, to observe a good fit between the insulation score and TAD boundaries.  \n\n- [getLocalIDS.py](https://github.com/YaqiangCao/cLoops2/blob/master/scripts/getcLocalIDS.py): calculate the local interaction density score and output a bedGraph file. The formula used is <img align=\"center\" src=\"https://latex.codecogs.com/svg.latex?\\Large&space;IDS_{x}=\\sum_{i=-5}^{5}{\\frac{I(x,x_{i})}{N}},i\\neq0\" />, where ***x*** is the genomic location of the target bin, ***N*** is the total number of PETs in the target chromosome, and ***I(x,x_i)*** is the number of observed PETs linking bin ***x*** and the ith nearby bin of the same size. 
\n\n- [getPETsAno.py](https://github.com/YaqiangCao/cLoops2/blob/master/scripts/getPETsAno.py): get the PETs ratio of enhancer-promoter, enhancer-enhancer, promoter-promoter, enhancer-none, promoter-none, none-none interactions.\n\n- [tracPre.py](https://github.com/YaqiangCao/cLoops2/blob/master/scripts/tracPre.py): pre-process the raw FASTQ reads of Trac-looping data by mapping them to the reference genome, and obtain the unique PETs with quality control results.\n\n- [tracPre2.py](https://github.com/YaqiangCao/cLoops2/blob/master/scripts/tracPre2.py): pre-process the raw FASTQ reads of Hi-TrAC data by mapping them to the reference genome, and obtain the unique PETs with quality control results.\n\n-----\n------\n## Input, Intermediate, Output Files\n- [.bedpe](#.bedpe)\n- [.ixy](#.ixy)\n- [_peaks.txt](#_peaks.txt)\n- [_loops.txt](#_loops.txt)\n- [_dloops.txt](#_dloops.txt)\n- [_domains.txt](#_domains.txt)\n\n----\n<a name=\".bedpe\"></a>\n### Input .bedpe file \nMapped PETs in [BEDPE format](http://bedtools.readthedocs.io/en/latest/content/general-usage.html); gzip-compressed files are also accepted. The following columns are required: chrom1 (1st), start1 (2nd), end1 (3rd), chrom2 (4th), start2 (5th), end2 (6th), strand1 (9th), strand2 (10th). For the name and score columns, \".\" is accepted. 
Columns are separated by \"\\t\".\nFor example:\n```\nchr1\t9945\t10095\tchr1\t248946216\t248946366\t.\t.\t+\t+\nchr1\t10034\t10184\tchr1\t180987\t181137\t.\t.\t+\t-\nchr1\t10286\t10436\tchr1\t181103\t181253\t.\t.\t+\t-\nchr1\t10286\t10436\tchr11\t181103\t181253\t.\t.\t+\t-\nchr11\t10286\t10436\tchr1\t181103\t181253\t.\t.\t+\t-\n...\n```\n\n------\n<a name=\".ixy\"></a>\n### Intermediate .ixy file\nnumpy.array of (x,y) saved with [joblib.dump](https://joblib.readthedocs.io/en/latest/generated/joblib.dump.html) for fast access to the interaction PETs and the contact matrix at any resolution. Nearly all cLoops2 analyses are based on this file type.\n```\n10099025\t10099048\n39943889\t39943890\n18391007\t18391853\n35502951\t35502951\n10061555\t10061557\n...\n```\n\n------\n<a name=\"_peaks.txt\"></a>\n### Output \\_peaks.txt file \ncolumn | name | explanation\n------ | ---- | ------------\n0th | peakId | id for a peak, for example peak\\_chr1-chr1-1\n1th | chrom | chromosome of the peak \n2th | start | genomic coordinate of the start site\n3th | end | genomic coordinate of the end site \n4th | summit | genomic coordinate of peak summit\n5th | length | length of the peak\n6th | counts | observed reads number in the peak \n7th | RPKM | RPKM for the reads density in the peak\n8th | enrichmentScore | enrichment score for the peak, calculated by observed PETs number divided by the mean PETs number of nearby 10 fold and 20 fold regions\n9th | poissonPvalue | Poisson test p-value for the peak after Bonferroni correction\n10th | controlCounts | if control data such as input/IgG is assigned, the observed reads number in peak region for control data\n11th | controlRPKM | if control data assigned, RPKM for the reads density in the peak region for control data\n12th | controlScaledCount | if control data assigned, the scaled expected counts used for Poisson test/enrichment score against control data\n13th | enrichmentScoreVsControl | if control data assigned, 
enrichment score of target vs. control\n14th | poissonPvalueVsControl | if control data assigned, Poisson test p-value of target vs. control after Bonferroni correction\n15th | significant | 1 or 0, 1 means we think the peak is significant compared to local background and control (if assigned)\n\n------\n<a name=\"_loops.txt\"></a>\n### Output \\_loops.txt file \ncolumn | name | explanation\n------ | ---- | ------------\n0th | loopId | id for a loop, for example loop\\_chr1-chr1-1\n1th | chromA | chromosome of the first loop anchor\n2th | startA | genomic coordinate of the start site for the first anchor\n3th | endA | genomic coordinate of the end site for the first anchor\n4th | chromB | chromosome of the second loop anchor\n5th | startB | genomic coordinate of the start site for the second anchor\n6th | endB | genomic coordinate of the end site for the second anchor\n7th | distance | distance (bp) between the centers of the anchors for the loop\n8th | centerA | genomic coordinate of the center site for the first anchor\n9th | centerB | genomic coordinate of the center site for the second anchor\n10th | readsA | observed PETs number for the first anchor\n11th | readsB | observed PETs number for the second anchor\n12th | cis | whether the loop is an intra-chromosomal loop (cis)\n13th | PETs | observed PETs number linking the two anchors\n14th | density | similar to RPKM (reads per kilobase per million):<img align=\"center\" src=\"https://latex.codecogs.com/svg.latex?\\Large&space;density=\\frac{r}{N\\times(anchorLengthA+anchorLengthB)}\\times10^9\" />\n15th | enrichmentScore | enrichment score for the loop, calculated by observed PETs number divided by the mean PETs number of nearby permuted regions\n16th | P2LL | peak to the lower left, calculated similarly to Juicer\n17th | FDR | false discovery rate for the loop, calculated from the number of permuted regions that have more observed PETs than the loop region  \n18th | binomalPvalue | binomial 
test p-value for the loop, updated calculation, different from cLoops\n19th | hypergeometricPvalue | hypergeometric test p-value for the loop\n20th | poissonPvalue | Poisson test p-value for the loop\n21th | xPeakpoissonPvalue | Poisson test p-value for the left anchor as a potential peak\n22th | yPeakpoissonPvalue | Poisson test p-value for the right anchor as a potential peak\n23th | significant | 1 or 0, 1 means we think the loop is significant compared to permuted regions. In cLoops2, only significant loops are written to the file. \n\n------\n<a name=\"_dloops.txt\"></a>\n### Output \\_dloops.txt file \ncolumn | name | explanation\n------ | ---- | ------------\n0th | loopId | id for a loop, for example loop\\_chr1-chr1-1\n1th | chromA | chromosome of the first loop anchor\n2th | startA | genomic coordinate of the start site for the first anchor\n3th | endA | genomic coordinate of the end site for the first anchor\n4th | chromB | chromosome of the second loop anchor\n5th | startB | genomic coordinate of the start site for the second anchor\n6th | endB | genomic coordinate of the end site for the second anchor\n7th | distance | distance (bp) between the centers of the anchors for the loop\n8th | centerA | genomic coordinate of the center site for the first anchor\n9th | centerB | genomic coordinate of the center site for the second anchor\n10th | rawTargetAnchorAReads | observed PETs number for the first anchor in target sample \n11th | rawTargetAnchorBReads | observed PETs number for the second anchor in target sample \n12th | rawControlAnchorAReads | observed PETs number for the first anchor in control sample \n13th | rawControlAnchorBReads | observed PETs number for the second anchor in control sample \n14th | scaledTargetAnchorAReads | scaled PETs number for the first anchor in target sample \n15th | scaledTargetAnchorBReads | scaled PETs number for the second anchor in target sample \n16th | rawTargetCounts | raw PETs number for the loop in 
target sample \n17th | scaledTargetCounts | scaled PETs number for the loop in target sample, fitting to control sample\n18th | rawControlCounts | raw PETs number for the loop in control sample \n19th | rawTargetNearbyMedianCounts | raw median PETs number for the loop nearby permutation regions in target sample\n20th | scaledTargetNearbyMedianCounts | scaled median PETs number for the loop nearby permutation regions in target sample, fitting to control sample\n21th | rawControlNearbyMedianCounts | raw median PETs number for the loop nearby permutation regions in control sample \n22th | rawTargetES | target sample rawTargetCounts/rawTargetNearbyMedianCounts \n23th | rawControlES | control sample rawControlCounts/rawControlNearbyMedianCounts \n24th | targetDensity | raw interaction density in target sample, RPKM\n25th | controlDensity | raw interaction density in control sample, RPKM\n26th | rawFc | raw fold change of the interaction density, log2(target/control)\n27th | scaledFc | scaled fold change of PETs, log2( scaledTargetCounts/rawControlCounts )\n28th | poissonPvalue | Poisson p-value for the significance test after Bonferroni correction\n29th | significant | 1 or 0, 1 means we think the loop is significantly differentially enriched\n\n------\n<a name=\"_domains.txt\"></a>\n### Output \\_domains.txt file \ncolumn | name | explanation\n------ | ---- | ------------\n0th | domainId | id for a domain, for example domain\\_0\n1th | chrom | chromosome of the domain\n2th | start | genomic coordinate of the start site for the domain\n3th | end | genomic coordinate of the end site for the domain \n4th | length | length of the domain\n5th | binSize | bin size used for the matrix to call the domain  \n6th | winSize | window size used for the matrix to call the domain  \n7th | segregationScore | mean segregation score for all bins within the domain  \n8th | totalPETs | number of total PETs in the domain\n9th | withinDomainPETs | number of PETs only interacting 
within the domain\n10th | enrichmentScore | (withinDomainPETs) / (totalPETs-withinDomainPETs)\n11th | density | similar to RPKM (reads per kilobase per million):<img align=\"center\" src=\"https://latex.codecogs.com/svg.latex?\\Large&space;density=\\frac{withinDomainPETs}{(libraryTotalPETs)\\times(domainLength)}\\times10^9\" />\n\n------\n<a name=\"_loopsGtfAno.txt\"></a>\n### Output \\_loopsGtfAno.txt file \ncolumn | name | explanation\n------ | ---- | ------------\n0th | loopId  | loopId from input file\n1th | typeAnchorA  | annotated type of anchor a (left anchor), enhancer or promoter\n2th | typeAnchorB  | annotated type of anchor b (right anchor)\n3th | nearestDistanceToGeneAnchorA  | distance of anchor a to nearest TSS \n4th | nearestDistanceToGeneAnchorB  | distance of anchor b to nearest TSS \n5th | nearestTargetGeneAnchorA  | anchor a nearest TSS gene, for example chr21:34836286-34884882\\|+\\|AP000331.1 (named by the rule of: chrom:start-end\\|strand\\|geneName). If a promoter overlaps two head-to-head genes, all genes will be reported, separated by commas.\n6th | nearestTargetGeneAnchorB  | anchor b nearest TSS gene\n\n------\n<a name=\"_mergedAnchors.txt\"></a>\n### Output \\_mergedAnchors.txt file \ncolumn | name | explanation\n------ | ---- | ------------\n0th | anchorId  | id for merged anchors. For example, chr21:14025126-14026192\\|Promoter (named by the rule of: chrom:start-end\\|type)\n1th | chrom  | chromosome\n2th | start  | start\n3th | end  | end\n4th | type  | annotated type for the anchor, enhancer or promoter\n5th | nearestDistanceToTSS  | distance of the anchor to the nearest TSS\n6th | nearestGene  | nearest gene name. If a promoter overlaps two head-to-head genes, all genes will be reported, separated by commas.    \n7th | nearestGeneLoc | nearest gene information. For example, chr21:34787801-35049344\\|-\\|RUNX1 (named by the rule of: chrom:start-end\\|strand\\|name). 
If a promoter overlaps two head-to-head genes, all genes will be reported, separated by commas.    \n\n------\n<a name=\"_loop2anchors.txt\"></a>\n### Output \\_loop2anchors.txt file \ncolumn | name | explanation\n------ | ---- | ------------\n0th | loopId  | loopId from input file\n1th | mergedAnchorA  | original anchor a (left anchor) to new merged anchor id\n2th | mergedAnchorB  | original anchor b (right anchor) to new merged anchor id\n\n------\n<a name=\"_targets.txt\"></a>\n### Output \\_targets.txt file \ncolumn | name | explanation\n------ | ---- | ------------\n0th | promoter  | annotated anchors that overlap or are very close to a gene's transcription start site. For example, chr21:35043062-35051895\\|Promoter (named by the rule of: chrom:start-end\\|Promoter).\n1th | PromoterTarget  | promoter target genes. If a promoter is shared by multiple genes, all genes will be reported, separated by commas. For example, chr21:34787801-35049344\\|-\\|RUNX1 (named by the rule of: chrom:start-end\\|strand\\|name).\n2th | directEnhancer  | enhancers that directly loop with the target promoter. Multiple enhancers will be reported, separated by commas. For example, chr21:35075636-35077527\\|Enhancer,chr21:35026356-35028520\\|Enhancer,chr21:34801302-34805056\\|Enhancer.\n3th | indirectEnhancer  | enhancers that indirectly loop with the target promoter, by enhancer-enhancer-promoter or enhancer-promoter-promoter. Multiple enhancers will be reported, separated by commas.\n4th | directPromoter  | other promoters that directly loop with the target promoter. \n5th | indirectPromoter | other promoters that indirectly loop with the target promoter, by promoter-enhancer-promoter or promoter-promoter-promoter. \n6th | directEnhancerHub | hub of direct enhancer. If there are more than 2 direct enhancers, the HITS algorithm is used to find and report the most linked one. \n7th | indirectEnhancerHub | hub of indirect enhancer. 
If there are more than 2 indirect enhancers, the HITS algorithm is used to find and report the most linked one. \n\n\n--------\n--------\n## cLoops2 citations\n\n--------\n--------\n## cLoops2 updates",
    "bugtrack_url": null,
    "license": "",
    "summary": "Loop-calling and peak-calling for sequencing-based interaction data, including related analysis utilities.",
    "version": "0.0.5",
    "project_urls": {
        "Homepage": "https://github.com/YaqiangCao/cLoops2",
        "Source": "https://github.com/YaqiangCao/cLoops2"
    },
    "split_keywords": [
        "peak-calling",
        "loop-calling",
        "hi-trac",
        "interaction",
        "visualization"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "50d38f00df2c414dc97c900a162c2075c28142e640ccbb36ab9600069a913f9b",
                "md5": "ec109d115b5396f8acd2f2ed99649625",
                "sha256": "2342b783a54ae7ba7fbbf45dfeb547efe92850bdf0321b0e996be73f35899e72"
            },
            "downloads": -1,
            "filename": "cLoops2-0.0.5-py3.6.egg",
            "has_sig": false,
            "md5_digest": "ec109d115b5396f8acd2f2ed99649625",
            "packagetype": "bdist_egg",
            "python_version": "0.0.5",
            "requires_python": ">=3",
            "size": 383490,
            "upload_time": "2023-07-20T15:37:19",
            "upload_time_iso_8601": "2023-07-20T15:37:19.755020Z",
            "url": "https://files.pythonhosted.org/packages/50/d3/8f00df2c414dc97c900a162c2075c28142e640ccbb36ab9600069a913f9b/cLoops2-0.0.5-py3.6.egg",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-07-20 15:37:19",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "YaqiangCao",
    "github_project": "cLoops2",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "cloops2"
}
        