hicpeaks

Name	hicpeaks JSON
Version	0.3.8 JSON
	download
home_page	https://github.com/XiaoTaoWang/HiCPeaks/
Summary	A Python implementation for BH-FDR and HiCCUPS
upload_time	2024-11-21 02:21:27
maintainer	None
docs_url	None
author	XiaoTao Wang
requires_python	None
license	None
keywords	hi-c chromatin interaction contact loops
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

            HiCPeaks
========
.. image:: https://static.pepy.tech/personalized-badge/hicpeaks?period=total&units=international_system&left_color=black&right_color=orange&left_text=Downloads
 :target: https://pepy.tech/project/hicpeaks

*hicpeaks* provides a Python implementation for BH-FDR and HiCCUPS [1]_.

Installation
============
*hicpeaks* is developed and tested on UNIX-like operating systems, and the following packages or softwares are
required:

Python requirements:

a) Python 2.7/3.5+
b) Multiprocess
c) Numpy
d) Scipy
e) Matplotlib
f) Pandas
g) Statsmodels
h) Scikit-Learn
i) H5py
j) Cooler

Other requirements:

- ucsc-fetchchromsizes

*conda*, an excellent package manager, can be used to install all requirements above.

Install Requirements Through Conda
----------------------------------
All above requirements can be installed through the conda package manager.

.. note:: If you have the Anaconda Distribution installed, you already have it.

Choose an appropriate `Miniconda installer <https://conda.io/miniconda.html>`_ for your system,
then in your terminal window type the following and follow the prompts on the installer screens::

    $ bash Miniconda3-latest-Linux-x86_64.sh

After that, update the environment variables to finish the Conda installation::

    $ source ~/.bashrc

Next, you need to set up channels to make all packages listed above accessible (note
that the order is important to guarantee the correct priority)::
    
    $ conda config --add channels defaults
    $ conda config --add channels bioconda
    $ conda config --add channels conda-forge

Then type and execute the commands below to satisfy the requirements::

    $ conda create -n HiCPeaks numpy scipy matplotlib pandas statsmodels scikit-learn h5py multiprocess cooler ucsc-fetchchromsizes
    $ conda activate HiCPeaks

Install hicpeaks
----------------
Finally, hicpeaks can be installed from PyPI using pip::

    $ pip install -U hicpeaks

Overview
========
*hicpeaks* comes with 6 scripts: *toCooler*, *pyBHFDR*, *pyHICCUPS*, *combine-resolutions*, *peak-plot*,
and *apa-analysis*.

- toCooler

  Store TXT/NPZ bin-level Hi-C data into the `cooler <https://github.com/mirnylab/cooler>`_ container.

  1. I have included a sample data with the *hicpeaks* source code to illustrate how you should prepare your
     data in TXT format. It's quite easy, just remember 3 points: 1. the file name should follow this pattern
     "chrom1_chrom2.txt" (remove prefix from your chromosome labels, i.e. "chr1" should be "1", and "chrX" should
     be "X"); 2. each file should only contain 3 columns, corresponding to "bin1" of "chrom1", "bin2" of "chrom2",
     and "contact frequency" (**don't** perform any normalization processes); 3. all files at the same resolution
     should be placed in the same folder.
  2. NPZ format is another bin-level Hi-C data container which can extremely speed up data loading. *hicpeaks*
     supports NPZ files generated by old versions of `runHiC (<0.8.0) <https://github.com/XiaoTaoWang/HiC_pipeline>`_ and
     `TADLib (<0.4.0) <https://github.com/XiaoTaoWang/TADLib>`_.

- pyBHFDR

  A CPU-based python implementation for the BH-FDR algorithm. Rao et al (2014) stated in their supplementary material that
  this algorithm is robust enough to obtain all main results of their paper. Compared with HiCCUPS, BH-FDR doesn't use
  λ-chunk in multiple hypothesis tests, and only considers the Donut background region when calculating the
  expected values.

- pyHICCUPS

  A CPU-based python implementation for the HiCCUPS algorithm. Besides the donut region, HiCCUPS also considers the
  lower-left, the vertical, and the horizontal backgrounds when calculating the expected values. And λ-chunk is used
  to overcome several multiple hypothesis testing challenges for Hi-C data. Finally, while BH-FDR can only detect chromatin
  interactions near the diagonal (<2Mb), HiCCUPS is able to detect super long-range interactions. Here, *pyHICCUPS* keeps
  all main concepts of the original algorithm except for these points:

  1. *pyHICCUPS* excludes vertical and horizontal backgrounds from its calculation.
  2. There are two critical parameters related to the loop definition in HiCCUPS: the peak width *p* and the donut width *w*.
     In original implementation, they are set exclusively for each certain resolution, specifically, *p=1* and *w=3* at 25Kb,
     *p=2* and *w=5* at 10Kb, and *p=4* and *w=7* at 5Kb. To improve the sensitivity, *pyHICCUPS* can calculate and output
     the union peak calls from all parameter combinations *(1,3)*, *(2,5)*, *(4,7)* in a single run.
  3. Due to computational complexity, the search space still need to be limited, for example, within 5Mb/10Mb.

- combine-resolutions

  Combine peak calls from different resolutions in a way similar to original *HiCCUPS*. Briefly, it excludes redundant lower
  resolution peaks while filters out low-confidence high resolution peaks.

- peak-plot

  Visualize peaks (or chromatin loops) on a local contact matrix.

- apa-analysis

  Perform Aggregate Peak Analysis (APA).

QuickStart
==========
This tutorial will guide you through the basic usage of all scripts distributed with *hicpeaks*.

toCooler
--------
If you have already created a cooler file for your Hi-C data, skip to the next section
`pyBHFDR and pyHICCUPS <https://github.com/XiaoTaoWang/HiCPeaks/blob/master/README.rst#pybhfdr-and-pyhiccups>`_,
go on otherwise.

First, you should store your TXT/NPZ bin-level Hi-C data into a cooler file by using *toCooler*. Let's begin
with our sample data below. Suppose you are in the *hicpeaks* source code root folder: change your current
working directory to the sub-folder *example*::

    $ cd example
    $ ls -lh *

    -rw-r--r-- 1 xtwang  18 May  4 18:00 datasets
    -rw-r--r-- 1 xtwang 293 May  4 18:00 hg38.chromsizes

    25K:
    total 12M
    -rw-r--r-- 1 xtwang 12M May  4 18:00 21_21.txt

There is a sub-directory called *25K* and a metadata file called *datasets*. The *25K* folder contains chromatin
interactions of chromosome 21 of the K562 cell line at the 25K resolution, and the *datasets* describes the data
that need to be transformed::

    $ cd 25K
    $ head -5 21_21.txt

    201	703	1
    201	1347	1
    201	1351	1
    201	1524	1
    201	1691	1

    $ cd ..
    $ cat datasets

    res:25000
      ./25K

You should construct your TXT files (no head, no tail) with 3 columns, which indicate "bin1 of the 1st chromosome",
"bin2 of the 2nd chromosome", and "contact frequency" respectively. See `Overview <https://github.com/XiaoTaoWang/HiCPeaks#overview>`_
above.

To transform this data to the *cooler* format, just run the command below::

    $ toCooler -O K562-MboI-parts.cool -d datasets --assembly hg38 --nproc 1

*toCooler* routinely fetch sizes of each chromosome from UCSC with the provided genome assembly name (here hg38).
However, if your reference genome is not holded in UCSC, you can also build a file like "hg38.chromsizes" in
current working directory, and pass the file path to the argument "--chromsizes-file".

Type ``toCooler`` with no arguments on your terminal to print detailed help information for each parameter.

For this dataset, *toCooler* will create a cooler file named "K562-MboI-parts.cool", and your data will be stored under
the URI "K562-MboI-parts.cool::25000".

This tutorial only illustrates a very simple case, in fact the metadata file may contain list of resolutions (if you
have data at different resolutions for the same cell line) and corresponding folder paths (both relative and absolute
path are accepted, and if your data are in the NPZ format, this path should point to the NPZ file)::

    res:10000
      /absoultepath/10K
    
    res:25000
      ../relativepath/25K
    
    res:40000
      /npzfile/anyprefix.npz

Then *toCooler* will generate a single cooler file storing all the specified data under different cooler URI.
Suppose your cool file is named "specified_cooler_path", the above data will be stored at
"specified_cooler_path::10000", "specified_cooler_path::25000", and "specified_cooler_path::40000", respectively.

pyBHFDR and pyHICCUPS
---------------------
After you have obtained a cool file, you can call peaks or chromatin loops using *pyBHFDR* or *pyHICCUPS*::

    $ pyBHFDR -O K562-MboI-BHFDR-loops.txt -p K562-MboI-parts.cool::25000 -C 21 --pw 1 --ww 3

Or::

    $ pyHICCUPS -O K562-MboI-HICCUPS-loops.txt -p K562-MboI-parts.cool::25000 --pw 1 2 4 --ww 3 5 7 --only-anchors

Type ``pyBHFDR`` or ``pyHICCUPS`` on your terminal to print detailed help information for each parameter.

Before step to the next section, let's list the contents under current working directory again::

    $ ls -lh

    total 852K
    drwxr-xr-x 4 xtwang  128 May  4 18:21 25K/
    -rw-r--r-- 1 xtwang  17K May  4 18:23 K562-MboI-BHFDR-loops.txt
    -rw-r--r-- 1 xtwang  15K May  4 18:23 K562-MboI-HICCUPS-loops.txt
    -rw-r--r-- 1 xtwang 723K May  4 18:22 K562-MboI-parts.cool
    -rw-r--r-- 1 xtwang   18 May  4 18:21 datasets
    -rw-r--r-- 1 xtwang  293 May  4 18:21 hg38.chromsizes
    -rw-r--r-- 1 xtwang 2.2K May  4 18:23 pyBHFDR.log
    -rw-r--r-- 1 xtwang 8.5K May  4 18:23 pyHICCUPS.log
    -rw-r--r-- 1 xtwang  17K May  4 18:22 tocooler.log

The detected loops are reported in a customized `bedpe <https://bedtools.readthedocs.io/en/latest/content/general-usage.html#bedpe-format>`_
format. The first 10 columns are identical to the `official definition <https://bedtools.readthedocs.io/en/latest/content/general-usage.html#bedpe-format>`_,
and the additional fields are:

11. Fold enrichment score calculated from the donut background.
12. The p value calculated from the donut background.
13. The q value calculated from the donut background.
14. Fold enrichment score calculated from the lower-left background.
15. The p value calculated from the lower-left background.
16. The q value calculated from the lower-left background.

Peak Visualization
------------------
Now, you can visualize the detected peaks/loops using *peak-plot*::

    $ peak-plot -O test-HICCUPS.png -p K562-MboI-parts.cool::25000 -I K562-MboI-HICCUPS-loops.txt \
      -C 21 -S 25000000 -E 29500000 --clr-weight-name weight --vmin 0 --vmax 0.008

The output figure should look like this:

.. image:: ./figures/test-HICCUPS-new.png
        :align: center


Aggregate Peak Analysis
-----------------------
To inspect the overall loop patterns of the detected peaks, you can use the *apa-analysis* script::

    $ apa-analysis -O apa.png -p K562-MboI-parts.cool::25000 -I K562-MboI-HICCUPS-loops.txt --clr-weight-name weight --vmax 2

The output plot should look like this:

.. image:: ./figures/apa-new.png
        :align: center

Combine different resolutions
-----------------------------
The inputs to *combine-resolutions* are loop annotation files (*bedpe*) at different resolutions. If an interaction
is detected as a peak in both resolutions, this script records the precise coordinates in finer resolutions and discards
the coarser resolution one. And a long-range (determined by the ``--min-dis`` parameter) peak call at high resolutions
(any resolutions lower than the ``--good-res`` cutoff, note that lower values correspond to higher resolutions) will be
treated as a false positive if it could not be identified at lower resolutions (any resolutions equal to or greater than
the ``--good-res`` cutoff). Here's a *pseudo* command with 3 loop files at 5Kb, 10Kb, and 20Kb respectively::

    $ combine-resolutions -O K562-MboI-pyHICCUPS-combined.bedpe -p K562-MboI-pyHICCUPS-5K.txt K562-MboI-pyHICCUPS-10K.txt K562-MboI-pyHICCUPS-20K.txt -R 5000 10000 20000 -G 20000 -M 100000

Performance
===========
The table below shows a performance test for the *toCooler*, *pyBHFDR* , and *pyHICCUPS* scripts:

- Processor: 2.6 GHz Intel Core i7, Memory: 16 GB 2400 MHz DDR4
- Software version: *hicpeaks 0.3.0*
- At the 40Kb resolution, ``--pw`` and ``--ww`` are set to 1 and 3, respectively; at the 10Kb resolution,
  they are set to 2 and 5, respectively.
- The original Hi-C data is stored in TXT
- Number of proccesses assigned: 1
- Valid contacts: total number of non-zero pixels in intra-chromosomal matrices
- Running time format: hr: min: sec

+--------------+----------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Datasets     | Valid contacts |          toCooler           |           pyBHFDR           |          pyHICCUPS          |
+--------------+----------------+--------------+--------------+--------------+--------------+--------------+--------------+
|                               | Memory Usage | Running time | Memory Usage | Running time | Memory Usage | Running time |
+==============+================+==============+==============+==============+==============+==============+==============+
| T47D (40K)   |   25,216,875   |    <600M     |    0:07:55   |    <600M     |    0:01:34   |    <600M     |    0:04:17   |
+--------------+----------------+--------------+--------------+--------------+--------------+--------------+--------------+
| K562 (40K)   |   49,088,465   |    <1.2G     |    0:21:37   |    <1.0G     |    0:01:49   |    <1.0G     |    0:03:21   |
+--------------+----------------+--------------+--------------+--------------+--------------+--------------+--------------+
| K562 (10K)   |  139,884,876   |    <3.0G     |    1:00:07   |    <2.0G     |    0:24:53   |    <4.0G     |    1:57:33   |
+--------------+----------------+--------------+--------------+--------------+--------------+--------------+--------------+

.. note:: Both *pyBHFDR* and *pyHICCUPS* support parallel computation (``--nproc``). If your computer has sufficient memory, the
          calculation should end within 30 minutes even for high-resolution matrices.

Release Notes
=============
Version 0.3.5 (08/28/2022)
--------------------------
- Added parameters to *peak-plot* and *apa-analysis* so that the output figures can be more finely tuned

Version 0.3.4 (05/04/2019)
--------------------------
- Improved the efficiency of the local clustering algorithm
- Changed the output loop format to bedpe

Version 0.3.3 (03/08/2019)
--------------------------
- Made *toCooler* support the float data type
- Removed ticklabels in APA plot

Version 0.3.2 (03/03/2019)
--------------------------
- Supported combination of different resolutions
- Improved the local clustering algorithm
- Added the APA analysis module
- Dealed with the compatiblility with cooler 0.8

Version 0.3.0 (09/03/2018)
--------------------------
- Removed the horizontal and vertical backgrounds
- Supported multiple combinations of the *pw* and *ww* parameters
- Migrated to Python 3
- Optimized the calculation efficiency
- Fixed bugs when external .cool files are provided.

Version 0.2.0-r1 (08/26/2018)
-----------------------------
- Speeded up the program by dynamically limiting the donut widths

Version 0.2.0 (08/25/2018)
--------------------------
- Added the vertical and horizontal backgrounds 
- Added additional filtering procedures based on the dbscan clusters and more stringent q value cutoffs
- Fixed bugs of *toCooler* in storing the inter-chromosomal data

Version 0.1.1 (08/24/2018)
--------------------------
- Lower memory usage and more efficient calculation

Version 0.1.0 (08/22/2018)
--------------------------
- The first release.
- Added *toCooler* and *peak-plot*.
- Added support for multiple processing.

Pre-Release (05/04/2015)
-----------------------------
- Implemented core algorithms of BH-FDR and HICCUPS



Reference
=========
.. [1] Rao SS, Huntley MH, Durand NC et al. A 3D Map of the Human Genome at Kilobase Resolution
      Reveals Principles of Chromatin Looping. Cell, 2014, 159(7):1665-80.

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/XiaoTaoWang/HiCPeaks/",
    "name": "hicpeaks",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "Hi-C chromatin interaction contact loops",
    "author": "XiaoTao Wang",
    "author_email": "wangxiaotao686@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/0c/01/cd1a15d5ea054ae3ed4d3020eaf778a0bb27b08a03345758605a4e8d475f/hicpeaks-0.3.8.tar.gz",
    "platform": null,
    "description": "HiCPeaks\n========\n.. image:: https://static.pepy.tech/personalized-badge/hicpeaks?period=total&units=international_system&left_color=black&right_color=orange&left_text=Downloads\n :target: https://pepy.tech/project/hicpeaks\n\n*hicpeaks* provides a Python implementation for BH-FDR and HiCCUPS [1]_.\n\nInstallation\n============\n*hicpeaks* is developed and tested on UNIX-like operating systems, and the following packages or softwares are\nrequired:\n\nPython requirements:\n\na) Python 2.7/3.5+\nb) Multiprocess\nc) Numpy\nd) Scipy\ne) Matplotlib\nf) Pandas\ng) Statsmodels\nh) Scikit-Learn\ni) H5py\nj) Cooler\n\nOther requirements:\n\n- ucsc-fetchchromsizes\n\n*conda*, an excellent package manager, can be used to install all requirements above.\n\nInstall Requirements Through Conda\n----------------------------------\nAll above requirements can be installed through the conda package manager.\n\n.. note:: If you have the Anaconda Distribution installed, you already have it.\n\nChoose an appropriate `Miniconda installer <https://conda.io/miniconda.html>`_ for your system,\nthen in your terminal window type the following and follow the prompts on the installer screens::\n\n    $ bash Miniconda3-latest-Linux-x86_64.sh\n\nAfter that, update the environment variables to finish the Conda installation::\n\n    $ source ~/.bashrc\n\nNext, you need to set up channels to make all packages listed above accessible (note\nthat the order is important to guarantee the correct priority)::\n    \n    $ conda config --add channels defaults\n    $ conda config --add channels bioconda\n    $ conda config --add channels conda-forge\n\nThen type and execute the commands below to satisfy the requirements::\n\n    $ conda create -n HiCPeaks numpy scipy matplotlib pandas statsmodels scikit-learn h5py multiprocess cooler ucsc-fetchchromsizes\n    $ conda activate HiCPeaks\n\nInstall hicpeaks\n----------------\nFinally, hicpeaks can be installed from PyPI using pip::\n\n    $ pip install -U hicpeaks\n\nOverview\n========\n*hicpeaks* comes with 6 scripts: *toCooler*, *pyBHFDR*, *pyHICCUPS*, *combine-resolutions*, *peak-plot*,\nand *apa-analysis*.\n\n- toCooler\n\n  Store TXT/NPZ bin-level Hi-C data into the `cooler <https://github.com/mirnylab/cooler>`_ container.\n\n  1. I have included a sample data with the *hicpeaks* source code to illustrate how you should prepare your\n     data in TXT format. It's quite easy, just remember 3 points: 1. the file name should follow this pattern\n     \"chrom1_chrom2.txt\" (remove prefix from your chromosome labels, i.e. \"chr1\" should be \"1\", and \"chrX\" should\n     be \"X\"); 2. each file should only contain 3 columns, corresponding to \"bin1\" of \"chrom1\", \"bin2\" of \"chrom2\",\n     and \"contact frequency\" (**don't** perform any normalization processes); 3. all files at the same resolution\n     should be placed in the same folder.\n  2. NPZ format is another bin-level Hi-C data container which can extremely speed up data loading. *hicpeaks*\n     supports NPZ files generated by old versions of `runHiC (<0.8.0) <https://github.com/XiaoTaoWang/HiC_pipeline>`_ and\n     `TADLib (<0.4.0) <https://github.com/XiaoTaoWang/TADLib>`_.\n\n- pyBHFDR\n\n  A CPU-based python implementation for the BH-FDR algorithm. Rao et al (2014) stated in their supplementary material that\n  this algorithm is robust enough to obtain all main results of their paper. Compared with HiCCUPS, BH-FDR doesn't use\n  \u03bb-chunk in multiple hypothesis tests, and only considers the Donut background region when calculating the\n  expected values.\n\n- pyHICCUPS\n\n  A CPU-based python implementation for the HiCCUPS algorithm. Besides the donut region, HiCCUPS also considers the\n  lower-left, the vertical, and the horizontal backgrounds when calculating the expected values. And \u03bb-chunk is used\n  to overcome several multiple hypothesis testing challenges for Hi-C data. Finally, while BH-FDR can only detect chromatin\n  interactions near the diagonal (<2Mb), HiCCUPS is able to detect super long-range interactions. Here, *pyHICCUPS* keeps\n  all main concepts of the original algorithm except for these points:\n\n  1. *pyHICCUPS* excludes vertical and horizontal backgrounds from its calculation.\n  2. There are two critical parameters related to the loop definition in HiCCUPS: the peak width *p* and the donut width *w*.\n     In original implementation, they are set exclusively for each certain resolution, specifically, *p=1* and *w=3* at 25Kb,\n     *p=2* and *w=5* at 10Kb, and *p=4* and *w=7* at 5Kb. To improve the sensitivity, *pyHICCUPS* can calculate and output\n     the union peak calls from all parameter combinations *(1,3)*, *(2,5)*, *(4,7)* in a single run.\n  3. Due to computational complexity, the search space still need to be limited, for example, within 5Mb/10Mb.\n\n- combine-resolutions\n\n  Combine peak calls from different resolutions in a way similar to original *HiCCUPS*. Briefly, it excludes redundant lower\n  resolution peaks while filters out low-confidence high resolution peaks.\n\n- peak-plot\n\n  Visualize peaks (or chromatin loops) on a local contact matrix.\n\n- apa-analysis\n\n  Perform Aggregate Peak Analysis (APA).\n\nQuickStart\n==========\nThis tutorial will guide you through the basic usage of all scripts distributed with *hicpeaks*.\n\ntoCooler\n--------\nIf you have already created a cooler file for your Hi-C data, skip to the next section\n`pyBHFDR and pyHICCUPS <https://github.com/XiaoTaoWang/HiCPeaks/blob/master/README.rst#pybhfdr-and-pyhiccups>`_,\ngo on otherwise.\n\nFirst, you should store your TXT/NPZ bin-level Hi-C data into a cooler file by using *toCooler*. Let's begin\nwith our sample data below. Suppose you are in the *hicpeaks* source code root folder: change your current\nworking directory to the sub-folder *example*::\n\n    $ cd example\n    $ ls -lh *\n\n    -rw-r--r-- 1 xtwang  18 May  4 18:00 datasets\n    -rw-r--r-- 1 xtwang 293 May  4 18:00 hg38.chromsizes\n\n    25K:\n    total 12M\n    -rw-r--r-- 1 xtwang 12M May  4 18:00 21_21.txt\n\nThere is a sub-directory called *25K* and a metadata file called *datasets*. The *25K* folder contains chromatin\ninteractions of chromosome 21 of the K562 cell line at the 25K resolution, and the *datasets* describes the data\nthat need to be transformed::\n\n    $ cd 25K\n    $ head -5 21_21.txt\n\n    201\t703\t1\n    201\t1347\t1\n    201\t1351\t1\n    201\t1524\t1\n    201\t1691\t1\n\n    $ cd ..\n    $ cat datasets\n\n    res:25000\n      ./25K\n\nYou should construct your TXT files (no head, no tail) with 3 columns, which indicate \"bin1 of the 1st chromosome\",\n\"bin2 of the 2nd chromosome\", and \"contact frequency\" respectively. See `Overview <https://github.com/XiaoTaoWang/HiCPeaks#overview>`_\nabove.\n\nTo transform this data to the *cooler* format, just run the command below::\n\n    $ toCooler -O K562-MboI-parts.cool -d datasets --assembly hg38 --nproc 1\n\n*toCooler* routinely fetch sizes of each chromosome from UCSC with the provided genome assembly name (here hg38).\nHowever, if your reference genome is not holded in UCSC, you can also build a file like \"hg38.chromsizes\" in\ncurrent working directory, and pass the file path to the argument \"--chromsizes-file\".\n\nType ``toCooler`` with no arguments on your terminal to print detailed help information for each parameter.\n\nFor this dataset, *toCooler* will create a cooler file named \"K562-MboI-parts.cool\", and your data will be stored under\nthe URI \"K562-MboI-parts.cool::25000\".\n\nThis tutorial only illustrates a very simple case, in fact the metadata file may contain list of resolutions (if you\nhave data at different resolutions for the same cell line) and corresponding folder paths (both relative and absolute\npath are accepted, and if your data are in the NPZ format, this path should point to the NPZ file)::\n\n    res:10000\n      /absoultepath/10K\n    \n    res:25000\n      ../relativepath/25K\n    \n    res:40000\n      /npzfile/anyprefix.npz\n\nThen *toCooler* will generate a single cooler file storing all the specified data under different cooler URI.\nSuppose your cool file is named \"specified_cooler_path\", the above data will be stored at\n\"specified_cooler_path::10000\", \"specified_cooler_path::25000\", and \"specified_cooler_path::40000\", respectively.\n\npyBHFDR and pyHICCUPS\n---------------------\nAfter you have obtained a cool file, you can call peaks or chromatin loops using *pyBHFDR* or *pyHICCUPS*::\n\n    $ pyBHFDR -O K562-MboI-BHFDR-loops.txt -p K562-MboI-parts.cool::25000 -C 21 --pw 1 --ww 3\n\nOr::\n\n    $ pyHICCUPS -O K562-MboI-HICCUPS-loops.txt -p K562-MboI-parts.cool::25000 --pw 1 2 4 --ww 3 5 7 --only-anchors\n\nType ``pyBHFDR`` or ``pyHICCUPS`` on your terminal to print detailed help information for each parameter.\n\nBefore step to the next section, let's list the contents under current working directory again::\n\n    $ ls -lh\n\n    total 852K\n    drwxr-xr-x 4 xtwang  128 May  4 18:21 25K/\n    -rw-r--r-- 1 xtwang  17K May  4 18:23 K562-MboI-BHFDR-loops.txt\n    -rw-r--r-- 1 xtwang  15K May  4 18:23 K562-MboI-HICCUPS-loops.txt\n    -rw-r--r-- 1 xtwang 723K May  4 18:22 K562-MboI-parts.cool\n    -rw-r--r-- 1 xtwang   18 May  4 18:21 datasets\n    -rw-r--r-- 1 xtwang  293 May  4 18:21 hg38.chromsizes\n    -rw-r--r-- 1 xtwang 2.2K May  4 18:23 pyBHFDR.log\n    -rw-r--r-- 1 xtwang 8.5K May  4 18:23 pyHICCUPS.log\n    -rw-r--r-- 1 xtwang  17K May  4 18:22 tocooler.log\n\nThe detected loops are reported in a customized `bedpe <https://bedtools.readthedocs.io/en/latest/content/general-usage.html#bedpe-format>`_\nformat. The first 10 columns are identical to the `official definition <https://bedtools.readthedocs.io/en/latest/content/general-usage.html#bedpe-format>`_,\nand the additional fields are:\n\n11. Fold enrichment score calculated from the donut background.\n12. The p value calculated from the donut background.\n13. The q value calculated from the donut background.\n14. Fold enrichment score calculated from the lower-left background.\n15. The p value calculated from the lower-left background.\n16. The q value calculated from the lower-left background.\n\nPeak Visualization\n------------------\nNow, you can visualize the detected peaks/loops using *peak-plot*::\n\n    $ peak-plot -O test-HICCUPS.png -p K562-MboI-parts.cool::25000 -I K562-MboI-HICCUPS-loops.txt \\\n      -C 21 -S 25000000 -E 29500000 --clr-weight-name weight --vmin 0 --vmax 0.008\n\nThe output figure should look like this:\n\n.. image:: ./figures/test-HICCUPS-new.png\n        :align: center\n\n\nAggregate Peak Analysis\n-----------------------\nTo inspect the overall loop patterns of the detected peaks, you can use the *apa-analysis* script::\n\n    $ apa-analysis -O apa.png -p K562-MboI-parts.cool::25000 -I K562-MboI-HICCUPS-loops.txt --clr-weight-name weight --vmax 2\n\nThe output plot should look like this:\n\n.. image:: ./figures/apa-new.png\n        :align: center\n\nCombine different resolutions\n-----------------------------\nThe inputs to *combine-resolutions* are loop annotation files (*bedpe*) at different resolutions. If an interaction\nis detected as a peak in both resolutions, this script records the precise coordinates in finer resolutions and discards\nthe coarser resolution one. And a long-range (determined by the ``--min-dis`` parameter) peak call at high resolutions\n(any resolutions lower than the ``--good-res`` cutoff, note that lower values correspond to higher resolutions) will be\ntreated as a false positive if it could not be identified at lower resolutions (any resolutions equal to or greater than\nthe ``--good-res`` cutoff). Here's a *pseudo* command with 3 loop files at 5Kb, 10Kb, and 20Kb respectively::\n\n    $ combine-resolutions -O K562-MboI-pyHICCUPS-combined.bedpe -p K562-MboI-pyHICCUPS-5K.txt K562-MboI-pyHICCUPS-10K.txt K562-MboI-pyHICCUPS-20K.txt -R 5000 10000 20000 -G 20000 -M 100000\n\nPerformance\n===========\nThe table below shows a performance test for the *toCooler*, *pyBHFDR* , and *pyHICCUPS* scripts:\n\n- Processor: 2.6 GHz Intel Core i7, Memory: 16 GB 2400 MHz DDR4\n- Software version: *hicpeaks 0.3.0*\n- At the 40Kb resolution, ``--pw`` and ``--ww`` are set to 1 and 3, respectively; at the 10Kb resolution,\n  they are set to 2 and 5, respectively.\n- The original Hi-C data is stored in TXT\n- Number of proccesses assigned: 1\n- Valid contacts: total number of non-zero pixels in intra-chromosomal matrices\n- Running time format: hr: min: sec\n\n+--------------+----------------+--------------+--------------+--------------+--------------+--------------+--------------+\n| Datasets     | Valid contacts |          toCooler           |           pyBHFDR           |          pyHICCUPS          |\n+--------------+----------------+--------------+--------------+--------------+--------------+--------------+--------------+\n|                               | Memory Usage | Running time | Memory Usage | Running time | Memory Usage | Running time |\n+==============+================+==============+==============+==============+==============+==============+==============+\n| T47D (40K)   |   25,216,875   |    <600M     |    0:07:55   |    <600M     |    0:01:34   |    <600M     |    0:04:17   |\n+--------------+----------------+--------------+--------------+--------------+--------------+--------------+--------------+\n| K562 (40K)   |   49,088,465   |    <1.2G     |    0:21:37   |    <1.0G     |    0:01:49   |    <1.0G     |    0:03:21   |\n+--------------+----------------+--------------+--------------+--------------+--------------+--------------+--------------+\n| K562 (10K)   |  139,884,876   |    <3.0G     |    1:00:07   |    <2.0G     |    0:24:53   |    <4.0G     |    1:57:33   |\n+--------------+----------------+--------------+--------------+--------------+--------------+--------------+--------------+\n\n.. note:: Both *pyBHFDR* and *pyHICCUPS* support parallel computation (``--nproc``). If your computer has sufficient memory, the\n          calculation should end within 30 minutes even for high-resolution matrices.\n\nRelease Notes\n=============\nVersion 0.3.5 (08/28/2022)\n--------------------------\n- Added parameters to *peak-plot* and *apa-analysis* so that the output figures can be more finely tuned\n\nVersion 0.3.4 (05/04/2019)\n--------------------------\n- Improved the efficiency of the local clustering algorithm\n- Changed the output loop format to bedpe\n\nVersion 0.3.3 (03/08/2019)\n--------------------------\n- Made *toCooler* support the float data type\n- Removed ticklabels in APA plot\n\nVersion 0.3.2 (03/03/2019)\n--------------------------\n- Supported combination of different resolutions\n- Improved the local clustering algorithm\n- Added the APA analysis module\n- Dealed with the compatiblility with cooler 0.8\n\nVersion 0.3.0 (09/03/2018)\n--------------------------\n- Removed the horizontal and vertical backgrounds\n- Supported multiple combinations of the *pw* and *ww* parameters\n- Migrated to Python 3\n- Optimized the calculation efficiency\n- Fixed bugs when external .cool files are provided.\n\nVersion 0.2.0-r1 (08/26/2018)\n-----------------------------\n- Speeded up the program by dynamically limiting the donut widths\n\nVersion 0.2.0 (08/25/2018)\n--------------------------\n- Added the vertical and horizontal backgrounds \n- Added additional filtering procedures based on the dbscan clusters and more stringent q value cutoffs\n- Fixed bugs of *toCooler* in storing the inter-chromosomal data\n\nVersion 0.1.1 (08/24/2018)\n--------------------------\n- Lower memory usage and more efficient calculation\n\nVersion 0.1.0 (08/22/2018)\n--------------------------\n- The first release.\n- Added *toCooler* and *peak-plot*.\n- Added support for multiple processing.\n\nPre-Release (05/04/2015)\n-----------------------------\n- Implemented core algorithms of BH-FDR and HICCUPS\n\n\n\nReference\n=========\n.. [1] Rao SS, Huntley MH, Durand NC et al. A 3D Map of the Human Genome at Kilobase Resolution\n      Reveals Principles of Chromatin Looping. Cell, 2014, 159(7):1665-80.\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "A Python implementation for BH-FDR and HiCCUPS",
    "version": "0.3.8",
    "project_urls": {
        "Homepage": "https://github.com/XiaoTaoWang/HiCPeaks/"
    },
    "split_keywords": [
        "hi-c",
        "chromatin",
        "interaction",
        "contact",
        "loops"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9fe3e0a08257cb733268bdcc2bedf8f9075aa70895a905d6938fa467de072bba",
                "md5": "1bedbdf5e7055e0990ef061b6ff06716",
                "sha256": "e283aa839ab06d46f7d38ac0eed7ebfa0dddd33092598ed47f5049f51300eeb7"
            },
            "downloads": -1,
            "filename": "hicpeaks-0.3.8-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1bedbdf5e7055e0990ef061b6ff06716",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 48424,
            "upload_time": "2024-11-21T02:21:25",
            "upload_time_iso_8601": "2024-11-21T02:21:25.016735Z",
            "url": "https://files.pythonhosted.org/packages/9f/e3/e0a08257cb733268bdcc2bedf8f9075aa70895a905d6938fa467de072bba/hicpeaks-0.3.8-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0c01cd1a15d5ea054ae3ed4d3020eaf778a0bb27b08a03345758605a4e8d475f",
                "md5": "6b9a1c25dece1d45a392c2c74e1e76d7",
                "sha256": "33fea83206b9ab221281c5cde231719e6a86858d272c29364d0c9dc12823703d"
            },
            "downloads": -1,
            "filename": "hicpeaks-0.3.8.tar.gz",
            "has_sig": false,
            "md5_digest": "6b9a1c25dece1d45a392c2c74e1e76d7",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 3043985,
            "upload_time": "2024-11-21T02:21:27",
            "upload_time_iso_8601": "2024-11-21T02:21:27.676421Z",
            "url": "https://files.pythonhosted.org/packages/0c/01/cd1a15d5ea054ae3ed4d3020eaf778a0bb27b08a03345758605a4e8d475f/hicpeaks-0.3.8.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-11-21 02:21:27",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "XiaoTaoWang",
    "github_project": "HiCPeaks",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "hicpeaks"
}

XiaoTao Wang