RaichuNorm


:Name: RaichuNorm
:Version: 1.0
:Home page: https://github.com/XiaoTaoWang/Raichu
:Summary: A cross-platform method for chromatin contact normalization
:Upload time: 2024-12-12 02:09:44
:Author: XiaoTao Wang
:Author email: wangxiaotao@fudan.edu.cn
:Keywords: Hi-C, ChIA-PET, HiChIP, PLAC-Seq, single-cell, normalization

Raichu
======
Accurately detecting enhancer-promoter loops from genome-wide interaction data,
such as Hi-C, is crucial for understanding gene regulation. Current normalization
methods, such as Iterative Correction and Eigenvector decomposition (ICE), are
commonly used to remove biases in Hi-C data prior to chromatin loop detection.
However, while structural or CTCF-associated loop signals are retained,
enhancer-promoter interaction signals are often greatly diminished after
normalization with ICE and similar methods, making these regulatory loops harder to detect.
To address this limitation, we developed Raichu, a novel method for normalizing
chromatin contact data. Raichu identifies nearly twice as many chromatin loops
as ICE, recovering almost all loops detected by ICE and revealing thousands of
additional enhancer-promoter loops missed by ICE. With its enhanced sensitivity
for regulatory loops, Raichu detects more biologically meaningful differential
loops between conditions in the same cell type. Furthermore, Raichu performs
consistently across different sequencing depths and platforms, including Hi-C,
HiChIP, and single-cell Hi-C, making it a versatile tool for uncovering new
insights into three-dimensional (3D) genomic organization and transcriptional
regulation.

Installation
============
Raichu and all of its dependencies can be installed using `mamba <https://github.com/mamba-org/mamba>`_
and `pip <https://pypi.org/project/pip/>`_::

    $ conda config --append channels defaults
    $ conda config --append channels bioconda
    $ conda config --append channels conda-forge
    $ mamba create -n 3Dnorm cooler numba joblib
    $ mamba activate 3Dnorm
    $ pip install RaichuNorm

Raichu is a command-line tool, and after successful installation, help information
can be accessed by running ``raichu -h`` in a terminal.

Usage
=====
Raichu is built on the `cooler <https://github.com/open2c/cooler>`_ Python package
for reading and processing contact matrices. To demonstrate how to normalize a
contact matrix in .cool format, let's download the file "GM12878.Hi-C.10kb.cool"
from this `link <https://www.jianguoyun.com/p/DUoSz7gQh9qdDBi5lLwFIAA>`_. This
file contains contact matrices at 10kb resolution, generated from an in situ Hi-C
dataset in the GM12878 cell line.

.. note:: Raichu is also applicable to other 3D genomic platforms,
    such as Micro-C, HiChIP, and ChIA-PET.

To normalize the matrix, run the following command in a terminal::

    $ raichu --cool-uri GM12878.Hi-C.10kb.cool --window-size 200 -p 8 -n obj_weight -f

Here:

1. The ``--cool-uri`` parameter specifies the URI of contact matrices at
   a specific resolution. For a single-resolution cooler file (typically suffixed
   with .cool), the value should be the file path. For a multi-resolution cooler
   file (typically suffixed with .mcool), the value should include the file path
   followed by ``::`` and the internal group path to the root of a data collection.
   For example: ``test.mcool::resolutions/10000`` or ``test.mcool::resolutions/5000``.

2. The ``--window-size`` parameter specifies the size of the sliding window. In most
   cases, the default value of 200 is sufficient. Increasing the window size may
   improve the accuracy of bias vector calculations but will also increase the runtime.

3. The ``-p`` or ``--nproc`` parameter specifies the number of processes to allocate for
   the calculation. Raichu uses this parameter to perform calculations for chromosomes
   in parallel. However, setting this parameter to a value greater than the number of
   chromosomes will not result in additional speed improvements.

4. The ``-n`` or ``--name`` parameter specifies the name of the column where the
   calculated bias vectors will be written.

5. If the ``-f`` or ``--force`` parameter is specified, the target column in the
   bin table will be overwritten if it already exists.
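
The parameters above can also be assembled programmatically. The following is a
minimal Python sketch, not part of Raichu itself: the helper names
``split_cool_uri`` and ``build_raichu_command`` are hypothetical, and only the
flags documented above are used. It splits a cooler URI at ``::`` and builds the
argument list for the ``raichu`` call.

```python
import shlex


def split_cool_uri(uri):
    """Split a cooler URI into (file path, internal group path).

    Assumption: a URI without "::" refers to a single-resolution file whose
    data collection sits at the root group "/".
    """
    path, sep, group = uri.partition("::")
    return path, (group if sep else "/")


def build_raichu_command(cool_uri, window_size=200, nproc=8,
                         name="obj_weight", force=True):
    """Assemble the argument list for the raichu command shown above."""
    args = [
        "raichu",
        "--cool-uri", cool_uri,
        "--window-size", str(window_size),
        "-p", str(nproc),
        "-n", name,
    ]
    if force:
        # -f overwrites the target column if it already exists
        args.append("-f")
    return args


cmd = build_raichu_command("test.mcool::resolutions/10000")
print(shlex.join(cmd))
print(split_cool_uri("test.mcool::resolutions/10000"))
print(split_cool_uri("GM12878.Hi-C.10kb.cool"))
```

The resulting list can be passed to ``subprocess.run(cmd, check=True)`` once
Raichu is installed in the active environment.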


Downstream Analysis with Raichu-Normalized Matrices
===================================================
Raichu stores the calculated bias vectors in the same format as
``cooler balance`` (an implementation of the ICE algorithm), ensuring
seamless compatibility with downstream tools for analyzing compartments,
TADs, and loops.
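
To illustrate what sharing the ``cooler balance`` format implies, here is a small
pure-Python sketch. It assumes the stored bias vector is used as multiplicative
per-bin weights, as with ``cooler balance`` output, so that a normalized value is
``raw[i][j] * w[i] * w[j]``. The matrix and weights below are made-up toy
numbers, and ``apply_bias_weights`` is a hypothetical helper, not a Raichu
function.

```python
import math


def apply_bias_weights(raw, weights):
    """Return normalized[i][j] = raw[i][j] * weights[i] * weights[j]."""
    n = len(raw)
    return [
        [raw[i][j] * weights[i] * weights[j] for j in range(n)]
        for i in range(n)
    ]


# Toy symmetric contact matrix with three bins (hypothetical counts)
raw = [
    [10, 4, 2],
    [4, 8, 1],
    [2, 1, 0],
]

# Toy bias vector; NaN marks a bin that was excluded from normalization
weights = [0.5, 1.0, float("nan")]

norm = apply_bias_weights(raw, weights)
n_masked = sum(math.isnan(w) for w in weights)

for row in norm:
    print(row)
print("masked bins:", n_masked)
```

With the cooler Python package, the equivalent operation on a real file is to
pass the column name to the ``balance`` argument, for example
``clr.matrix(balance="obj_weight")``.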

For instance, to compute chromatin compartment values based on Raichu-normalized
signals, we can use the `cooltools eigs-cis <https://github.com/open2c/cooltools>`_
command and specify the ``--clr-weight-name`` parameter as "obj_weight" (matching
the ``-n`` parameter setting we used when running Raichu). The full command would
look like this::

    $ cooltools eigs-cis --phasing-track hg38-gene-density-100K.bedGraph --clr-weight-name obj_weight -o GM_raichu GM12878-MboI-allReps-hg38.mcool::resolutions/100000

Similarly, we can use the following command to compute insulation scores with
Raichu-normalized signals::

    $ cooltools insulation --ignore-diags 1 -p 8 -o GM_raichu.IS.25kb.tsv --clr-weight-name obj_weight GM12878-MboI-allReps-hg38.mcool::resolutions/25000 1000000

For loop detection, we have tested the `pyHICCUPS <https://github.com/XiaoTaoWang/HiCPeaks>`_,
`Mustache <https://github.com/ay-lab/mustache>`_, and `Peakachu <https://github.com/tariks/peakachu>`_
software.

Here are example commands for running pyHICCUPS (v0.3.8) at two resolutions and merging the results::

    $ pyHICCUPS -p GM12878.Hi-C.5kb.cool -O GM12878_pyHICCUPS.5kb.bedpe --pw 1 2 4 --ww 3 5 7 --only-anchors --nproc 8 --clr-weight-name obj_weight --maxapart 4000000
    $ pyHICCUPS -p GM12878.Hi-C.10kb.cool -O GM12878_pyHICCUPS.10kb.bedpe --pw 1 2 4 --ww 3 5 7 --only-anchors --nproc 8 --clr-weight-name obj_weight --maxapart 4000000
    $ combine-resolutions -O GM12878_pyHICCUPS.bedpe -p GM12878_pyHICCUPS.5kb.bedpe GM12878_pyHICCUPS.10kb.bedpe -R 5000 10000 -G 10000 -M 100000 --max-res 10000
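
The per-resolution calls follow one pattern, so they can be generated in a loop.
The sketch below is only an illustration in Python: the helper names
``pyhiccups_command`` and ``combine_command`` are hypothetical, and the flags and
file names are copied from the commands above rather than taken from the
pyHICCUPS documentation.

```python
# Resolutions in base pairs mapped to the labels used in the file names above
RESOLUTIONS = {5000: "5kb", 10000: "10kb"}


def pyhiccups_command(label, weight_name="obj_weight", nproc=8):
    """Build the pyHICCUPS call for one resolution label such as '5kb'."""
    return [
        "pyHICCUPS",
        "-p", f"GM12878.Hi-C.{label}.cool",
        "-O", f"GM12878_pyHICCUPS.{label}.bedpe",
        "--pw", "1", "2", "4",
        "--ww", "3", "5", "7",
        "--only-anchors",
        "--nproc", str(nproc),
        "--clr-weight-name", weight_name,
        "--maxapart", "4000000",
    ]


def combine_command(resolutions):
    """Build the combine-resolutions call that merges the per-resolution loops."""
    loop_files = [f"GM12878_pyHICCUPS.{label}.bedpe" for label in resolutions.values()]
    return (
        ["combine-resolutions", "-O", "GM12878_pyHICCUPS.bedpe", "-p"]
        + loop_files
        + ["-R"] + [str(res) for res in resolutions]
        + ["-G", "10000", "-M", "100000", "--max-res", "10000"]
    )


commands = [pyhiccups_command(label) for label in RESOLUTIONS.values()]
commands.append(combine_command(RESOLUTIONS))

for args in commands:
    print(" ".join(args))
```

Each argument list can be run with ``subprocess.run(args, check=True)`` in an
environment where pyHICCUPS is installed.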

And here is an example command for using Mustache (v1.3.2)::

    $ mustache -f GM12878-MboI-allReps-hg38.mcool -r 10000 -pt 0.05 -norm obj_weight -p 8 -o GM12878_mustache_test.tsv

Performance
===========
In GM12878 cells, ICE detected 15,446 loops, while Raichu identified 28,986 loops.
(For this analysis, pyHICCUPS was applied; however, as shown in the manuscript,
various loop-calling methods achieve a similar level of improvement when using
Raichu-normalized signals.) Notably, 90.6% of loops detected by ICE (13,997 out
of 15,446) were also identified by Raichu, whereas 51.7% of loops detected by
Raichu (14,989 out of 28,986) were missed by ICE.
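
The percentages above follow directly from the three reported counts. The short
Python check below recomputes them, treating the 13,997 shared loops as a
one-to-one match between the two loop sets (an assumption made here for the
arithmetic only).

```python
ice_loops = 15_446      # loops detected with ICE
raichu_loops = 28_986   # loops detected with Raichu
shared = 13_997         # ICE loops also identified by Raichu

ice_only = ice_loops - shared          # ICE-specific loops
raichu_only = raichu_loops - shared    # Raichu-specific loops

pct_ice_recovered = 100 * shared / ice_loops
pct_raichu_missed = 100 * raichu_only / raichu_loops
fold = raichu_loops / ice_loops

print(f"ICE-specific loops:       {ice_only}")
print(f"Raichu-specific loops:    {raichu_only}")
print(f"ICE loops recovered:      {pct_ice_recovered:.1f}%")
print(f"Raichu loops missed by ICE: {pct_raichu_missed:.1f}%")
print(f"Raichu / ICE loop count:  {fold:.2f}x")
```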

We classified the loops into three categories: ICE-specific loops, Raichu-specific loops,
and common loops (detected by both ICE and Raichu). Interestingly, while ICE-specific
and Raichu-specific loops showed comparable enrichment for CTCF and RAD21, Raichu-specific
loops exhibited substantially greater enrichment for a broader range of transcription
factors (TFs) and histone modifications closely associated with transcriptional regulation.
These include RNA polymerase II (POLR2A), CREB1, RELB, H3K4me3, and H3K27ac.

.. image:: ./images/performance.png
        :align: center

            
