HiCLift


NameHiCLift JSON
Version 1.0 PyPI version JSON
download
home_pagehttps://github.com/XiaoTaoWang/HiCLift
SummaryConvert genomic coordinates of contact pairs from one assembly to another.
upload_time2023-01-17 15:14:49
maintainer
docs_urlNone
authorXiaoTao Wang
requires_python
license
keywords liftover pairs hi-c 4dn
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            HiCLift 
=======
With the continuous effort to improve the quality of human reference genome and the generation
of more and more personal genomes, the conversion of genomic coordinates between genome assemblies
is critical in many integrative and comparative studies. While tools have been developed for such
task for linear genome signals such as ChIP-Seq, no tool exists to convert genome assemblies for
chromatin interaction data, despite the importance of three-dimensional (3D) genome organization
in gene regulation and disease. Here, we present HiCLift (previously known as pairLiftOver), a
fast and efficient tool that can convert the genomic coordinates of chromatin contacts such as Hi-C
and Micro-C from one assembly to another, including the latest T2T genome. Comparing with the
strategy of directly re-mapping raw reads to a different genome, HiCLift runs on average 42 times
faster (hours vs. days), while outputs nearly identical contact matrices. More importantly, as
HiCLift does not need to re-map the raw reads, it can directly convert human patient sample data,
where the raw sequencing reads are sometimes hard to acquire or not available.

Installation
============
HiCLift and all the dependencies can be installed through either `mamba <https://mamba.readthedocs.io/en/latest/installation.html>`_
or `pip <https://pypi.org/project/pip/>`_::

    $ conda config --append channels defaults
    $ conda config --append channels bioconda
    $ conda config --append channels conda-forge
    $ mamba create -n HiCLift cooler pairtools kerneltree cxx-compiler
    $ mamba activate HiCLift
    $ pip install HiCLift hic-straw

Overview
========
The inputs to HiCLift include two parts. The first part is a file containing the chromatin
contacts information. This file can be either a pairs file
(`4DN pairs <https://github.com/4dn-dcic/pairix/blob/master/pairs_format_specification.md>`_ or
`HiC-Pro allValidPairs <https://nservant.github.io/HiC-Pro/RESULTS.html>`_)
with each row representing a pair of interacting genomic loci in base-pair resolution, or a matrix
file (`.cool <https://open2c.github.io/cooler/>`_ or `.hic <https://github.com/aidenlab/juicer/wiki/Data>`_),
which stores interaction frequencies between genomic intervals of fixed size. The second part is a
`UCSC chain file <https://genome.ucsc.edu/goldenPath/help/chain.html>`_, which describes pairwise
alignment that allows gaps in both assemblies simultaneously. Internally, HiCLift represents
a chain file as IntervalTrees, with one tree per chromosome, to efficiently search for a specific
genomic position in a chain file and locate the matched position in the target genome. The converted
chromatin contacts will be reported in either a sorted 4DN pairs file, which can be directly used
to generate contact matrix in various formats, or a matrix file in .cool or .hic formats.

.. image:: ./images/fig1.svg.png
        :align: center

Usage
=====
Open a terminal, type ``HiCLift -h`` for help information.

Here is an example command which uses a 4DN pairs file in hg19 coordinates as input, and
outputs an mcool file with chromatin contacts in hg38 coordinates::

    $ HiCLift --input test.hg19.pairs.gz --input-format pairs --out-pre test-hg38 \
    --output-format cool --out-chromsizes hg38.chrom.sizes --in-assembly hg19 --out-assembly hg38 \
    --logFile HiCLift.log

HiCLift can also serve as a tool to perform a pure data format conversion. For example,
the following command transforms a contact matrix from the .cool format to the .hic format,
without the coordinate liftover. Note that the values of ``--in-assembly`` and ``--out-assembly``
need to be the same to turn on this function::

    $ HiCLift --input Rao2014-K562-MboI-allreps-filtered.5kb.cool --input-format cooler \
    --out-pre K562-format-conversion-test --output-format hic --out-chromsizes hg19.chrom.sizes \
    --in-assembly hg19 --out-assembly hg19 --memory 40G


Performance
===========
Using large Hi-C datasets of different species as a benchmark, we show that compared with
the strategy directly re-mapping raw reads to a different genome, HiCLift runs on
average 42 times faster, while outputs nearly identical contact matrices. 

.. image:: ./images/accuracy.png
        :align: center



            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/XiaoTaoWang/HiCLift",
    "name": "HiCLift",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "liftover pairs Hi-C 4DN",
    "author": "XiaoTao Wang",
    "author_email": "wangxiaotao686@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/7e/f8/7aff3c8913e3e127c23aaca6609caa079f15bd131b842c61ff69ab2eeb8a/HiCLift-1.0.tar.gz",
    "platform": null,
    "description": "HiCLift \n=======\nWith the continuous effort to improve the quality of human reference genome and the generation\nof more and more personal genomes, the conversion of genomic coordinates between genome assemblies\nis critical in many integrative and comparative studies. While tools have been developed for such\ntask for linear genome signals such as ChIP-Seq, no tool exists to convert genome assemblies for\nchromatin interaction data, despite the importance of three-dimensional (3D) genome organization\nin gene regulation and disease. Here, we present HiCLift (previously known as pairLiftOver), a\nfast and efficient tool that can convert the genomic coordinates of chromatin contacts such as Hi-C\nand Micro-C from one assembly to another, including the latest T2T genome. Comparing with the\nstrategy of directly re-mapping raw reads to a different genome, HiCLift runs on average 42 times\nfaster (hours vs. days), while outputs nearly identical contact matrices. More importantly, as\nHiCLift does not need to re-map the raw reads, it can directly convert human patient sample data,\nwhere the raw sequencing reads are sometimes hard to acquire or not available.\n\nInstallation\n============\nHiCLift and all the dependencies can be installed through either `mamba <https://mamba.readthedocs.io/en/latest/installation.html>`_\nor `pip <https://pypi.org/project/pip/>`_::\n\n    $ conda config --append channels defaults\n    $ conda config --append channels bioconda\n    $ conda config --append channels conda-forge\n    $ mamba create -n HiCLift cooler pairtools kerneltree cxx-compiler\n    $ mamba activate HiCLift\n    $ pip install HiCLift hic-straw\n\nOverview\n========\nThe inputs to HiCLift include two parts. The first part is a file containing the chromatin\ncontacts information. This file can be either a pairs file\n(`4DN pairs <https://github.com/4dn-dcic/pairix/blob/master/pairs_format_specification.md>`_ or\n`HiC-Pro allValidPairs <https://nservant.github.io/HiC-Pro/RESULTS.html>`_)\nwith each row representing a pair of interacting genomic loci in base-pair resolution, or a matrix\nfile (`.cool <https://open2c.github.io/cooler/>`_ or `.hic <https://github.com/aidenlab/juicer/wiki/Data>`_),\nwhich stores interaction frequencies between genomic intervals of fixed size. The second part is a\n`UCSC chain file <https://genome.ucsc.edu/goldenPath/help/chain.html>`_, which describes pairwise\nalignment that allows gaps in both assemblies simultaneously. Internally, HiCLift represents\na chain file as IntervalTrees, with one tree per chromosome, to efficiently search for a specific\ngenomic position in a chain file and locate the matched position in the target genome. The converted\nchromatin contacts will be reported in either a sorted 4DN pairs file, which can be directly used\nto generate contact matrix in various formats, or a matrix file in .cool or .hic formats.\n\n.. image:: ./images/fig1.svg.png\n        :align: center\n\nUsage\n=====\nOpen a terminal, type ``HiCLift -h`` for help information.\n\nHere is an example command which uses a 4DN pairs file in hg19 coordinates as input, and\noutputs an mcool file with chromatin contacts in hg38 coordinates::\n\n    $ HiCLift --input test.hg19.pairs.gz --input-format pairs --out-pre test-hg38 \\\n    --output-format cool --out-chromsizes hg38.chrom.sizes --in-assembly hg19 --out-assembly hg38 \\\n    --logFile HiCLift.log\n\nHiCLift can also serve as a tool to perform a pure data format conversion. For example,\nthe following command transforms a contact matrix from the .cool format to the .hic format,\nwithout the coordinate liftover. Note that the values of ``--in-assembly`` and ``--out-assembly``\nneed to be the same to turn on this function::\n\n    $ HiCLift --input Rao2014-K562-MboI-allreps-filtered.5kb.cool --input-format cooler \\\n    --out-pre K562-format-conversion-test --output-format hic --out-chromsizes hg19.chrom.sizes \\\n    --in-assembly hg19 --out-assembly hg19 --memory 40G\n\n\nPerformance\n===========\nUsing large Hi-C datasets of different species as a benchmark, we show that compared with\nthe strategy directly re-mapping raw reads to a different genome, HiCLift runs on\naverage 42 times faster, while outputs nearly identical contact matrices. \n\n.. image:: ./images/accuracy.png\n        :align: center\n\n\n",
    "bugtrack_url": null,
    "license": "",
    "summary": "Convert genomic coordinates of contact pairs from one assembly to another.",
    "version": "1.0",
    "split_keywords": [
        "liftover",
        "pairs",
        "hi-c",
        "4dn"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "420907c3158339644b06d0fcf269477a8e1b5f19af1e7742c9f90fa97c344325",
                "md5": "aec31e90ecb04a058770614bd8d3dfad",
                "sha256": "f9b72937565cac586b3baed7504135a193ce57cfa76830997e8464e32848662a"
            },
            "downloads": -1,
            "filename": "HiCLift-1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "aec31e90ecb04a058770614bd8d3dfad",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 27756082,
            "upload_time": "2023-01-17T15:14:39",
            "upload_time_iso_8601": "2023-01-17T15:14:39.546188Z",
            "url": "https://files.pythonhosted.org/packages/42/09/07c3158339644b06d0fcf269477a8e1b5f19af1e7742c9f90fa97c344325/HiCLift-1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7ef87aff3c8913e3e127c23aaca6609caa079f15bd131b842c61ff69ab2eeb8a",
                "md5": "05a68fabe820c0ee32741f9cd0cef569",
                "sha256": "3573dc51894b1c9a44fcc448896912864fb7288d37ec5a5a861a756d10be1b0c"
            },
            "downloads": -1,
            "filename": "HiCLift-1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "05a68fabe820c0ee32741f9cd0cef569",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 27746181,
            "upload_time": "2023-01-17T15:14:49",
            "upload_time_iso_8601": "2023-01-17T15:14:49.655110Z",
            "url": "https://files.pythonhosted.org/packages/7e/f8/7aff3c8913e3e127c23aaca6609caa079f15bd131b842c61ff69ab2eeb8a/HiCLift-1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-01-17 15:14:49",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "XiaoTaoWang",
    "github_project": "HiCLift",
    "lcname": "hiclift"
}
        
Elapsed time: 0.04066s