hic-straw


- Name: hic-straw
- Version: 1.3.1
- home_page: https://github.com/aidenlab/straw
- Summary: Straw bound with pybind11
- upload_time: 2022-05-27 22:31:04
- docs_url: None
- author: Neva C. Durand, Muhammad S Shamim
- requires_python: >3.3
- license: MIT
- keywords: hi-c, 3d genomics, chromatin, ml
- requirements: No requirements were recorded.

## Quick Start Python

Straw is a library that allows rapid streaming of contact data from .hic files.
To learn more about Hi-C data and 3D genomics, visit https://aidenlab.gitbook.io/juicebox/

Once you've installed the library with `pip install hic-straw`, you can use it in your code with `import hicstraw`.

## New usage to directly get numpy matrix

The new usage for straw allows you to create objects and retain intermediate variables.
This can speed up your code significantly when querying hundreds or thousands of regions
for a given chromosome/resolution/normalization.

First we import `numpy` and `hicstraw`.
```python
import numpy as np
import hicstraw
```

We then create a Hi-C file object. 
From this object, we can query genomeID, chromosomes, and resolutions.
```python
hic = hicstraw.HiCFile("HIC001.hic")
print(hic.getChromosomes())
print(hic.getGenomeID())
print(hic.getResolutions())
```
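
Because straw only reads resolutions that are already stored in the file, it can help to check a requested resolution against the output of `hic.getResolutions()` before querying. The helper below is a minimal sketch, not part of hicstraw: `choose_resolution` is a hypothetical name, and it assumes the resolutions come back as a list of integers in base pairs. A plain list stands in for the real call so the snippet runs without a .hic file.

```python
def choose_resolution(available, requested):
    """Return `requested` if it is among `available`, else raise ValueError.

    `available` is assumed to be the list returned by hic.getResolutions();
    this helper is hypothetical and not part of hicstraw.
    """
    available = sorted(int(r) for r in available)
    if requested not in available:
        raise ValueError(
            "resolution {0} is not stored in the file; available: {1}".format(
                requested, available
            )
        )
    return requested


# Stand-in for hic.getResolutions() so the sketch runs without a .hic file.
example_resolutions = [2500000, 1000000, 500000, 100000, 50000, 25000, 10000, 5000]
print(choose_resolution(example_resolutions, 5000))
```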

We can also collect a matrix zoom data object, which is specific to a
- matrix type: `observed` (count) or `oe` (observed/expected ratio)
- chromosome-chromosome pair
- resolution
- normalization

This object retains information for fast future queries. 
Here's an example that picks the counts from the intrachromosomal region for chr4 
with KR normalization at 5 kb resolution.
```python
mzd = hic.getMatrixZoomData('4', '4', "observed", "KR", "BP", 5000)
```

We can get numpy matrices for specific genomic windows by calling:
```python
numpy_matrix = mzd.getRecordsAsMatrix(10000000, 12000000, 10000000, 12000000)
```
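
To get a feel for how large such a matrix will be, the window can be converted to bin indices by hand. The sketch below is plain arithmetic and does not call hicstraw. `window_to_bins` is a hypothetical helper, and it assumes that a position falls into bin `position // resolution` and that both end bins are included, which is an assumption about straw's binning rather than something stated here.

```python
def window_to_bins(start, end, resolution):
    """Map a genomic window (in bp) to (first_bin, last_bin, n_bins).

    Assumes a position belongs to bin `position // resolution` and that
    the bins containing `start` and `end` are both included.
    """
    if resolution <= 0 or end < start:
        raise ValueError("need resolution > 0 and end >= start")
    first_bin = start // resolution
    last_bin = end // resolution
    return first_bin, last_bin, last_bin - first_bin + 1


# The 10 Mb to 12 Mb window used above, at 5 kb resolution.
first_bin, last_bin, n_bins = window_to_bins(10000000, 12000000, 5000)
print(first_bin, last_bin, n_bins)
```

Under these assumptions the window spans 401 bins per axis, so a smaller window or a coarser resolution keeps the resulting matrix manageable.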

### Usage
```
hic = hicstraw.HiCFile(filepath)
hic.getChromosomes()
hic.getGenomeID()
hic.getResolutions()

mzd = hic.getMatrixZoomData(chrom1, chrom2, data_type, normalization, "BP", resolution)

numpy_matrix = mzd.getRecordsAsMatrix(gr1, gr2, gc1, gc2)
records_list = mzd.getRecords(gr1, gr2, gc1, gc2)
```

`filepath`: path to file (local or URL)<br>
`data_type`: `'observed'` (previous default / "main" data) or `'oe'` (observed/expected)<br>
`normalization`: `NONE`, `VC`, `VC_SQRT`, `KR`, `SCALE`, etc.<br>
`resolution`: typically `2500000`, `1000000`, `500000`, `100000`, `50000`, `25000`, `10000`, `5000`, etc.<br>
`gr1`: start genomic position along rows<br>
`gr2`: end genomic position along rows<br>
`gc1`: start genomic position along columns<br>
`gc2`: end genomic position along columns<br><br>
Note: the normalization, resolution, and chromosome/regions must already exist in the .hic to be read 
(i.e. they are not calculated by straw, only read from the file if available)<br>
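
The benefit of keeping one matrix zoom data object shows up when many windows are queried in a row. The following is a minimal sketch of that pattern: `tile_windows` and `fetch_windows` are hypothetical helpers, and `FakeMatrixZoomData` is a stand-in that only mimics the `getRecordsAsMatrix(gr1, gr2, gc1, gc2)` call so the snippet runs without a .hic file.

```python
def tile_windows(start, end, width):
    """Split [start, end] into consecutive (window_start, window_end) pairs."""
    return [(s, min(s + width, end)) for s in range(start, end, width)]


def fetch_windows(mzd, windows):
    """Query one square (rows == columns) matrix per window from a single object."""
    return [mzd.getRecordsAsMatrix(s, e, s, e) for s, e in windows]


class FakeMatrixZoomData:
    """Stand-in for the object returned by hic.getMatrixZoomData(...).

    Not part of hicstraw; it only records the calls it receives.
    """

    def __init__(self):
        self.calls = []

    def getRecordsAsMatrix(self, gr1, gr2, gc1, gc2):
        self.calls.append((gr1, gr2, gc1, gc2))
        return [[0.0]]  # placeholder instead of a real numpy matrix


windows = tile_windows(10000000, 12000000, 500000)
fake_mzd = FakeMatrixZoomData()
matrices = fetch_windows(fake_mzd, windows)
print(len(matrices), windows[0], windows[-1])
```

With a real file, the object returned by `hic.getMatrixZoomData(...)` would be passed in place of `fake_mzd`, so the chromosome pair, resolution, and normalization are set up once and reused for every window.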


## Legacy usage to fetch list of contacts

For example, to fetch a list of all the raw contacts on chrX at 1 Mb resolution:

```python
import hicstraw

# Raw (unnormalized) observed contacts for chrX against chrX
result = hicstraw.straw('observed', 'NONE', 'HIC001.hic', 'X', 'X', 'BP', 1000000)
for record in result:
    print("{0}\t{1}\t{2}".format(record.binX, record.binY, record.counts))
```

To fetch a list of KR-normalized contacts for the same region:
```python
import hicstraw

# Same query as above, with KR normalization applied
result = hicstraw.straw('observed', 'KR', 'HIC001.hic', 'X', 'X', 'BP', 1000000)
for record in result:
    print("{0}\t{1}\t{2}".format(record.binX, record.binY, record.counts))
```

To query observed/expected KR-normalized data:
```python
import hicstraw

# Observed/expected ratios instead of counts, with KR normalization
result = hicstraw.straw('oe', 'KR', 'HIC001.hic', 'X', 'X', 'BP', 1000000)
for record in result:
    print("{0}\t{1}\t{2}".format(record.binX, record.binY, record.counts))
```
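
A list of contact records can be turned into a dense matrix by hand when that is more convenient downstream. The sketch below uses a `namedtuple` stand-in with the same `binX`, `binY`, and `counts` attributes printed above, so it runs without a .hic file. It assumes that `binX` and `binY` are genomic positions in base pairs (bin starts) and that each intrachromosomal contact is reported once, so the value is mirrored across the diagonal. Both are assumptions, and `records_to_dense` is a hypothetical helper.

```python
from collections import namedtuple

# Stand-in for the records returned by hicstraw.straw(); only the three
# attributes used in the examples above are modelled.
Record = namedtuple("Record", ["binX", "binY", "counts"])


def records_to_dense(records, start, end, resolution):
    """Build a symmetric list-of-lists matrix for the window [start, end].

    Assumes binX/binY are positions in bp and that both end bins are included.
    """
    n_bins = (end - start) // resolution + 1
    matrix = [[0.0] * n_bins for _ in range(n_bins)]
    for rec in records:
        i = (rec.binX - start) // resolution
        j = (rec.binY - start) // resolution
        if 0 <= i < n_bins and 0 <= j < n_bins:
            matrix[i][j] = rec.counts
            matrix[j][i] = rec.counts  # mirror across the diagonal
    return matrix


example_records = [
    Record(0, 0, 10.0),
    Record(0, 1000000, 4.0),
    Record(1000000, 2000000, 2.5),
]
dense = records_to_dense(example_records, 0, 2000000, 1000000)
for row in dense:
    print(row)
```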

### Usage
```
hicstraw.straw(data_type, normalization, file, region_x, region_y, 'BP', resolution)
```

`data_type`: `'observed'` (previous default / "main" data) or `'oe'` (observed/expected)<br>
`normalization`: `NONE`, `VC`, `VC_SQRT`, `KR`, `SCALE`, etc.<br>
`file`: filepath (local or URL)<br>
`region_x/y`: provide the `chromosome` or utilize the syntax `chromosome:start_position:end_position` if using a smaller window within the chromosome<br>
`resolution`: typically `2500000`, `1000000`, `500000`, `100000`, `50000`, `25000`, `10000`, `5000`, etc.<br><br>
Note: the normalization, resolution, and chromosome/regions must already exist in the .hic to be read 
(i.e. they are not calculated by straw, only read from the file if available)<br>
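
The region syntax is easy to mistype when it is assembled by hand, so a small formatter can help. This is a minimal sketch and not part of hicstraw: `make_region` is a hypothetical helper that returns either the bare chromosome name or a `chromosome:start_position:end_position` string as described above.

```python
def make_region(chromosome, start=None, end=None):
    """Format a region string for the region_x / region_y arguments.

    Returns the chromosome name alone when no window is given, otherwise
    'chromosome:start_position:end_position'.
    """
    if start is None and end is None:
        return str(chromosome)
    if start is None or end is None:
        raise ValueError("give both start and end, or neither")
    if start < 0 or end < start:
        raise ValueError("need 0 <= start <= end")
    return "{0}:{1}:{2}".format(chromosome, int(start), int(end))


print(make_region("X"))
print(make_region("X", 1000000, 7500000))
```

The resulting strings can then be passed as `region_x` and `region_y` in the `hicstraw.straw(...)` call.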




            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/aidenlab/straw",
    "name": "hic-straw",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">3.3",
    "maintainer_email": "",
    "keywords": "Hi-C,3D Genomics,Chromatin,ML",
    "author": "Neva C. Durand, Muhammad S Shamim",
    "author_email": "neva@broadinstitute.org",
    "download_url": "https://files.pythonhosted.org/packages/8e/ec/431c76970f8973ea5937a9b5f2d1689a641b3fe6475246a32451274fa2dd/hic-straw-1.3.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Straw bound with pybind11",
    "version": "1.3.1",
    "project_urls": {
        "Homepage": "https://github.com/aidenlab/straw"
    },
    "split_keywords": [
        "hi-c",
        "3d genomics",
        "chromatin",
        "ml"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0990fa240ee10625db3d81901a1ce60f5302d43c81422db901d0f9931902d7d4",
                "md5": "b1ec364d5216030a42b03bf935da30e9",
                "sha256": "7dea65dba0b271453fa624ee7f5e7d3ffd18e08e4905a8c714bfd04648408c52"
            },
            "downloads": -1,
            "filename": "hic_straw-1.3.1-cp39-cp39-macosx_10_9_x86_64.whl",
            "has_sig": false,
            "md5_digest": "b1ec364d5216030a42b03bf935da30e9",
            "packagetype": "bdist_wheel",
            "python_version": "cp39",
            "requires_python": ">3.3",
            "size": 124626,
            "upload_time": "2022-05-27T22:31:02",
            "upload_time_iso_8601": "2022-05-27T22:31:02.860128Z",
            "url": "https://files.pythonhosted.org/packages/09/90/fa240ee10625db3d81901a1ce60f5302d43c81422db901d0f9931902d7d4/hic_straw-1.3.1-cp39-cp39-macosx_10_9_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8eec431c76970f8973ea5937a9b5f2d1689a641b3fe6475246a32451274fa2dd",
                "md5": "e7069201927daecd77354fd71e2bb35d",
                "sha256": "fb0f878127f6b1d096303c67793477c83fddf3f4a1a8e29a9d92952634989876"
            },
            "downloads": -1,
            "filename": "hic-straw-1.3.1.tar.gz",
            "has_sig": false,
            "md5_digest": "e7069201927daecd77354fd71e2bb35d",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">3.3",
            "size": 18112,
            "upload_time": "2022-05-27T22:31:04",
            "upload_time_iso_8601": "2022-05-27T22:31:04.733044Z",
            "url": "https://files.pythonhosted.org/packages/8e/ec/431c76970f8973ea5937a9b5f2d1689a641b3fe6475246a32451274fa2dd/hic-straw-1.3.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2022-05-27 22:31:04",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "aidenlab",
    "github_project": "straw",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "hic-straw"
}
        