pytabix


:Name: pytabix
:Version: 0.1
:Home page: https://github.com/slowkow/pytabix
:Summary: Python interface for tabix
:Upload time: 2014-04-16 17:49:24
:Maintainer: Kamil Slowikowski
:Author: Hyeshik Chang, Kamil Slowikowski
:License: MIT
:Keywords: tabix, bgzip, bioinformatics, genomics

This module allows fast random access to files compressed with bgzip_ and
indexed by tabix_. It includes a C extension with code from klib_. The bgzip
and tabix programs are available here_.

Installation
------------

::

    pip install --user pytabix


Synopsis
--------

Genomics data is often in a table where each row corresponds to a genomic
region (start, end) or a position::

    chrom  pos      snp
    1      1000760  rs75316104
    1      1000894  rs114006445
    1      1000910  rs79750022
    1      1001177  rs4970401
    1      1001256  rs78650406

With tabix_, you can quickly retrieve all rows in a genomic region by
specifying a query with a sequence name, start, and end:

.. code:: python

    import tabix

    # Open a remote or local file.
    url = "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/"
    url += "ALL.2of4intersection.20100804.genotypes.vcf.gz"

    tb = tabix.open(url)

    # These queries are identical. A query returns an iterator over the results.
    records = tb.query("1", 1000000, 1250000)
    records = tb.queryi(0, 1000000, 1250000)
    records = tb.querys("1:1000000-1250000")

    # Each record is a list of strings.
    for record in records:
        print(record[:3])

.. code:: python

    ['1', '1000760', 'rs75316104']
    ['1', '1000894', 'rs114006445']
    ['1', '1000910', 'rs79750022']
    ['1', '1001177', 'rs4970401']
    ['1', '1001256', 'rs78650406']
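
Because each record is a list of strings, numeric columns have to be converted
before doing arithmetic on them. The following is a minimal sketch and not part
of pytabix: the names ``Snp``, ``region_string``, and ``parse_snp`` are
hypothetical, and the three-column layout (chrom, pos, snp) is assumed from the
example table in this section. It runs on hard-coded rows, so no tabix file is
needed:

.. code:: python

    from typing import List, NamedTuple


    class Snp(NamedTuple):
        """Typed view of one row (hypothetical helper, not a pytabix type)."""
        chrom: str
        pos: int
        snp: str


    def region_string(chrom: str, start: int, end: int) -> str:
        """Format a region as "chrom:start-end", the form passed to querys()."""
        return "{}:{}-{}".format(chrom, start, end)


    def parse_snp(record: List[str]) -> Snp:
        """Convert the first three string fields of a record to typed values."""
        chrom, pos, snp = record[:3]
        return Snp(chrom, int(pos), snp)


    # Rows copied from the example table; a real query would supply these.
    rows = [
        ["1", "1000760", "rs75316104"],
        ["1", "1000894", "rs114006445"],
        ["1", "1000910", "rs79750022"],
    ]
    snps = [parse_snp(row) for row in rows]

    print(region_string("1", 1000000, 1250000))
    # Distance in bases between the first and last SNP in the list.
    print(snps[-1].pos - snps[0].pos)

Keeping the conversion in one small function means the rest of a script can
work with integers and named fields instead of list indices.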


Example
-------

Let's say you have a table of gene coordinates:

.. code:: bash

    $ zcat example.bed.gz | shuf | head -n5 | column -t
    chr19  53611131   53636172   55786   ZNF415
    chr10  72149121   72150375   221017  CEP57L1P1
    chr4   185009858  185139113  133121  ENPP6
    chrX   132669772  133119672  2719    GPC3
    chr6   134924279  134925376  114182  FAM8A6P

Sort_ it by chromosome, then by start and end positions. Then, use bgzip_ to
deflate the file into compressed blocks:

.. code:: bash

    $ zcat example.bed.gz | sort -k1,1V -k2,2n -k3,3n | bgzip > example.bed.bgz

The compressed size is usually slightly larger than that obtained with gzip.
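
The size difference can be illustrated with the standard library alone. The
sketch below is only an approximation of the idea, not the real BGZF format
(which also records each block's size in the gzip header so that blocks can be
located for random access). It compresses fixed-size blocks as independent gzip
members and compares the total with a single gzip stream. The 64 KiB block size,
the synthetic rows, and the name ``blocked_gzip`` are assumptions made for
illustration:

.. code:: python

    import gzip

    BLOCK_SIZE = 64 * 1024  # assumed block size, for illustration only


    def blocked_gzip(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
        """Compress each block as its own gzip member and concatenate them."""
        members = [
            gzip.compress(data[i:i + block_size])
            for i in range(0, len(data), block_size)
        ]
        return b"".join(members)


    # Synthetic tab-separated rows standing in for a sorted BED file.
    data = "".join(
        "chr1\t{}\t{}\tgene{}\n".format(i * 100, i * 100 + 50, i)
        for i in range(50000)
    ).encode("ascii")

    whole = gzip.compress(data)
    blocked = blocked_gzip(data)

    # Concatenated gzip members still decompress as one continuous stream.
    assert gzip.decompress(blocked) == data

    print("single stream:", len(whole), "bytes")
    print("blocked:      ", len(blocked), "bytes")

Each block carries its own header and trailer and cannot reuse matches from the
previous block, which is where the extra bytes come from. In exchange, any block
can be decompressed without reading the ones before it.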

Index the file with tabix_:

.. code:: bash

    $ tabix -s 1 -b 2 -e 3 example.bed.bgz

    $ ls
    example.bed.gz  example.bed.bgz  example.bed.bgz.tbi
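
Once the index exists, the file can be queried from Python in the same way as
in the Synopsis. The sketch below is hedged: ``genes_in_region`` is a
hypothetical helper, it assumes the five-column layout of the rows shown
earlier with the gene name in the fifth column, and ``FakeTabix`` is a stand-in
defined here so the example runs without pytabix or an index file. Its overlap
test is a simplification and does not reproduce tabix's exact coordinate
handling:

.. code:: python

    from typing import Iterator, List, Tuple


    def genes_in_region(tb, chrom: str, start: int, end: int) -> Iterator[Tuple[str, int, int]]:
        """Yield (gene, start, end) for each row returned by tb.query()."""
        for record in tb.query(chrom, start, end):
            # Records are lists of strings, so convert the coordinates.
            yield record[4], int(record[1]), int(record[2])


    class FakeTabix:
        """Stand-in for the object returned by tabix.open(); demo only."""

        def __init__(self, rows: List[List[str]]) -> None:
            self.rows = rows

        def query(self, chrom: str, start: int, end: int) -> Iterator[List[str]]:
            # Naive linear scan with a simplified overlap test.
            for row in self.rows:
                if row[0] == chrom and int(row[1]) < end and int(row[2]) > start:
                    yield row


    # Two rows copied from the example table.
    tb = FakeTabix([
        ["chr19", "53611131", "53636172", "55786", "ZNF415"],
        ["chr4", "185009858", "185139113", "133121", "ENPP6"],
    ])

    hits = list(genes_in_region(tb, "chr19", 53000000, 54000000))
    print(hits)

With pytabix installed, passing ``tabix.open("example.bed.bgz")`` in place of
the ``FakeTabix`` instance should work with the same helper, since it only
relies on the ``query`` method shown in the Synopsis.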

.. _bgzip: http://samtools.sourceforge.net/tabix.shtml
.. _tabix: http://samtools.sourceforge.net/tabix.shtml
.. _klib: https://github.com/jmarshall/klib
.. _here: http://sourceforge.net/projects/samtools/files/tabix/
.. _Sort: https://www.gnu.org/software/coreutils/manual/html_node/Details-about-version-sort.html#Details-about-version-sort
            
