pybedtools
==========

:Name: pybedtools
:Version: 0.11.0
:Home page: https://github.com/daler/pybedtools
:Summary: Wrapper around BEDTools for bioinformatics work
:Upload time: 2025-01-02 15:56:45
:Maintainer: Ryan Dale
:Docs URL: https://pythonhosted.org/pybedtools/
:License: MIT
:Requirements: numpy, pandas, pysam

Overview
--------

.. image:: https://badge.fury.io/py/pybedtools.svg?style=flat
    :target: https://badge.fury.io/py/pybedtools

.. image:: https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg
    :target: https://bioconda.github.io

The `BEDTools suite of programs <http://bedtools.readthedocs.org/>`_ is widely
used for genomic interval manipulation or "genome algebra".  `pybedtools` wraps
and extends BEDTools and offers feature-level manipulations from within
Python.

See full online documentation, including installation instructions, at
https://daler.github.io/pybedtools/.

The GitHub repo is at https://github.com/daler/pybedtools.

Why `pybedtools`?
-----------------

Here is an example that gets the names of genes that are less than 5 kb away
from intergenic SNPs:

.. code-block:: python

    from pybedtools import BedTool

    snps = BedTool('snps.bed.gz')  # [1]
    genes = BedTool('hg19.gff')    # [1]

    intergenic_snps = snps.subtract(genes)                       # [2]
    nearby = genes.closest(intergenic_snps, d=True, stream=True) # [2, 3]

    for gene in nearby:             # [4]
        if int(gene[-1]) < 5000:    # [4]
            print(gene.name)        # [4]

Useful features shown here include:

* `[1]` support for all BEDTools-supported formats (here, gzipped BED and GFF);
* `[2]` wrapping of all BEDTools programs and arguments (here, `subtract` and
  `closest`, passing the `-d` flag to `closest`);
* `[3]` streaming results (like Unix pipes; here, specified by `stream=True`);
* `[4]` iterating over results while accessing feature data by index or by
  attribute (here, `[-1]` and `.name`).
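
As a rough illustration of the keyword-to-flag convention in `[2]`, the sketch
below shows one way keyword arguments such as `d=True` could be turned into
command-line flags such as `-d`.  The helper `kwargs_to_flags` is hypothetical
and written only for this example; it is not taken from the `pybedtools`
source and may differ from what the library actually does internally.

.. code-block:: python

    def kwargs_to_flags(**kwargs):
        """Convert keyword arguments into a list of command-line tokens.

        Hypothetical helper for illustration only.
        """
        args = []
        for name, value in kwargs.items():
            # False/None means "do not pass this flag at all"
            if value is False or value is None:
                continue
            args.append("-" + name)
            # True means a bare switch; anything else is passed as its value
            if value is not True:
                args.append(str(value))
        return args

    print(kwargs_to_flags(d=True))         # ['-d']
    print(kwargs_to_flags(f=0.5, u=True))  # ['-f', '0.5', '-u']

The Python example above relies on this kind of translation when it passes
`d=True` to `closest`.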

In contrast, here is the same analysis using shell scripting.  Note that this
requires knowledge of Perl, bash, and awk.  The run time is identical to the
`pybedtools` version above:

.. code-block:: bash

    snps=snps.bed.gz
    genes=hg19.gff
    intergenic_snps=/tmp/intergenic_snps

    # Number of fields per SNP record, read from the second line of the file
    snp_fields=$(zcat "$snps" | awk '(NR == 2){print NF; exit;}')
    gene_fields=9
    distance_field=$((gene_fields + snp_fields + 1))

    intersectBed -a "$snps" -b "$genes" -v > "$intergenic_snps"

    closestBed -a "$genes" -b "$intergenic_snps" -d \
    | awk -v col="$distance_field" '($col < 5000){print $9;}' \
    | perl -ne 'print "$1\n" if m/(?:ID|Name|gene_id)=(.*?);/'

    rm "$intergenic_snps"
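
For readers less familiar with awk and Perl, the two filtering steps at the end
of the pipeline can be sketched in plain Python.  This is only an illustration
under stated assumptions: the input lines are tab-delimited `closestBed -d`
output whose first nine fields are the GFF record and whose last field is the
distance.  The function name `nearby_gene_names` and the sample lines are made
up for this example.

.. code-block:: python

    import re

    # Match the first ID=, Name= or gene_id= attribute, up to the next ';'
    ATTR_RE = re.compile(r"(?:ID|Name|gene_id)=(.*?);")

    def nearby_gene_names(lines, max_distance=5000):
        """Yield gene names from `closestBed -d` lines closer than max_distance."""
        for line in lines:
            fields = line.rstrip("\n").split("\t")
            distance = int(fields[-1])      # distance is the last column
            if distance >= max_distance:
                continue
            match = ATTR_RE.search(fields[8])  # GFF attributes are column 9
            if match:
                yield match.group(1)

    # Made-up records: 9 GFF fields + 3 BED fields + distance.
    # The distances are illustrative, not computed by BEDTools.
    sample = [
        "chr1\tsrc\tgene\t1000\t2000\t.\t+\t.\tID=geneA;Name=A;\tchr1\t2500\t2501\t500",
        "chr1\tsrc\tgene\t50000\t60000\t.\t+\t.\tID=geneB;Name=B;\tchr1\t2500\t2501\t47500",
    ]
    print(list(nearby_gene_names(sample)))  # ['geneA']
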

See the `Shell script comparison <http://daler.github.io/pybedtools/sh-comparison.html>`_ in the docs
for more details on this comparison, or keep reading the full documentation at
http://daler.github.io/pybedtools.

            
