bioframe

* Name: bioframe
* Version: 0.7.2
* Summary: Operations and utilities for Genomic Interval Dataframes.
* Upload time: 2024-06-19 22:03:44
* License: MIT
* Keywords: bed, bedframe, bedtools, bioinformatics, dataframe, epigenomics, genomic ranges, genomics, interval operations, pandas, viewframe

# Bioframe: Operations on Genomic Interval Dataframes

<img src="https://github.com/open2c/bioframe/raw/main/docs/figs/bioframe-logo.png" width=75%>

![CI](https://github.com/open2c/bioframe/actions/workflows/ci.yml/badge.svg)
[![pre-commit.ci status](https://results.pre-commit.ci/badge/github/open2c/bioframe/main.svg)](https://results.pre-commit.ci/latest/github/open2c/bioframe/main)
[![Docs status](https://readthedocs.org/projects/bioframe/badge/)](https://bioframe.readthedocs.io/en/latest/)
[![Paper](https://img.shields.io/badge/DOI-10.1093%2Fbioinformatics%2Fbtae088-blue)](https://doi.org/10.1093/bioinformatics/btae088)
[![Zenodo](https://zenodo.org/badge/69901992.svg)](https://zenodo.org/badge/latestdoi/69901992)
[![Slack](https://img.shields.io/badge/chat-slack-%233F0F3F?logo=slack)](https://bit.ly/open2c-slack)
[![NumFOCUS](https://img.shields.io/badge/powered%20by-NumFOCUS-orange.svg?style=flat&colorA=E1523D&colorB=007D8A)](https://www.numfocus.org)

Bioframe enables flexible and scalable operations on genomic interval dataframes in Python.

Bioframe is built directly on top of [Pandas](https://pandas.pydata.org/). Bioframe provides:

* A variety of genomic interval operations that work directly on dataframes.
* Operations for special classes of genomic intervals, including chromosome arms and fixed-size bins.
* Conveniences for diverse tabular genomic data formats and loading genome assembly summary information.

Read the [documentation](https://bioframe.readthedocs.io/en/latest/), including the [guide](https://bioframe.readthedocs.io/en/latest/guide-intervalops.html), as well as the [publication](https://doi.org/10.1093/bioinformatics/btae088) for more information.

Bioframe is an Affiliated Project of [NumFOCUS](https://www.numfocus.org).

## Installation

Bioframe is available on [PyPI](https://pypi.org/project/bioframe/) and [bioconda](https://bioconda.github.io/recipes/bioframe/README.html):

```sh
pip install bioframe
conda install -c conda-forge -c bioconda bioframe
```

## Contributing

Interested in contributing to bioframe? That's great! To get started, check out the [contributing guide](https://github.com/open2c/bioframe/blob/main/CONTRIBUTING.md). Discussions about the project roadmap take place on the [Open2C Slack](https://bit.ly/open2c-slack) and at regular developer meetings scheduled there. Anyone can join and participate!


## Interval operations

Key genomic interval operations in bioframe include:
- `overlap`: Find pairs of overlapping genomic intervals between two dataframes.
- `closest`: For every interval in a dataframe, find the closest intervals in a second dataframe.
- `cluster`: Group overlapping intervals in a dataframe into clusters.
- `complement`: Find genomic intervals that are not covered by any interval from a dataframe.

Bioframe also provides frequently used functions that can be expressed as combinations of these core operations and dataframe operations, including `coverage`, `expand`, `merge`, `select`, and `subtract`.

To `overlap` two dataframes, call:
```python
import bioframe as bf

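# how='inner' returns only overlapping pairs; the default, how='left',
# also keeps df1 intervals without an overlap.
bf.overlap(df1, df2, how='inner')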
```

For these two input dataframes, with intervals all on the same chromosome:

<img src="https://github.com/open2c/bioframe/raw/main/docs/figs/df1.png" width=60%>
<img src="https://github.com/open2c/bioframe/raw/main/docs/figs/df2.png" width=60%>

`overlap` will return the following interval pairs as overlaps:

<img src="https://github.com/open2c/bioframe/raw/main/docs/figs/overlap_inner_0.png" width=60%>
<img src="https://github.com/open2c/bioframe/raw/main/docs/figs/overlap_inner_1.png" width=60%>


To `merge` all overlapping intervals in a dataframe, call:
```python
import bioframe as bf

bf.merge(df1)
```

For this input dataframe, with intervals all on the same chromosome:

<img src="https://github.com/open2c/bioframe/raw/main/docs/figs/df1.png" width=60%>

`merge` will return a new dataframe with these merged intervals:

<img src="https://github.com/open2c/bioframe/raw/main/docs/figs/merge_df1.png" width=60%>

See the [guide](https://bioframe.readthedocs.io/en/latest/guide-intervalops.html) for visualizations of other interval operations in bioframe.

## File I/O

Bioframe includes utilities for reading genomic file formats into dataframes and vice versa. One handy function is `read_table`, which mirrors pandas's `read_csv`/`read_table` but provides a [`schema`](https://github.com/open2c/bioframe/blob/main/bioframe/io/schemas.py) argument to populate column names for common tabular file formats.

```python
import bioframe as bf

jaspar_url = 'http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2022/hg38/MA0139.1.tsv.gz'
ctcf_motif_calls = bf.read_table(jaspar_url, schema='jaspar', skiprows=1)
```

## Tutorials
See this [Jupyter notebook](https://github.com/open2c/bioframe/tree/master/docs/tutorials/tutorial_assign_motifs_to_peaks.ipynb) for an example of how to assign TF motifs to ChIP-seq peaks using bioframe.


## Citing

If you use ***bioframe*** in your work, please cite:

```bibtex
@article{bioframe_2024,
author = {Open2C and Abdennur, Nezar and Fudenberg, Geoffrey and Flyamer, Ilya M and Galitsyna, Aleksandra A and Goloborodko, Anton and Imakaev, Maxim and Venev, Sergey},
doi = {10.1093/bioinformatics/btae088},
journal = {Bioinformatics},
title = {{Bioframe: Operations on Genomic Intervals in Pandas Dataframes}},
year = {2024}
}
```
