arboreto

:Name: arboreto
:Version: 0.1.6
:Home page: https://github.com/aertslab/arboreto
:Summary: Scalable gene regulatory network inference using tree-based ensemble regressors
:Upload time: 2021-02-09 11:20:21
:Author: Thomas Moerman
:License: BSD 3-Clause License
:Keywords: gene, regulatory, network, inference, regression, ensemble, scalable, dask
:Requirements: dask, distributed, numpy, pandas, scikit-learn, scipy

.. image:: img/arboreto.png
    :alt: arboreto
    :scale: 100%
    :align: left

.. image:: https://travis-ci.com/aertslab/arboreto.svg?branch=master
    :alt: Build Status
    :target: https://travis-ci.com/aertslab/arboreto

.. image:: https://readthedocs.org/projects/arboreto/badge/?version=latest
    :alt: Documentation Status
    :target: http://arboreto.readthedocs.io/en/latest/?badge=latest

.. image:: https://anaconda.org/bioconda/arboreto/badges/version.svg
    :alt: Bioconda package
    :target: https://anaconda.org/bioconda/arboreto

.. image:: https://img.shields.io/pypi/v/arboreto
    :alt: PyPI package
    :target: https://pypi.org/project/arboreto/

----

.. epigraph::

    *The most satisfactory definition of man from the scientific point of view is probably Man the Tool-maker.*

.. _arboreto: https://arboreto.readthedocs.io
.. _`arboreto documentation`: https://arboreto.readthedocs.io
.. _notebooks: https://github.com/tmoerman/arboreto/tree/master/notebooks
.. _issue: https://github.com/tmoerman/arboreto/issues/new

.. _dask: https://dask.pydata.org/en/latest/
.. _`dask distributed`: https://distributed.readthedocs.io/en/latest/

.. _GENIE3: http://www.montefiore.ulg.ac.be/~huynh-thu/GENIE3.html
.. _`Random Forest`: https://en.wikipedia.org/wiki/Random_forest
.. _ExtraTrees: https://en.wikipedia.org/wiki/Random_forest#ExtraTrees
.. _`Stochastic Gradient Boosting Machine`: https://en.wikipedia.org/wiki/Gradient_boosting#Stochastic_gradient_boosting
.. _`early-stopping`: https://en.wikipedia.org/wiki/Early_stopping

Inferring a gene regulatory network (GRN) from gene expression data is a computationally expensive task, exacerbated by increasing data sizes due to advances
in high-throughput gene profiling technology.

The arboreto_ software library addresses this issue by providing a computational strategy for running the class of GRN inference algorithms
exemplified by GENIE3_ [1] on hardware ranging from a single computer to a multi-node compute cluster. Algorithms in this class perform one step
per target gene in the dataset: a regression model is trained to predict the target gene's expression profile from the expression of a set of
candidate regulators, and the most important regulators are then determined from the trained model.
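
The per-target step described above can be sketched as follows. This is a minimal illustration on synthetic data using scikit-learn's ``RandomForestRegressor``, not arboreto's own implementation; the function name ``infer_links_for_target``, the gene names, and the hyperparameters are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def infer_links_for_target(expression, gene_names, regulators, target, seed=0):
    """Rank candidate regulators of one target gene by feature importance.

    expression: array of shape (n_observations, n_genes)
    Returns a list of (regulator, target, importance), highest importance first.
    """
    # A gene is never used as a predictor of itself.
    candidates = [g for g in regulators if g != target]
    X = expression[:, [gene_names.index(g) for g in candidates]]
    y = expression[:, gene_names.index(target)]

    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X, y)

    links = [
        (reg, target, float(imp))
        for reg, imp in zip(candidates, model.feature_importances_)
    ]
    return sorted(links, key=lambda link: link[2], reverse=True)


# Synthetic data: gene "g3" is driven mostly by regulator "g0".
rng = np.random.default_rng(0)
gene_names = ["g0", "g1", "g2", "g3"]
expression = rng.normal(size=(200, 4))
expression[:, 3] = 2.0 * expression[:, 0] + 0.1 * rng.normal(size=200)

regulators = ["g0", "g1", "g2"]

# One regression per target gene; the union of all ranked links is the network.
network = [
    link
    for target in gene_names
    for link in infer_links_for_target(expression, gene_names, regulators, target)
]
```

Each call depends only on the expression matrix and a single target, so the calls do not need to share any state.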

Members of this class of GRN inference algorithms are attractive from a computational point of view because they are parallelizable by nature: the
regression for each target gene can be computed independently of the others. In arboreto, we specify this parallelizable computation as a dask_ graph [2],
a data structure that represents the task schedule of a computation. A dask scheduler assigns the tasks in a dask graph to the available computational
resources. Arboreto uses the `dask distributed`_ scheduler to spread the computational tasks over multiple processes running on one or more machines.
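
As a rough sketch of this idea, the snippet below builds one ``dask.delayed`` task per target gene plus a final task that collects the results, and then hands the graph to a scheduler. To stay self-contained it uses dask's local threaded scheduler and a placeholder covariance score; the helper names (``links_for_target``, ``concatenate``, ``abs_covariance``) and the toy data are hypothetical, and the graph that arboreto actually builds is more elaborate.

```python
from dask import delayed

# Toy expression profiles: gene name -> values across observations.
expression = {
    "tf1": [1.0, 2.0, 3.0, 4.0],
    "tf2": [4.0, 1.0, 3.0, 2.0],
    "geneA": [2.0, 4.0, 6.0, 8.0],
    "geneB": [8.0, 2.0, 6.0, 4.0],
}
regulators = ["tf1", "tf2"]


def abs_covariance(xs, ys):
    """Placeholder score; arboreto uses tree-based regression instead."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return abs(sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs))


def links_for_target(target):
    """Stand-in for one per-target regression step."""
    return [
        (reg, target, abs_covariance(expression[reg], expression[target]))
        for reg in regulators
        if reg != target
    ]


def concatenate(per_target_links):
    """Merge the per-target link lists into one network."""
    return [link for links in per_target_links for link in links]


# Building the graph is lazy: nothing is computed yet.
tasks = [delayed(links_for_target)(target) for target in expression]
network_graph = delayed(concatenate)(tasks)

# A scheduler executes the tasks; independent targets can run in parallel.
network = network_graph.compute(scheduler="threads")
```

Because the graph only describes what to compute, the same task definitions can in principle be given to a different scheduler without changing the per-target function.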

Arboreto currently supports two GRN inference algorithms:

1. **GRNBoost2**: a novel and fast GRN inference algorithm using `Stochastic Gradient Boosting Machine`_ (SGBM) [3] regression with `early-stopping`_ regularization.
2. **GENIE3**: the classic GRN inference algorithm using `Random Forest`_ (RF) or ExtraTrees_ (ET) regression.
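
To illustrate what stochastic gradient boosting with early stopping can look like for a single target gene, here is a minimal sketch built on scikit-learn's ``GradientBoostingRegressor`` and its ``monitor`` callback. It is not arboreto's code: the ``EarlyStopMonitor`` class, the window length of 25 rounds, the stopping rule (mean out-of-bag improvement over the window below zero), and all hyperparameters are assumptions chosen for the example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor


class EarlyStopMonitor:
    """Stop boosting when the mean out-of-bag improvement over a window is negative."""

    def __init__(self, window_length=25):
        self.window_length = window_length

    def __call__(self, current_round, regressor, _local_vars):
        if current_round < self.window_length - 1:
            return False  # not enough rounds yet to fill a window
        lo = current_round - self.window_length + 1
        window = regressor.oob_improvement_[lo:current_round + 1]
        return bool(np.mean(window) < 0)


# Synthetic data: 5 candidate regulators, target driven by regulator 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=300)

# subsample < 1 makes the boosting stochastic and yields out-of-bag estimates.
model = GradientBoostingRegressor(
    n_estimators=1000,  # upper bound on boosting rounds
    learning_rate=0.1,
    max_depth=3,
    subsample=0.9,
    random_state=0,
)
model.fit(X, y, monitor=EarlyStopMonitor(window_length=25))

rounds_used = model.n_estimators_          # rounds actually fitted
importances = model.feature_importances_   # one score per candidate regulator
```

With a rule of this kind the number of boosting rounds adapts to each target gene rather than being fixed in advance, which is the role early stopping plays as a regularizer.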

Get Started
***********

Arboreto was conceived with the working bioinformatician or data scientist in mind. We provide extensive documentation and examples to help you get up to speed with the library.

* Read the `arboreto documentation`_.
* Browse example notebooks_.
* Report an issue_.

License
*******

BSD 3-Clause License

pySCENIC
********

.. _pySCENIC: https://github.com/aertslab/pySCENIC
.. _SCENIC: https://aertslab.org/#scenic

Arboreto is a component of pySCENIC_, a lightning-fast Python implementation of
the SCENIC_ pipeline [5] (Single-Cell rEgulatory Network Inference and Clustering),
which enables biologists to infer transcription factors, gene regulatory networks,
and cell types from single-cell RNA-seq data.

References
**********

1. Huynh-Thu VA, Irrthum A, Wehenkel L, Geurts P (2010) Inferring Regulatory Networks from Expression Data Using Tree-Based Methods. PLoS ONE 5(9): e12776.
2. Rocklin, M. (2015). Dask: parallel computation with blocked algorithms and task scheduling. In Proceedings of the 14th Python in Science Conference (pp. 130-136).
3. Friedman, J. H. (2002). Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4), 367-378.
4. Marbach, D., Costello, J. C., Kuffner, R., Vega, N. M., Prill, R. J., Camacho, D. M., ... & Dream5 Consortium. (2012). Wisdom of crowds for robust gene network inference. Nature methods, 9(8), 796-804.
5. Aibar S, Bravo Gonzalez-Blas C, Moerman T, Wouters J, Huynh-Thu VA, Imrichova H, Kalender Atak Z, Hulselmans G, Dewaele M, Rambow F, Geurts P, Aerts J, Marine C, van den Oord J, Aerts S. SCENIC: Single-cell regulatory network inference and clustering. Nature Methods 14, 1083–1086 (2017). doi: 10.1038/nmeth.4463



            
