:Name: pyscenic
:Version: 0.12.1
:Home page: https://github.com/aertslab/pySCENIC
:Summary: Python implementation of the SCENIC pipeline for transcription factor inference from single-cell transcriptomics experiments.
:Upload time: 2022-11-21 13:08:21
:Author: Bram Van de Sande
:Requires Python: >=3.6
:License: GPL-3.0+
:Keywords: single-cell, transcriptomics, gene-regulatory-network, transcription-factors
:Requirements: ctxcore, cytoolz, multiprocessing_on_dill, llvmlite, numba, attrs, frozendict, numpy, pandas, numexpr, cloudpickle, dask, distributed, arboreto, boltons, setuptools, pyyaml, tqdm, interlap, umap-learn, loompy, networkx, scipy, fsspec, requests, aiohttp, scikit-learn

pySCENIC
========

|buildstatus|_ |pypipackage|_ |docstatus|_


pySCENIC is a lightning-fast Python implementation of the SCENIC_ pipeline (Single-Cell rEgulatory Network Inference and
Clustering) which enables biologists to infer transcription factors, gene regulatory networks and cell types from
single-cell RNA-seq data.

The pioneering work was done in R and results were published in Nature Methods [1]_.
A new and comprehensive description of this Python implementation of the SCENIC pipeline is available in Nature Protocols [4]_.

pySCENIC can be run on a single desktop machine but easily scales to multi-core clusters to analyze thousands of cells
in no time. The latter is achieved via the dask_ framework for distributed computing [2]_.

**Full documentation** for pySCENIC is available on `Read the Docs <https://pyscenic.readthedocs.io/en/latest/>`_.

----

pySCENIC is part of the SCENIC Suite of tools! 
See the main `SCENIC website <https://scenic.aertslab.org/>`_ for additional information and a full list of tools available.

----


News and releases
-----------------

0.12.0 | 2022-08-16
^^^^^^^^^^^^^^^^^^^

* Only databases in Feather v2 format are supported now (`ctxcore <https://github.com/aertslab/ctxcore>`_ ``>= 0.2``),
  which allows the use of recent versions of pyarrow (``>=8.0.0``) instead of very old ones (``<0.17``).
  Databases in the new format can be downloaded from https://resources.aertslab.org/cistarget/databases/
  and end with ``*.genes_vs_motifs.rankings.feather`` or ``*.genes_vs_tracks.rankings.feather``.
* Support clustered motif databases.
* Use custom multiprocessing instead of dask, by default.
* Docker image uses Python 3.10 and contains only the pySCENIC dependencies needed for CLI usage.
* Remove unneeded scripts and notebooks for unused/deprecated database formats.
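
As a small illustration of the naming convention in the first bullet above, the sketch below checks whether a file name carries one of the two documented suffixes. The helper name ``is_feather_v2_rankings_name`` and the example file name are hypothetical and not part of pySCENIC. The check only inspects the name, so it cannot confirm that a file is actually stored in Feather v2 format.

```python
# Suffixes quoted in the 0.12.0 release note above.
SUPPORTED_SUFFIXES = (
    ".genes_vs_motifs.rankings.feather",
    ".genes_vs_tracks.rankings.feather",
)


def is_feather_v2_rankings_name(filename: str) -> bool:
    """Return True if the file name ends with a documented rankings suffix.

    Hypothetical helper for illustration only; it does not open the file.
    """
    return filename.endswith(SUPPORTED_SUFFIXES)


if __name__ == "__main__":
    # "example_db" is a made-up prefix, not a real database name.
    print(is_feather_v2_rankings_name("example_db.genes_vs_motifs.rankings.feather"))
```

A name-based check like this is only a quick sanity filter before handing a path to the cisTarget step.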

0.11.2 | 2021-05-07
^^^^^^^^^^^^^^^^^^^

* Split some core cisTarget functions out into a separate repository, `ctxcore <https://github.com/aertslab/ctxcore>`_. This is now a required package for pySCENIC.

0.11.1 | 2021-02-11
^^^^^^^^^^^^^^^^^^^

* Fix bug in motif url construction (#275)
* Fix for export2loom with sparse dataframe (#278)
* Fix sklearn t-SNE import (#285)
* Updates to Docker image (expose port 8787 for Dask dashboard)

0.11.0 | 2021-02-10
^^^^^^^^^^^^^^^^^^^

**Major features:**

* Updated arboreto_ release (GRN inference step) includes:

  * Support for sparse matrices (using the ``--sparse`` flag in ``pyscenic grn``, or passing a sparse matrix to ``grnboost2``/``genie3``).
  * Fixes to avoid dask metadata mismatch error

* Updated cisTarget:

  * Fix for metadata mismatch in ctx prune2df step
  * Support for databases in Apache Parquet format
  * Faster loading from feather databases
  * Bugfix: loading genes from a database (previously missing the last gene name in the database)

* Support for Anndata input and output

* Package updates:

  * Upgrade to newer pandas version
  * Upgrade to newer numba version
  * Upgrade to newer versions of dask, distributed

* Input checks and more descriptive error messages.

  * Check that regulons loaded are not empty.

* Bugfixes:

  * Fixed the regulons output from the cisTarget step, in which gene weights were incorrectly assigned to their respective target genes (PR #254).
  * Motif url construction fixed when running ctx without pruning
  * Compression of intermediate files in the CLI steps
  * Handle loom files with non-standard gene/cell attribute names
  * Reformat the genesig gmt input/output
  * Fix AUCell output to loom with non-standard loom attributes


0.10.4 | 2020-11-24
^^^^^^^^^^^^^^^^^^^

* Included new CLI option to add correlation information to the GRN adjacencies file. This can be called with ``pyscenic add_cor``.
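
To make the idea of "correlation information" concrete, here is a toy sketch that attaches a Pearson correlation to each TF-target pair. It is not the code behind ``pyscenic add_cor``: the function names, the tuple layout, and the example values are all assumptions chosen for illustration, and only the standard library is used.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def add_correlation_toy(adjacencies, expression):
    """Append a correlation value to each (tf, target, importance) tuple.

    adjacencies: list of (tf, target, importance) tuples (hypothetical layout).
    expression: dict mapping gene name -> list of per-cell expression values.
    """
    return [
        (tf, target, importance, pearson(expression[tf], expression[target]))
        for tf, target, importance in adjacencies
    ]


if __name__ == "__main__":
    # Made-up expression values for three genes across four cells.
    expression = {"TF1": [1, 2, 3, 4], "G1": [2, 4, 6, 8], "G2": [4, 3, 2, 1]}
    adjacencies = [("TF1", "G1", 10.0), ("TF1", "G2", 5.0)]
    for row in add_correlation_toy(adjacencies, expression):
        print(row)
```

The sign of the correlation is what lets a downstream step distinguish targets that rise with the TF from targets that fall with it.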



See also the extended `Release Notes <https://pyscenic.readthedocs.io/en/latest/releasenotes.html>`_.

Overview
--------

The pipeline has three steps:

1. First, transcription factors (TFs) and their target genes, together defining a regulon, are derived using gene inference methods that rely solely on correlations between the expression of genes across cells. The arboreto_ package is used for this step.
2. These regulons are refined by pruning targets that are not enriched for a corresponding motif of the TF, effectively separating direct from indirect targets based on the presence of cis-regulatory footprints.
3. Finally, the original cells are differentiated and clustered based on the activity of these discovered regulons.

The most impactful speed improvement is introduced by the arboreto_ package in step 1. This package provides an alternative to GENIE3 [3]_ called GRNBoost2. This package can be controlled from within pySCENIC.
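
The three steps listed above map roughly onto three CLI sub-commands. The sketch below only builds the argument lists and does not run anything. The sub-command names ``grn``, ``ctx`` and ``aucell`` and the flags shown reflect the CLI as we understand it and should be verified with ``pyscenic <subcommand> --help``; the function name, all file names, and the worker count are hypothetical.

```python
import shlex


def scenic_cli_commands(expr, tfs, rankings_db, motif_annotations, workers=4):
    """Build argument lists for the three pipeline steps (nothing is executed).

    All intermediate file names below are placeholders chosen for this sketch.
    """
    adjacencies = "adjacencies.csv"   # output of step 1
    regulons = "regulons.csv"         # output of step 2
    aucell_out = "aucell.loom"        # output of step 3
    n = str(workers)
    return [
        # Step 1: co-expression based TF-target inference (arboreto).
        ["pyscenic", "grn", expr, tfs, "-o", adjacencies, "--num_workers", n],
        # Step 2: motif-enrichment based pruning of targets (cisTarget).
        ["pyscenic", "ctx", adjacencies, rankings_db,
         "--annotations_fname", motif_annotations,
         "--expression_mtx_fname", expr,
         "--output", regulons, "--num_workers", n],
        # Step 3: per-cell regulon activity scoring (AUCell).
        ["pyscenic", "aucell", expr, regulons,
         "--output", aucell_out, "--num_workers", n],
    ]


if __name__ == "__main__":
    commands = scenic_cli_commands(
        "expr.loom", "tfs.txt",
        "example_db.genes_vs_motifs.rankings.feather", "motifs.tbl",
    )
    for cmd in commands:
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Building the commands as lists keeps each step's inputs and outputs explicit: the adjacencies file from step 1 feeds step 2, and the regulons file from step 2 feeds step 3.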


All the functionality of the original R implementation is available and in addition:

1. You can leverage multi-core and multi-node clusters using dask_ and its distributed_ scheduler.
2. We implemented a version of the recovery of input genes that takes into account weights associated with these genes.
3. Regulons (the regulatory network that connects a TF with its target genes) whose targets are repressed are now also derived and used for cell enrichment analysis.
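
To illustrate what a weight-aware recovery of input genes (item 2 above) could look like, the following toy sketch computes a normalized area under a weighted recovery curve. It is not pySCENIC's implementation: the function name, the normalization against the best possible ordering, and the example data are assumptions made for this sketch.

```python
def weighted_recovery_auc(ranking, gene_weights, rank_threshold):
    """Normalized area under a weighted recovery curve (toy version).

    ranking: gene names ordered from best to worst rank.
    gene_weights: dict mapping input gene -> non-negative weight.
    rank_threshold: only the top `rank_threshold` ranks are considered.
    """
    top = ranking[:rank_threshold]

    # Walk down the ranking; each recovered input gene adds its weight.
    recovered = 0.0
    area = 0.0
    for gene in top:
        recovered += gene_weights.get(gene, 0.0)
        area += recovered

    # Best case: the heaviest input genes occupy the top ranks.
    best_weights = sorted(gene_weights.values(), reverse=True)[:len(top)]
    best_recovered = 0.0
    best_area = 0.0
    for i in range(len(top)):
        if i < len(best_weights):
            best_recovered += best_weights[i]
        best_area += best_recovered

    return area / best_area if best_area > 0 else 0.0


if __name__ == "__main__":
    # Made-up ranking of four genes and two weighted input genes.
    print(weighted_recovery_auc(["A", "B", "C", "D"], {"A": 2.0, "B": 1.0}, 4))
```

With all weights set to 1 this reduces to an unweighted recovery curve, which is one way to see what the weights add: heavily weighted genes found early raise the score more than lightly weighted ones.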


Additional resources
--------------------

For more information, please visit LCB_, 
the main `SCENIC website <https://scenic.aertslab.org/>`_,
or `SCENIC (R version) <https://github.com/aertslab/SCENIC>`_.
There is a tutorial to `create new cisTarget databases <https://github.com/aertslab/create_cisTarget_databases>`_.
The CLI to pySCENIC has also been streamlined into a pipeline that can be run with a single command, using the Nextflow workflow manager.
There are two Nextflow implementations available:

* `SCENICprotocol`_: A Nextflow DSL1 implementation of pySCENIC alongside a basic "best practices" expression analysis. Includes details on pySCENIC installation, usage, and downstream analysis, along with detailed tutorials.
* `VSNPipelines`_: A Nextflow DSL2 implementation of pySCENIC with a comprehensive and customizable pipeline for expression analysis. Includes additional pySCENIC features (multi-runs, integrated motif- and track-based regulon pruning, loom file generation).


Acknowledgments
---------------

We are grateful to all providers of TF-annotated position weight matrices, in particular Martha Bulyk (UNIPROBE), Wyeth Wasserman and Albin Sandelin (JASPAR), BioBase (TRANSFAC), Scot Wolfe and Michael Brodsky (FlyFactorSurvey) and Timothy Hughes (cisBP).


References
----------

.. [1] Aibar, S. et al. SCENIC: single-cell regulatory network inference and clustering. Nat Meth 14, 1083–1086 (2017). `doi:10.1038/nmeth.4463 <https://doi.org/10.1038/nmeth.4463>`_
.. [2] Rocklin, M. Dask: parallel computation with blocked algorithms and task scheduling. conference.scipy.org
.. [3] Huynh-Thu, V. A. et al. Inferring regulatory networks from expression data using tree-based methods. PLoS ONE 5, (2010). `doi:10.1371/journal.pone.0012776 <https://doi.org/10.1371/journal.pone.0012776>`_
.. [4] Van de Sande B., Flerin C., et al. A scalable SCENIC workflow for single-cell gene regulatory network analysis. Nat Protoc. June 2020:1-30. `doi:10.1038/s41596-020-0336-2 <https://doi.org/10.1038/s41596-020-0336-2>`_

.. |buildstatus| image:: https://travis-ci.org/aertslab/pySCENIC.svg?branch=master
.. _buildstatus: https://travis-ci.org/aertslab/pySCENIC

.. |pypipackage| image:: https://img.shields.io/pypi/v/pySCENIC?color=%23026aab
.. _pypipackage: https://pypi.org/project/pyscenic/

.. |docstatus| image:: https://readthedocs.org/projects/pyscenic/badge/?version=latest
.. _docstatus: http://pyscenic.readthedocs.io/en/latest/?badge=latest

.. _SCENIC: http://scenic.aertslab.org
.. _dask: https://dask.pydata.org/en/latest/
.. _distributed: https://distributed.readthedocs.io/en/latest/
.. _arboreto: https://arboreto.readthedocs.io
.. _LCB: https://aertslab.org
.. _`SCENICprotocol`: https://github.com/aertslab/SCENICprotocol
.. _`VSNPipelines`: https://github.com/vib-singlecell-nf/vsn-pipelines
.. _notebooks: https://github.com/aertslab/pySCENIC/tree/master/notebooks
.. _issue: https://github.com/aertslab/pySCENIC/issues/new
.. _PyPI: https://pypi.python.org/pypi/pyscenic




            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/aertslab/pySCENIC",
    "name": "pyscenic",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "",
    "keywords": "single-cell transcriptomics gene-regulatory-network transcription-factors",
    "author": "Bram Van de Sande",
    "author_email": "",
    "download_url": "https://files.pythonhosted.org/packages/d2/ea/109aa69d72b54ab78eb353ddc8b6ff7e78208b5d85b7d77f5d146b78681d/pyscenic-0.12.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "GPL-3.0+",
    "summary": "Python implementation of the SCENIC pipeline for transcription factor inference from single-cell transcriptomics experiments.",
    "version": "0.12.1",
    "split_keywords": [
        "single-cell",
        "transcriptomics",
        "gene-regulatory-network",
        "transcription-factors"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "ced17e15ac05fd9991a55099efd5e7ba",
                "sha256": "a250d682e073e67dc80505843764d9cade68dada45a40a622e1aefbae78756e9"
            },
            "downloads": -1,
            "filename": "pyscenic-0.12.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ced17e15ac05fd9991a55099efd5e7ba",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 7099773,
            "upload_time": "2022-11-21T13:08:18",
            "upload_time_iso_8601": "2022-11-21T13:08:18.015372Z",
            "url": "https://files.pythonhosted.org/packages/5c/e3/6a0eaf46a897da829c896f0a034fce82133fce72f95d314bea81287c4279/pyscenic-0.12.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "ccf2decf031071b8fcdb3d285f784522",
                "sha256": "ae8fafa707d2578ffe08f9eed85f14a4cd9e1b53d57217420e2e956f0a8ddba2"
            },
            "downloads": -1,
            "filename": "pyscenic-0.12.1.tar.gz",
            "has_sig": false,
            "md5_digest": "ccf2decf031071b8fcdb3d285f784522",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 7028054,
            "upload_time": "2022-11-21T13:08:21",
            "upload_time_iso_8601": "2022-11-21T13:08:21.003907Z",
            "url": "https://files.pythonhosted.org/packages/d2/ea/109aa69d72b54ab78eb353ddc8b6ff7e78208b5d85b7d77f5d146b78681d/pyscenic-0.12.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2022-11-21 13:08:21",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "aertslab",
    "github_project": "pySCENIC",
    "travis_ci": true,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "ctxcore",
            "specs": [
                [
                    ">=",
                    "0.2.0"
                ]
            ]
        },
        {
            "name": "cytoolz",
            "specs": []
        },
        {
            "name": "multiprocessing_on_dill",
            "specs": []
        },
        {
            "name": "llvmlite",
            "specs": []
        },
        {
            "name": "numba",
            "specs": [
                [
                    ">=",
                    "0.51.2"
                ]
            ]
        },
        {
            "name": "attrs",
            "specs": []
        },
        {
            "name": "frozendict",
            "specs": []
        },
        {
            "name": "numpy",
            "specs": []
        },
        {
            "name": "pandas",
            "specs": [
                [
                    ">=",
                    "1.3.5"
                ]
            ]
        },
        {
            "name": "numexpr",
            "specs": []
        },
        {
            "name": "cloudpickle",
            "specs": []
        },
        {
            "name": "dask",
            "specs": []
        },
        {
            "name": "distributed",
            "specs": []
        },
        {
            "name": "arboreto",
            "specs": [
                [
                    ">=",
                    "0.1.6"
                ]
            ]
        },
        {
            "name": "boltons",
            "specs": []
        },
        {
            "name": "setuptools",
            "specs": []
        },
        {
            "name": "pyyaml",
            "specs": []
        },
        {
            "name": "tqdm",
            "specs": []
        },
        {
            "name": "interlap",
            "specs": []
        },
        {
            "name": "umap-learn",
            "specs": []
        },
        {
            "name": "loompy",
            "specs": []
        },
        {
            "name": "networkx",
            "specs": []
        },
        {
            "name": "scipy",
            "specs": []
        },
        {
            "name": "fsspec",
            "specs": []
        },
        {
            "name": "requests",
            "specs": []
        },
        {
            "name": "aiohttp",
            "specs": []
        },
        {
            "name": "scikit-learn",
            "specs": [
                [
                    ">=",
                    "0.22.2"
                ]
            ]
        }
    ],
    "tox": true,
    "lcname": "pyscenic"
}
        