netzoopy-sponge


- Name: netzoopy-sponge
- Version: 2.0.6
- Summary: A package to generate prior gene regulatory networks.
- Upload time: 2025-07-16 14:41:05
- Keywords: transcription-factors, gene-regulatory-network
- Requirements: bioframe, matplotlib, numpy, pandas, pybbi, pyjaspar, scikit-learn, tqdm

[![master](https://github.com/ladislav-hovan/sponge/actions/workflows/test.yaml/badge.svg?branch=main)](https://github.com/ladislav-hovan/sponge/actions/workflows/test.yaml)
[![devel](https://github.com/ladislav-hovan/sponge/actions/workflows/test.yaml/badge.svg?branch=devel)](https://github.com/ladislav-hovan/sponge/actions/workflows/test.yaml)
[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)


# SPONGE - Simple Prior Omics Network GEnerator
The SPONGE package generates human prior gene regulatory networks and
protein-protein interaction networks for the involved transcription
factors.


## Table of Contents
- [SPONGE - Simple Prior Omics Network GEnerator](#sponge---simple-prior-omics-network-generator)
  - [Table of Contents](#table-of-contents)
  - [General Information](#general-information)
  - [Features](#features)
  - [Setup](#setup)
  - [Usage](#usage)
    - [File formats](#file-formats)
    - [Container](#container)
  - [Project Status](#project-status)
  - [Room for Improvement](#room-for-improvement)
  - [Acknowledgements](#acknowledgements)
  - [Contact](#contact)
  - [License](#license)


## General Information
This repository contains the SPONGE package, which allows the generation
of human prior gene regulatory networks based mainly on the data from
the JASPAR database.
It also uses NCBI to find the human analogs of vertebrate transcription
factors, UniProt for symbol matching, and STRING to retrieve
protein-protein interactions between transcription factors.
By default, Ensembl is used to collect all the promoter regions in
the human genome as the regions of interest, but different regions can
be provided by the user.
Because SPONGE accesses these databases on the fly, it requires internet
access.

Prior gene regulatory networks are useful mainly as an input for tools
that incorporate additional sources of information to refine them.
The prior networks generated by SPONGE are designed to be compatible
with PANDA and related [NetZoo](https://github.com/netZoo/netZooPy)
tools.

SPONGE is aimed at people who lack the knowledge or inclination to run
a genome-wide motif search themselves, but who would still like to
change some of the parameters used to generate the publicly available
prior gene regulatory networks.
It is also designed to facilitate the inclusion of new information from
database updates into the prior networks.

If you just want to use the prior networks generated by the stable
version of SPONGE with the default settings, they are available on
[Zenodo](https://zenodo.org/records/13628784).

This repository only contains the SPONGE package.
The code used to create figures in the SPONGE manuscript can be found
[here](https://github.com/ladislav-hovan/sponge_manuscript).


## Features
The features already available are:
- Generation of prior gene regulatory networks
- Generation of prior protein-protein interaction networks for
  transcription factors
- Automatic download of required files during setup
- Parallelised motif filtering
- Command line interface


## Setup
The requirements are provided in a `requirements.txt` file.

SPONGE can be installed via pip:

``` bash
pip install netzoopy-sponge
```

Alternatively, it can be installed by cloning this repository and
then installing with pip (optionally in editable mode, as shown here):

``` bash
git clone https://github.com/ladislav-hovan/sponge.git
cd sponge
pip install -e .
```


## Usage
SPONGE comes with a `netzoopy-sponge` command line script:

``` bash
# Get information about the available options
netzoopy-sponge --help
# Run the pipeline
netzoopy-sponge
```

SPONGE has many options, all of which are listed in the example config
file it can generate:

``` bash
# Create an example config file in the current directory
netzoopy-sponge -e
```

The defaults are designed to be sensible, so users only need to change
the options they care about.

Within Python, the default workflow can be invoked as follows:

``` python
# Import the class definition
from sponge.sponge import Sponge
# Run the default workflow
# Will create a temporary folder in the current directory
sponge_obj = Sponge()
```

Much like the command line script, the Sponge class accepts a lot of
options for the configuration, which can be specified through a path
to a config file or a dictionary with the options.
For more information, you can run `help(Sponge)` after the import.

In case one needs more control over the individual steps, the workflow
in Python would be as follows:

``` python
# Import the class definition
from sponge.sponge import Sponge
# Create the SPONGE object
# The default workflow option can also just be specified in the config
sponge_obj = Sponge(
  config=path_to_config_file,
  config_update={'default_workflow': False},
)
# Select the appropriate transcription factors from JASPAR
sponge_obj.select_motifs()
# Filter the TF binding sites of the JASPAR bigbed file to the ones
# in the defined regions of interest
sponge_obj.filter_tfbs()
# Retrieve the protein-protein interactions between the transcription
# factors from the STRING database
sponge_obj.retrieve_ppi()
# Write the motif and PPI priors to their respective files
sponge_obj.write_output_files()
```

At each step, the settings provided in the initial configuration can be
adjusted, either through keyword arguments or using the
`user_config_update` argument.
We urge caution when doing so, as it can make the settings inconsistent
between different steps.
The final set of settings used will be saved in the temporary directory
after the SPONGE object is deleted.

SPONGE will attempt to download the files it needs into a temporary
directory (`.sponge_temp` by default).
Paths can be provided instead if these files were downloaded in advance.
The JASPAR bigbed file required for filtering is huge (> 100 GB), so
the download might take some time.
Make sure you're running SPONGE somewhere that has enough space!
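
Before starting the bigbed download, it can be worth checking the free
space programmatically.
The following is a minimal sketch using only the Python standard
library; it is not part of SPONGE, the function name
`has_enough_space` is hypothetical, and the 100 GB threshold is simply
the lower bound quoted above.

```python
import shutil
from pathlib import Path

# Lower bound quoted in this README for the JASPAR bigbed file
REQUIRED_GB = 100


def has_enough_space(directory, required_gb=REQUIRED_GB):
    """Return True if the filesystem holding `directory` has at least
    `required_gb` gibibytes of free space."""
    free_bytes = shutil.disk_usage(Path(directory)).free
    return free_bytes >= required_gb * 1024**3


# Check the current directory, where .sponge_temp is created by default
if not has_enough_space("."):
    print("Less than 100 GB free here; consider another location "
          "or on-the-fly processing.")
```

The directory passed in must already exist, so check the parent of the
temporary directory rather than the temporary directory itself.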

As an alternative to the bigbed file download, SPONGE can download
tracks for individual TFs on the fly and filter them individually.
This way of processing is slower than using the bigbed file when all
TFs in the database are considered, but it becomes competitive when
only a subset is used.
It also requires far less disk space.
This option is enabled with `on_the_fly_processing: True` in the
configuration file.

For filtering, `n_processes` defaults to 1, but we highly recommend
increasing it if your machine has the cores to spare.
During our testing, the entire default workflow could be done in just
over 10 minutes with 16 processes (this excludes the time taken to
download the required files).
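
One possible way to pick a value for `n_processes` is sketched below.
This is an illustration rather than anything SPONGE does itself:
`suggest_n_processes` is a hypothetical helper, and leaving one core
free is an arbitrary choice.
Where exactly `n_processes` sits in the configuration is best checked
in the example config file generated by `netzoopy-sponge -e`.

```python
import os


def suggest_n_processes(reserve=1):
    """Suggest a process count: the CPU cores reported by the OS
    minus `reserve`, but never less than 1."""
    available = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, available - reserve)


print(suggest_n_processes())
```

Note that `os.cpu_count()` reports the cores of the whole machine, so on
a shared HPC node the number of cores actually allocated to the job may
be lower.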


### File formats
Users are free to provide their own files for the list of regions of
interest and their mapping to transcripts and genes
(`region: region_file`, default `regions.tsv`) and the list of predicted
TF binding sites (`motif: tfbs_file`, default `tfbs.bb`).
By default, if the paths are not provided or set to None, SPONGE
attempts to locate these files in the temporary folder under the default
names.
If it fails to do so, it will proceed to download them.
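
The lookup order described above can be sketched as follows.
This only mirrors the behaviour as described in this README and is not
SPONGE's actual code; `locate_input` is a hypothetical name, and
returning a user-provided path without checking that it exists is an
assumption.

```python
import tempfile
from pathlib import Path


def locate_input(user_path, temp_dir, default_name):
    """Return the path to use, or None if a download would be needed.

    A user-provided path takes priority; otherwise the default file
    name is looked up in the temporary directory.
    """
    if user_path is not None:
        return Path(user_path)
    candidate = Path(temp_dir) / default_name
    return candidate if candidate.exists() else None


with tempfile.TemporaryDirectory() as temp_dir:
    # Nothing in the temporary directory yet: a download would follow
    print(locate_input(None, temp_dir, "regions.tsv"))
    # Once the file exists under its default name, it is picked up
    (Path(temp_dir) / "regions.tsv").touch()
    print(locate_input(None, temp_dir, "regions.tsv"))
```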

The list of regions of interest is expected as a seven-column TSV file
with a defined header, for example:

```
Chromosome      Start   End     Transcript stable ID    Gene stable ID  Gene name       Gene type
chr1    10676   11676   ENST00000832828 ENSG00000290825 DDX11L16        lncRNA
chr1    11260   12260   ENST00000450305 ENSG00000223972 DDX11L1 transcribed_unprocessed_pseudogene
chr1    17186   18186   ENST00000619216 ENSG00000278267 MIR6859-1       miRNA
chr1    24636   25636   ENST00000488147 ENSG00000227232 WASH7P  transcribed_unprocessed_pseudogene
chr1    27839   28839   ENST00000834619 ENSG00000243485 MIR1302-2HG     lncRNA
chr1    28804   29804   ENST00000473358 ENSG00000243485 MIR1302-2HG     lncRNA
```
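
A custom regions file can be sanity-checked before it is handed to
SPONGE.
The sketch below uses only the standard library and assumes the file is
tab-separated with exactly the header shown above; `check_regions` is a
hypothetical helper, not a SPONGE function, and the second data row is
deliberately malformed to show what a reported problem looks like.

```python
import csv
import io

EXPECTED_COLUMNS = [
    "Chromosome", "Start", "End", "Transcript stable ID",
    "Gene stable ID", "Gene name", "Gene type",
]


def check_regions(handle):
    """Return a list of problems found in a tab-separated regions file."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, None)
    if header != EXPECTED_COLUMNS:
        return [f"unexpected header: {header}"]
    problems = []
    # Data rows start on line 2 of the file
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(EXPECTED_COLUMNS):
            problems.append(
                f"line {line_no}: expected 7 columns, got {len(row)}")
            continue
        start, end = row[1], row[2]
        if not (start.isdigit() and end.isdigit() and int(start) < int(end)):
            problems.append(f"line {line_no}: invalid interval {start}-{end}")
    return problems


sample = (
    "\t".join(EXPECTED_COLUMNS) + "\n"
    "chr1\t10676\t11676\tENST00000832828\tENSG00000290825\tDDX11L16\tlncRNA\n"
    # Start and End swapped on purpose
    "chr1\t12260\t11260\tENST00000450305\tENSG00000223972\tDDX11L1\t"
    "transcribed_unprocessed_pseudogene\n"
)
print(check_regions(io.StringIO(sample)))
# ['line 3: invalid interval 12260-11260']
```

For a file on disk, pass `open(path, newline="")` instead of the
`io.StringIO` object.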

The predicted TF binding sites are expected in a binary bigbed file,
with the following format when decoded:

```
chrom   start      end      name  score strand TFName
chr1   10000    10006  MA0467.3    276      -    Crx
chr1   10000    10006  MA0648.2    233      +    GSC
chr1   10000    10006  MA0682.3    231      +  PITX1
chr1   10000    10006  MA0711.2    198      +   OTX1
chr1   10000    10006  MA0714.2    246      +  PITX3
```

Effectively, it is an extended bed format with a header, which uses
the `name` column to provide the JASPAR matrix ID and the `TFName`
column to provide the actual name of the transcription factor.
However, currently SPONGE expects a bigbed file and will not work with
a bed file.
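
To make the roles of the columns concrete, the following sketch parses
decoded rows like the ones above and maps each JASPAR matrix ID to its
TF name.
It works on already decoded text, not on the binary bigbed file, and is
not how SPONGE reads the data; `BindingSite` and `parse_decoded_rows`
are hypothetical names, and the parser assumes whitespace-separated
fields with no spaces inside TF names.

```python
from typing import NamedTuple


class BindingSite(NamedTuple):
    chrom: str
    start: int
    end: int
    matrix_id: str  # the `name` column
    score: int
    strand: str
    tf_name: str    # the `TFName` column


def parse_decoded_rows(text):
    """Parse decoded rows (header line first) into BindingSite records."""
    lines = text.strip().splitlines()
    sites = []
    for line in lines[1:]:  # skip the header line
        chrom, start, end, name, score, strand, tf_name = line.split()
        sites.append(BindingSite(
            chrom, int(start), int(end), name, int(score), strand, tf_name))
    return sites


decoded = """
chrom   start      end      name  score strand TFName
chr1   10000    10006  MA0467.3    276      -    Crx
chr1   10000    10006  MA0648.2    233      +    GSC
chr1   10000    10006  MA0682.3    231      +  PITX1
"""
sites = parse_decoded_rows(decoded)
id_to_tf = {site.matrix_id: site.tf_name for site in sites}
print(id_to_tf)
# {'MA0467.3': 'Crx', 'MA0648.2': 'GSC', 'MA0682.3': 'PITX1'}
```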


### Container
SPONGE releases are also provided as Docker containers.
The most basic way of running it is to mount a directory to the
`/app` directory in the container, where SPONGE will be run:

``` bash
docker run --mount type=bind,source="$(pwd)"/sponge_run,target=/app ghcr.io/kuijjerlab/netzoopy_sponge:latest --help
```

The arguments match those of the `netzoopy-sponge` command line script.
In particular, it could be useful to first generate an example input
file using the `--example` option and then edit the configuration file
as appropriate.
Without mounting a directory, it is impossible to both provide an input
file and retrieve the generated prior networks, unless of course the
container is run interactively:

``` bash
docker run -it --entrypoint bash ghcr.io/kuijjerlab/netzoopy_sponge:latest
```

In HPC environments, something like the `apptainer shell` command would
work.

Because of the libraries used for bigbed format support, SPONGE is not
currently supported on Windows.
Therefore, the container is probably the best way to run it there;
the equivalent of the first command above in the Windows command
prompt would look like this:

``` bash
docker.exe run --mount type=bind,source="%cd%"/sponge_run,target=/app ghcr.io/kuijjerlab/netzoopy_sponge:latest --help
```
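
If the container is launched from a Python script, the shell-specific
`$(pwd)` and `%cd%` can be avoided by resolving the directory with
`pathlib`.
This is a small sketch, not part of SPONGE: `build_docker_command` is a
hypothetical helper that only assembles the same `docker run` arguments
shown above.

```python
from pathlib import Path

IMAGE = "ghcr.io/kuijjerlab/netzoopy_sponge:latest"


def build_docker_command(run_dir, sponge_args):
    """Assemble a docker command that bind-mounts `run_dir` to /app
    and passes `sponge_args` on to the container."""
    source = Path(run_dir).resolve()  # absolute path on any platform
    return [
        "docker", "run",
        "--mount", f"type=bind,source={source},target=/app",
        IMAGE,
        *sponge_args,
    ]


command = build_docker_command("sponge_run", ["--help"])
print(command)
# To execute it: subprocess.run(command, check=True)
```

The directory has to exist before the container is started, and a path
containing commas would need extra care because `--mount` uses commas
to separate its fields.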


## Project Status
The project is: _in progress_.


## Room for Improvement
Room for improvement:
- Better tests
- Try incorporating unipressed
- Improve overlap computations

To do:
- Support for more species


## Acknowledgements
Many thanks to the members of the
[Kuijjer group](https://www.kuijjerlab.org/)
at NCMBM/UH for their feedback and support.

This README is based on a template made by
[@flynerdpl](https://www.flynerd.pl/).


## Contact
Created by Ladislav Hovan (ladislav.hovan@ncmbm.uio.no).
Feel free to contact me!


## License
This project is open source and available under the
[GNU General Public License v3](LICENSE).

            
