pyenrichr


Name: pyenrichr
Version: 1.0.2
Home page: https://github.com/maayanlab/pyenrichr
Summary: Official Enrichr Python package for fast local gene set enrichment.
Upload time: 2024-06-10 20:37:07
Author: Alexander Lachmann
Requires Python: >=3.6
# pyEnrichr - Official Enrichr Python Package

pyEnrichr is a Python package that runs the Fisher exact test against Enrichr gene set libraries. It mimics the Enrichr backend, so enrichment analysis can be executed locally. The implementation is pure Python, performs well on large gene set libraries, and returns results almost instantly. It computes the same p-values as the Enrichr API, but because it runs locally the p-value computation is faster.

### Installation

Install the Python library using pip.

```
pip3 install pyenrichr
```


### Enrichment Analysis

To run pyEnrichr, use the following Python code. The result is a DataFrame with the enriched gene sets of the library as rows, sorted by p-value.

```python
import pyenrichr as pye

# list all libraries from Enrichr
libraries = pye.libraries.list_libraries()

# load a gene set library
lib = pye.libraries.get_library("GO_Biological_Process_2023")

# get example gene set
gene_set = pye.libraries.example_set()

# calculate enrichment for gene set against all gene sets in library
result = pye.enrichment.fisher(gene_set, lib)
```

`pye.enrichment.fisher` expects a set of genes (`gene_set`) and a gene set library (`lib`) as input. The library is a dictionary of sets: each key is a term and each value is the set of genes annotated to that term.
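A custom library can be supplied in the same dictionary-of-sets form. The sketch below builds one from GMT-style text (one term per line: term, description, then genes, all tab-separated). The helper name `parse_gmt` is hypothetical and not part of pyEnrichr, and stripping a `,weight` suffix from gene entries is an assumption about how some library files are written.

```python
def parse_gmt(text):
    """Parse GMT-style text into a dictionary mapping term -> set of genes.

    Hypothetical helper for illustration; not part of the pyEnrichr API.
    """
    library = {}
    for line in text.splitlines():
        fields = line.split("\t")
        # a valid line has a term, a description field, and at least one gene
        if len(fields) < 3:
            continue
        term = fields[0].strip()
        # drop empty fields and any ",weight" suffix attached to a gene
        genes = {f.split(",")[0].strip() for f in fields[2:] if f.strip()}
        if term and genes:
            library[term] = genes
    return library


example_text = "Term A\t\tTP53\tEGFR\nTerm B\tdesc\tAKT1,1.0\tTP53\t\n"
custom_lib = parse_gmt(example_text)
```

The resulting `custom_lib` has the same shape as the library described above, so it should be usable wherever `lib` is passed.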

### Example Output

The results are returned as a pandas DataFrame. The columns are the term, p-value, Sidak multiple-hypothesis-corrected p-value (`sidak`), false discovery rate (`fdr`), odds ratio (`odds`), overlap size (`overlap`), gene set size (`set-size`), and the overlapping genes (`gene-overlap`).

| #  | Term                                                       | p-value       | sidak          | fdr           | odds      | overlap | set-size | Gene-overlap                                                                                         |
|--- |------------------------------------------------------------|---------------|----------------|---------------|-----------|---------|----------|------------------------------------------------------------------------------------------------------|
| 1  | Regulation Of Cell Population Proliferation...              | 1.041581e-41  | 5.655786e-39   | 5.655786e-39  | 8.903394  | 62      | 766      | PDGFRB,TGFB2,CSF1R,CXCL10,CD86,IL4,CTNNB1,STAT...                                                    |
| 2  | Positive Regulation Of Cell Population Proliferation...     | 2.914662e-37  | 1.582661e-34   | 7.913307e-35  | 11.159420 | 49      | 483      | PDGFRB,TGFB2,CSF1R,CD86,IL4,AKT1,EGFR,JAK2,CDK...                                                    |
| 3  | Positive Regulation Of Cell Migration (GO:0030335)          | 1.929354e-35  | 1.047639e-32   | 3.492131e-33  | 15.772059 | 39      | 272      | PDGFRB,TGFB2,CSF1R,ATM,PECAM1,TWIST1,IL4,STAT3...                                                    |
| 4  | Regulation Of Apoptotic Process (GO:0042981)                | 9.892051e-34  | 5.371384e-31   | 1.342846e-31  | 8.269504  | 53      | 705      | CASP9,CXCL10,ATM,RPS6KB1,FAS,IL4,CTNNB1,CD28,A...                                                    |
| 5  | Positive Regulation Of Intracellular Signal Transmission... | 3.297600e-33  | 1.790597e-30   | 3.581194e-31  | 9.847619  | 47      | 525      | PDGFRB,TGFB2,CD86,CHI3L1,BECN1,ENG,GAPDH,PPARG...                                                    |
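The `sidak` and `fdr` values in the table are consistent with the standard Sidak correction and the Benjamini-Hochberg procedure applied over 543 tests. That count is inferred from the ratio between the columns in the table and is not documented. The sketch below reproduces both columns from the p-values; the function names are hypothetical and this is not pyEnrichr's own code.

```python
import math


def sidak(p, m):
    """Sidak-adjusted p-value 1 - (1 - p)**m, computed stably for tiny p."""
    return -math.expm1(m * math.log1p(-p))


def benjamini_hochberg(pvalues, m=None):
    """Benjamini-Hochberg adjusted p-values for ascending-sorted p-values.

    `m` is the total number of tests. If only the smallest p-values are
    passed in, the monotonicity step can only look at those values.
    """
    m = len(pvalues) if m is None else m
    adjusted = [min(1.0, p * m / rank) for rank, p in enumerate(pvalues, start=1)]
    # enforce monotonicity, walking from the largest p-value downwards
    for i in range(len(adjusted) - 2, -1, -1):
        adjusted[i] = min(adjusted[i], adjusted[i + 1])
    return adjusted


# p-values of the five rows shown in the table above
p_top = [1.041581e-41, 2.914662e-37, 1.929354e-35, 9.892051e-34, 3.297600e-33]
n_tests = 543  # assumption inferred from the table, see above

sidak_top = [sidak(p, n_tests) for p in p_top]
fdr_top = benjamini_hochberg(p_top, m=n_tests)
```

For p-values this small, the Sidak value is almost exactly `p * m`, which is why the `sidak` and `fdr` columns coincide in the first row.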


### Fisher Initialization

When enrichment is computed against multiple libraries, some calculations can be initialized once and reused, which reduces overall execution time. In the example below, the `fisher` object must be initialized with a parameter of at least `N`, where `N = a + b + c + d` is the sum of the four cells of the 2×2 contingency table used by the Fisher exact test.

```python
import pyenrichr as pye

# initialize calculations
fisher = pye.enrichment.FastFisher(34000)

# load a gene set library
lib_1 = pye.libraries.get_library("GO_Biological_Process_2023")
lib_2 = pye.libraries.get_library("KEGG_2021_Human")

# get example gene set
gene_set = pye.libraries.example_set()

# calculate enrichment for gene set against all gene sets in library 1 and 2
result_1 = pye.enrichment.fisher(gene_set, lib_1, fisher=fisher)
result_2 = pye.enrichment.fisher(gene_set, lib_2, fisher=fisher)
```
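To illustrate why an upper bound `N` is needed, here is a minimal sketch of one common way to precompute work for the Fisher exact test: a table of log-factorials up to `N`, from which every binomial coefficient can be read off cheaply. The class `LogFactorialCache` is hypothetical and is not pyEnrichr's `FastFisher`; its internals are not described in this README.

```python
import math


class LogFactorialCache:
    """Hypothetical stand-in that precomputes log(n!) for n = 0..max_n."""

    def __init__(self, max_n):
        self.log_fact = [0.0] * (max_n + 1)
        for i in range(1, max_n + 1):
            self.log_fact[i] = self.log_fact[i - 1] + math.log(i)

    def _log_choose(self, n, k):
        return self.log_fact[n] - self.log_fact[k] - self.log_fact[n - k]

    def right_tail(self, a, b, c, d):
        """One-sided (over-representation) Fisher p-value for [[a, b], [c, d]]."""
        n = a + b + c + d
        if n >= len(self.log_fact):
            raise ValueError("a + b + c + d exceeds the precomputed range")
        row1, row2, col1 = a + b, c + d, a + c
        log_denom = self._log_choose(n, col1)
        p = 0.0
        # sum hypergeometric probabilities for overlaps of size a or larger
        for k in range(a, min(row1, col1) + 1):
            p += math.exp(
                self._log_choose(row1, k)
                + self._log_choose(row2, col1 - k)
                - log_denom
            )
        return min(1.0, p)


cache = LogFactorialCache(100)
p_value = cache.right_tail(3, 1, 1, 3)
```

With the table built once, each additional test costs only a few lookups and exponentials, and a table whose total exceeds the precomputed range cannot be evaluated. That is the reason an initialization parameter of at least `N` is required in this sketch.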

### Gene Set Filtering

Small gene sets and small overlaps can be filtered out using the parameters `min_set_size` and `min_overlap`.

```python
import pyenrichr as pye

# load a gene set library
lib = pye.libraries.get_library("GO_Biological_Process_2023")

# get example gene set
gene_set = pye.libraries.example_set()

# calculate enrichment for gene set against all gene sets in library.
# Only gene sets larger than 10 genes are used and the minimum overlap has to be at least 5 to be reported.
result = pye.enrichment.fisher(gene_set, lib, min_set_size=10, min_overlap=5)
```
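The effect of the two thresholds can be pictured with plain Python on a dictionary-of-sets library. This is a sketch under the assumption that both thresholds are inclusive lower bounds; the README does not state whether pyEnrichr treats them as inclusive, and `filter_library` is a hypothetical helper, not a pyEnrichr function.

```python
def filter_library(gene_set, library, min_set_size=10, min_overlap=5):
    """Keep terms whose gene set and overlap with `gene_set` are large enough.

    Assumes inclusive thresholds; pyEnrichr's exact semantics may differ.
    """
    kept = {}
    for term, genes in library.items():
        if len(genes) < min_set_size:
            continue  # gene set too small
        if len(gene_set & genes) < min_overlap:
            continue  # too few shared genes
        kept[term] = genes
    return kept


toy_library = {
    "big_hit": {"A", "B", "C", "D"},
    "big_miss": {"W", "X", "Y", "Z"},
    "tiny": {"A"},
}
toy_kept = filter_library({"A", "B", "C"}, toy_library, min_set_size=2, min_overlap=2)
```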


### Enrichment of Gene Set Library vs Gene Set Library

When computing enrichment for multiple gene sets against a gene set library, pyEnrichr uses an optimized implementation of overlap detection together with multithreading to increase speed. The example below computes all pairwise enrichments between the gene sets in GO Biological Process. As before, it calls the `fisher` function, but the first parameter is a gene set library in dictionary format instead of a single gene set. The output is a list containing one result DataFrame for each gene set tested against the library. The results can be consolidated into a single p-value matrix.

```python
import pyenrichr as pye

# load a gene set library
lib = pye.libraries.get_library("GO_Biological_Process_2023")

# calculate enrichment for every gene set in a library against all gene sets in a library (here, the same one).
# Only gene sets larger than 10 genes are used and the minimum overlap has to be at least 5 to be reported.
result = pye.enrichment.fisher(lib, lib, min_set_size=10, min_overlap=5)

# consolidate all p-values into a single dataframe
pmat = pye.enrichment.consolidate(result)
```
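One common way to speed up overlap detection between two libraries is an inverted index from gene to terms, so that only pairs sharing at least one gene are ever visited. The sketch below shows that technique on dictionary-of-sets libraries. It is an illustration only: the README does not say which optimization pyEnrichr uses, and `pairwise_overlaps` is a hypothetical name.

```python
from collections import defaultdict


def pairwise_overlaps(query_lib, target_lib):
    """Count shared genes for every (query term, target term) pair that overlaps."""
    # inverted index: gene -> list of target terms containing that gene
    gene_to_terms = defaultdict(list)
    for term, genes in target_lib.items():
        for gene in genes:
            gene_to_terms[gene].append(term)

    overlaps = {}
    for q_term, q_genes in query_lib.items():
        counts = defaultdict(int)
        for gene in q_genes:
            for t_term in gene_to_terms.get(gene, ()):
                counts[t_term] += 1
        overlaps[q_term] = dict(counts)
    return overlaps


toy_lib = {
    "s1": {"A", "B", "C"},
    "s2": {"B", "C", "D"},
    "s3": {"X"},
}
toy_overlaps = pairwise_overlaps(toy_lib, toy_lib)
```

Pairs with no shared genes never appear in the output, which matters for large libraries where most pairs of gene sets do not overlap at all.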



            
