pfex


| Field           | Value                                                                                        |
|-----------------|----------------------------------------------------------------------------------------------|
| Name            | pfex                                                                                         |
| Version         | 0.1.13                                                                                       |
| Home page       | https://github.com/maayanlab/pfex                                                            |
| Summary         | Package for fast and accurate calculation of Fisher Exact Test with Enrichr library support. |
| Upload time     | 2024-05-02 20:57:45                                                                          |
| Author          | Alexander Lachmann                                                                           |
| Requires Python | >=3.6                                                                                        |

# PFEX - Python Fisher EXact

PFEX (Python Fisher EXact) is a pure Python implementation of the Fisher exact test that supports Enrichr gene set libraries and mimics the Enrichr backend. It computes the same p-values as the Enrichr API, but runs locally, which makes p-value computation faster. It performs well on large gene set libraries.

### Installation

Install the library using pip.

```
pip3 install pfex
```


### Enrichment Analysis

To run an enrichment analysis with PFEX, use the following code. The result is a DataFrame containing the enriched gene sets of the library as rows, sorted by p-value.

```python
import pfex as pfx

# list all libraries from Enrichr
libraries = pfx.libraries.list_libraries()

# load a gene set library
lib = pfx.libraries.get_library("GO_Biological_Process_2023")

# get example gene set
gene_set = pfx.libraries.example_set()

# calculate enrichment for gene set against all gene sets in library
result = pfx.enrichment.fisher(gene_set, lib)
```

`pfx.enrichment.fisher` expects two inputs: a set of genes (`gene_set`) and a library (`lib`). The library is a dictionary of sets, with one set of genes per term.
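
To illustrate the expected input shape, a library can also be assembled by hand. The sketch below assumes a tab-separated, GMT-style text (term, a description field, then the genes), which is a common way to distribute gene set libraries. `parse_gmt` is a hypothetical helper and not part of pfex.

```python
def parse_gmt(text):
    """Parse GMT-style lines into a dictionary of sets (term -> set of genes)."""
    library = {}
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) < 3 or not fields[0]:
            continue  # skip blank or malformed lines
        term, genes = fields[0], fields[2:]
        # drop empty fields left behind by trailing tabs
        library[term] = {gene for gene in genes if gene}
    return library


# two toy terms in GMT layout: term, empty description, genes
gmt_text = "Term A\t\tTP53\tEGFR\tAKT1\nTerm B\t\tSTAT3\tJAK2\n"
lib = parse_gmt(gmt_text)
gene_set = {"EGFR", "JAK2", "MYC"}

# genes shared between the query set and each term
overlaps = {term: sorted(genes & gene_set) for term, genes in lib.items()}
print(overlaps)
```

A dictionary built this way has the same shape as the `lib` argument described above, and `gene_set` is a plain Python set.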

### Example Output

The results are returned as a pandas DataFrame. The columns are the term, p-value, Šidák-corrected p-value for multiple hypothesis testing (`sidak`), false discovery rate (`fdr`), odds ratio (`odds`), overlap size (`overlap`), gene set size (`set-size`), and the overlapping genes (`gene-overlap`).

| #  | Term                                                       | p-value       | sidak          | fdr           | odds      | overlap | set-size | Gene-overlap                                                                                         |
|--- |------------------------------------------------------------|---------------|----------------|---------------|-----------|---------|----------|------------------------------------------------------------------------------------------------------|
| 1  | Regulation Of Cell Population Proliferation...              | 1.041581e-41  | 5.655786e-39   | 5.655786e-39  | 8.903394  | 62      | 766      | PDGFRB,TGFB2,CSF1R,CXCL10,CD86,IL4,CTNNB1,STAT...                                                    |
| 2  | Positive Regulation Of Cell Population Proliferation...     | 2.914662e-37  | 1.582661e-34   | 7.913307e-35  | 11.159420 | 49      | 483      | PDGFRB,TGFB2,CSF1R,CD86,IL4,AKT1,EGFR,JAK2,CDK...                                                    |
| 3  | Positive Regulation Of Cell Migration (GO:0030335)          | 1.929354e-35  | 1.047639e-32   | 3.492131e-33  | 15.772059 | 39      | 272      | PDGFRB,TGFB2,CSF1R,ATM,PECAM1,TWIST1,IL4,STAT3...                                                    |
| 4  | Regulation Of Apoptotic Process (GO:0042981)                | 9.892051e-34  | 5.371384e-31   | 1.342846e-31  | 8.269504  | 53      | 705      | CASP9,CXCL10,ATM,RPS6KB1,FAS,IL4,CTNNB1,CD28,A...                                                    |
| 5  | Positive Regulation Of Intracellular Signal Transmission... | 3.297600e-33  | 1.790597e-30   | 3.581194e-31  | 9.847619  | 47      | 525      | PDGFRB,TGFB2,CD86,CHI3L1,BECN1,ENG,GAPDH,PPARG...                                                    |
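
The `sidak` and `fdr` columns can be reproduced from the p-values alone. The sketch below is not taken from the pfex source. It assumes the Šidák formula 1 - (1 - p)^n and a Benjamini-Hochberg adjustment, and it assumes n = 543 tests, a number inferred from the ratio between the `sidak` and `p-value` columns in the table above. Under those assumptions the five rows are matched to the printed precision.

```python
import math


def sidak(p, n_tests):
    """Sidak-corrected p-value, 1 - (1 - p)**n, computed stably for tiny p."""
    return -math.expm1(n_tests * math.log1p(-p))


def benjamini_hochberg(pvalues, n_tests=None):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = n_tests if n_tests is not None else len(pvalues)
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    adjusted = [0.0] * len(pvalues)
    running_min = 1.0
    # walk from the largest p-value down so adjusted values stay monotone
    for rank in range(len(order), 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# p-values of the five rows shown above; 543 tests is an inferred assumption
pvals = [1.041581e-41, 2.914662e-37, 1.929354e-35, 9.892051e-34, 3.297600e-33]
print([sidak(p, 543) for p in pvals])
print(benjamini_hochberg(pvals, n_tests=543))
```

The `expm1`/`log1p` form matters here: evaluating `1 - (1 - p)**n` directly in floating point returns 0 for p-values this small.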


### Fisher Initialization

When enrichment is computed for multiple libraries, some calculations can be initialized once and reused. This reduces the overall execution time.

```python
import pfex as pfx

# initialize calculations
fisher = pfx.enrichment.FastFisher(34000)

# load a gene set library
lib_1 = pfx.libraries.get_library("GO_Biological_Process_2023")
lib_2 = pfx.libraries.get_library("KEGG_2021_Human")

# get example gene set
gene_set = pfx.libraries.example_set()

# calculate enrichment for gene set against all gene sets in library 1 and 2
result_1 = pfx.enrichment.fisher(gene_set, lib_1, fisher=fisher)
result_2 = pfx.enrichment.fisher(gene_set, lib_2, fisher=fisher)
```
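
The kind of work that can be shared between calls can be sketched without pfex. One common approach, assumed here and not confirmed by the pfex source, is to cache log-factorials up to an upper bound on the background size, so that every hypergeometric term becomes a few table lookups. `LogFactorialTable` and its methods are hypothetical names used only for illustration.

```python
import math


class LogFactorialTable:
    """Cache log(i!) for i = 0..max_n so binomial terms become table lookups."""

    def __init__(self, max_n):
        self.logfact = [0.0] * (max_n + 1)
        for i in range(1, max_n + 1):
            self.logfact[i] = self.logfact[i - 1] + math.log(i)

    def log_comb(self, n, k):
        """log of the binomial coefficient C(n, k), valid for 0 <= k <= n."""
        return self.logfact[n] - self.logfact[k] - self.logfact[n - k]

    def right_tail(self, overlap, set_size, query_size, background):
        """P(X >= overlap) for X ~ Hypergeometric(background, set_size, query_size)."""
        # only overlaps that are actually possible contribute to the sum
        lo = max(overlap, query_size + set_size - background)
        hi = min(set_size, query_size)
        log_total = self.log_comb(background, query_size)
        terms = (
            math.exp(
                self.log_comb(set_size, i)
                + self.log_comb(background - set_size, query_size - i)
                - log_total
            )
            for i in range(lo, hi + 1)
        )
        return min(1.0, math.fsum(terms))


table = LogFactorialTable(20000)  # built once, reused for every test
p1 = table.right_tail(overlap=5, set_size=40, query_size=100, background=20000)
p2 = table.right_tail(overlap=12, set_size=300, query_size=100, background=20000)
print(p1, p2)
```

Building the table costs one pass over the background size. Each later p-value then costs a loop over the possible overlap sizes only.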

### Gene Set Filtering

Small gene sets and small overlaps can be filtered using the parameters `min_set_size` and `min_overlap`.

```python
import pfex as pfx

# load a gene set library
lib = pfx.libraries.get_library("GO_Biological_Process_2023")

# get example gene set
gene_set = pfx.libraries.example_set()

# calculate enrichment for gene set against all gene sets in library.
# Only gene sets larger than 10 genes are used and the minimum overlap has to be at least 5 to be reported.
result = pfx.enrichment.fisher(gene_set, lib, min_set_size=10, min_overlap=5)
```
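
The effect of the two thresholds can be sketched in plain Python. The sketch assumes both thresholds are inclusive (a set of exactly `min_set_size` genes is kept). The source does not confirm this, so check the boundary behaviour against pfex itself. `filter_library` is a hypothetical helper, not a pfex function.

```python
def filter_library(library, gene_set, min_set_size=10, min_overlap=5):
    """Keep terms with enough genes and enough genes shared with gene_set."""
    kept = {}
    for term, genes in library.items():
        if len(genes) < min_set_size:
            continue  # gene set too small to be tested
        shared = genes & gene_set
        if len(shared) >= min_overlap:
            kept[term] = shared
    return kept


# toy library: one term that passes, one too small, one with too little overlap
toy_library = {
    "big": {f"g{i}" for i in range(12)},
    "small": {"g0", "g1", "g2"},
    "low_overlap": {f"g{i}" for i in range(20, 35)} | {"g0"},
}
toy_query = {f"g{i}" for i in range(6)}

print(filter_library(toy_library, toy_query))
```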


### Enrichment of Gene Set Library vs Gene Set Library

When computing enrichment for multiple gene sets against a gene set library, PFEX uses an optimized implementation of overlap detection and multithreading to increase computational speed. The example below computes all pairwise enrichments between the gene sets in GO Biological Process. As before, it calls the `fisher` function, but the first parameter is a gene set library in dictionary format instead of a single gene set. The output is a list containing one result DataFrame for each gene set compared against the library. The results can be consolidated into a single p-value matrix.

```python
import pfex as pfx

# load a gene set library
lib = pfx.libraries.get_library("GO_Biological_Process_2023")

# calculate enrichment for every gene set in a library against all gene sets in a library (here the same one).
# Only gene sets larger than 10 genes are used and the minimum overlap has to be at least 5 to be reported.
result = pfx.enrichment.fisher(lib, lib, min_set_size=10, min_overlap=5)

# consolidate all p-values into a single dataframe
pmat = pfx.enrichment.consolidate(result)
```
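
How overlap detection can be made cheap for library-versus-library comparisons is sketched below. The approach shown is an inverted index from gene to terms, so that only pairs that share at least one gene are ever visited. This is a generic technique and an assumption, since the source does not describe how pfex implements it. `pairwise_overlaps` is a hypothetical name.

```python
from collections import defaultdict


def pairwise_overlaps(queries, library):
    """Count shared genes for every (query term, library term) pair that overlaps."""
    # inverted index: gene -> library terms that contain it
    gene_to_terms = defaultdict(list)
    for term, genes in library.items():
        for gene in genes:
            gene_to_terms[gene].append(term)

    counts = defaultdict(int)
    for query_term, genes in queries.items():
        for gene in genes:
            for term in gene_to_terms.get(gene, ()):
                counts[(query_term, term)] += 1
    return dict(counts)


toy_lib = {
    "A": {"TP53", "EGFR", "AKT1"},
    "B": {"EGFR", "AKT1", "JAK2"},
    "C": {"MYC"},
}
# compare the toy library against itself, as in the example above
print(pairwise_overlaps(toy_lib, toy_lib))
```

Pairs without a common gene never appear in the result, which avoids intersecting every set with every other set.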



            
