gprofiler-official


* Name: gprofiler-official
* Version: 1.0.0
* Home page: https://biit.cs.ut.ee/gprofiler/
* Summary: Functional enrichment analysis and more via the g:Profiler toolkit
* Upload time: 2019-04-02 10:52:19
* Author: Uku Raudvere
* Requirements: none recorded
# gprofiler

## Project description

The official Python 3 interface to the [g:Profiler](https://biit.cs.ut.ee/gprofiler/)
toolkit for enrichment analysis of functional (GO and other) terms,
conversion between identifier namespaces, and mapping of orthologous genes in related organisms.

It has an optional dependency on pandas.

### Installing gprofiler

The recommended way to install gprofiler is with pip:
```bash
pip install gprofiler-official
```

### Legacy version 

The `0.3.x` series of gprofiler-official is incompatible with the `1.0.x` series. We changed the major version number to 
signify the breaking changes in the API. To install the previous version of `gprofiler-official`, use the command
```bash
pip install gprofiler-official==0.3.5
```
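Because the two series are incompatible, a script may want to check which one is installed before calling the API. The following is a minimal sketch using only the standard library; the helper names `api_series` and `installed_api_series` are illustrative and not part of gprofiler, and the rule "major version 0 means legacy" is an assumption based on the version numbers mentioned above.

```python
from importlib.metadata import PackageNotFoundError, version


def api_series(version_string):
    """Return 'legacy' for 0.x releases and 'current' for 1.x and later."""
    major = int(version_string.split(".")[0])
    return "legacy" if major < 1 else "current"


def installed_api_series(dist_name="gprofiler-official"):
    """Classify the installed distribution, or return None if it is absent."""
    try:
        return api_series(version(dist_name))
    except PackageNotFoundError:
        return None


print(installed_api_series())
```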

## Tools:

To use any of the tools in the g:Profiler toolkit, first initialize the GProfiler object.

```python
from gprofiler import GProfiler
gp = GProfiler(
    user_agent='ExampleTool',  # optional user agent
    return_dataframe=True,  # return a pandas DataFrame instead of plain Python structures
)
```


### g:GOSt (profile)

```python
from gprofiler import GProfiler

gp = GProfiler(return_dataframe=True)
gp.profile(organism='hsapiens',
            query=['NR1H4','TRIP12','UBC','FCRL3','PLXNA3','GDNF','VPS11'])
```

Output:
```
source      native                                            name   p_value  significant                                        description  term_size  query_size  intersection_size  effective_domain_size  precision    recall    query                               parents
GO:BP  GO:0048585     negative regulation of response to stimulus  0.004229         True  "Any process that stops, prevents, or reduces ...       1610           7                  6                  17622   0.857143  0.003727  query_1  [GO:0048583, GO:0048519, GO:0050896]
GO:BP  GO:0002224            toll-like receptor signaling pathway  0.016351         True  "Any series of molecular signals generated as ...        133           7                  3                  17622   0.428571  0.022556  query_1                          [GO:0002221]
GO:BP  GO:0048486      parasympathetic nervous system development  0.026199         True  "The process whose specific outcome is the pro...         19           7                  2                  17622   0.285714  0.105263  query_1              [GO:0048483, GO:0048731]
GO:BP  GO:0034162          toll-like receptor 9 signaling pathway  0.038733         True  "Any series of molecular signals generated as ...         23           7                  2                  17622   0.285714  0.086957  query_1                          [GO:0002224]
GO:BP  GO:0002221  pattern recognition receptor signaling pathway  0.039782         True  "Any series of molecular signals generated as ...        179           7                  3                  17622   0.428571  0.016760  query_1                          [GO:0002758]
CORUM  CORUM:5669                           PlexinA3-Nrp1 complex  0.049767         True                              PlexinA3-Nrp1 complex          2           2                  1                   3620   0.500000  0.500000  query_1                       [CORUM:0000000]
CORUM  CORUM:5759                           PLXNA3-RANBPM complex  0.049767         True                              PLXNA3-RANBPM complex          2           2                  1                   3620   0.500000  0.500000  query_1                       [CORUM:0000000]
```

* `source` is the code for the data source.
* `native` is the ID of the enriched term/functional category in its native namespace.
* `name` is the readable name of the enriched term; `description` is the longer description, if available.
* `p_value` is the p-value for the enrichment of the term, corrected for multiple testing.
* `term_size`, `query_size`, `intersection_size` and `effective_domain_size` are the parameters of the hypergeometric test.
* `query` is the name of the query; it is relevant when multiple queries are made in one call (e.g. `gp.profile(query={'query1':['NR1H4'], 'query2':['NR1H4','TRIP12']})`).
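
The relationship between the size columns can be checked by hand. The sketch below recomputes `precision` and `recall` for the first output row and evaluates an uncorrected hypergeometric tail probability from the same sizes. It assumes that `precision` is `intersection_size / query_size` and `recall` is `intersection_size / term_size`, which agrees with the values printed above. The function names are illustrative and not part of the gprofiler API, and the uncorrected probability is not expected to equal the reported `p_value`.

```python
from math import comb


def hypergeom_tail(domain_size, term_size, query_size, intersection_size):
    """P(X >= intersection_size) when query_size genes are drawn from a domain
    of domain_size genes, term_size of which are annotated to the term."""
    upper = min(term_size, query_size)
    favourable = sum(
        comb(term_size, k) * comb(domain_size - term_size, query_size - k)
        for k in range(intersection_size, upper + 1)
    )
    return favourable / comb(domain_size, query_size)


def precision_recall(term_size, query_size, intersection_size):
    """Assumed definitions; they reproduce the columns in the output above."""
    return intersection_size / query_size, intersection_size / term_size


# Sizes copied from the GO:0048585 row of the output above.
row = {
    "term_size": 1610,
    "query_size": 7,
    "intersection_size": 6,
    "effective_domain_size": 17622,
}

precision, recall = precision_recall(
    row["term_size"], row["query_size"], row["intersection_size"]
)
raw_p = hypergeom_tail(
    row["effective_domain_size"],
    row["term_size"],
    row["query_size"],
    row["intersection_size"],
)

print(round(precision, 6), round(recall, 6))  # 0.857143 0.003727
print(raw_p)  # uncorrected tail probability
```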

Setting the parameter `no_evidences=False` adds the column `intersections` (a list of genes that are annotated to the term and are present in the query)
and the column `evidences` (a list of lists of GO evidence codes for the intersecting genes).




NB! The parameter `combined` significantly changes the output structure: the results of distinct queries are packed together, so per-query columns become lists (`p_values`, `query_sizes`, `intersection_sizes`).
For example:

```python
gp.profile(query={'query1':['NR1H4'], 'query2':['NR1H4','TRIP12']}, combined=True)
```
Output (truncated):
```
source      native                                               name                                     p_values                                        description  term_size query_sizes intersection_sizes  effective_domain_size                                           parents
GO:MF  GO:1902122                      chenodeoxycholic acid binding  [0.024822026073022193, 0.04964405214614093]  "Interacting selectively and non-covalently wi...          1      [1, 2]             [1, 1]                  17516                          [GO:0032052, GO:0005496]
GO:MF  GO:0035257                   nuclear hormone receptor binding                  [1.0, 0.033391754400990514]  "Interacting selectively and non-covalently wi...        154      [1, 2]             [1, 2]                  17516                          [GO:0051427, GO:0061629]
GO:MF  GO:0051427                           hormone receptor binding                   [1.0, 0.04929258983003374]  "Interacting selectively and non-covalently wi...        187      [1, 2]             [1, 2]                  17516                                      [GO:0005102]
```
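
To work with combined results one query at a time, the list-valued columns can be unpacked again. The sketch below does this on plain Python dictionaries copied from the truncated output above, so it runs without a network call. It assumes that the entries of `p_values` follow the order in which the queries were given (`query1`, then `query2`), which is consistent with `query_sizes` being `[1, 2]`. The 0.05 cut-off and the helper name `unpack_combined` are illustrative choices, not part of the gprofiler API.

```python
# Rows copied from the truncated combined output above (other columns omitted).
combined_rows = [
    {"native": "GO:1902122", "name": "chenodeoxycholic acid binding",
     "p_values": [0.024822026073022193, 0.04964405214614093]},
    {"native": "GO:0035257", "name": "nuclear hormone receptor binding",
     "p_values": [1.0, 0.033391754400990514]},
    {"native": "GO:0051427", "name": "hormone receptor binding",
     "p_values": [1.0, 0.04929258983003374]},
]

# Assumption: list positions correspond to the queries in this order.
query_names = ["query1", "query2"]


def unpack_combined(rows, names):
    """Yield one record per (term, query) pair from a combined result."""
    for row in rows:
        for name, p_value in zip(names, row["p_values"]):
            yield {"query": name, "native": row["native"], "p_value": p_value}


records = list(unpack_combined(combined_rows, query_names))
significant = [r for r in records if r["p_value"] < 0.05]

for record in significant:
    print(record["query"], record["native"], record["p_value"])
```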


### g:Convert (convert)

```python
from gprofiler import GProfiler

gp = GProfiler(return_dataframe=True)
gp.convert(organism='hsapiens',
            query=['NR1H4','TRIP12','UBC','FCRL3','PLXNA3','GDNF','VPS11'],
            target_namespace='ENTREZGENE_ACC')

```

Output:
```
incoming converted  n_incoming  n_converted    name                                        description                           namespaces    query
  NR1H4      9971           1            1   NR1H4  nuclear receptor subfamily 1 group H member 4 ...  ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE  query_1
 TRIP12      9320           2            1  TRIP12  thyroid hormone receptor interactor 12 [Source...  ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE  query_1
    UBC      7316           3            1     UBC    ubiquitin C [Source:HGNC Symbol;Acc:HGNC:12468]  ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE  query_1
  FCRL3    115352           4            1   FCRL3  Fc receptor like 3 [Source:HGNC Symbol;Acc:HGN...  ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE  query_1
 PLXNA3     55558           5            1  PLXNA3       plexin A3 [Source:HGNC Symbol;Acc:HGNC:9101]             ENTREZGENE,HGNC,WIKIGENE  query_1
   GDNF      2668           6            1    GDNF  glial cell derived neurotrophic factor [Source...  ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE  query_1
  VPS11     55823           7            1   VPS11  VPS11, CORVET/HOPS core subunit [Source:HGNC S...  ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE  query_1
 PLXNA3     55558           5            1  PLXNA3       plexin A3 [Source:HGNC Symbol;Acc:HGNC:9101]             ENTREZGENE,HGNC,WIKIGENE  query_1
```

The `incoming` column lists the input gene; `converted` lists the gene in the target namespace (the Entrez Gene accession number in this case).
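
A common follow-up step is to turn these rows into a lookup table. The sketch below builds one from the `incoming` and `converted` values copied from the output above, including the repeated `PLXNA3` row, and keeps a list of distinct targets per input so that repeated or one-to-many rows are handled. The helper name `build_mapping` is illustrative and not part of the gprofiler API.

```python
# (incoming, converted) pairs copied from the output above, in the same order.
pairs = [
    ("NR1H4", "9971"),
    ("TRIP12", "9320"),
    ("UBC", "7316"),
    ("FCRL3", "115352"),
    ("PLXNA3", "55558"),
    ("GDNF", "2668"),
    ("VPS11", "55823"),
    ("PLXNA3", "55558"),
]


def build_mapping(rows):
    """Map each input gene to the list of distinct converted identifiers."""
    mapping = {}
    for incoming, converted in rows:
        targets = mapping.setdefault(incoming, [])
        if converted not in targets:
            targets.append(converted)
    return mapping


mapping = build_mapping(pairs)
print(mapping["PLXNA3"])  # ['55558']
```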



### g:Orth (orth)

```python
from gprofiler import GProfiler

gp = GProfiler(return_dataframe=True)
gp.orth(organism='hsapiens',
            query=['NR1H4','TRIP12','UBC','FCRL3','PLXNA3','GDNF','VPS11'],
            target='mmusculus')

```
Output:
```
incoming        converted       ortholog_ensg  n_incoming  n_converted  n_result    name                                        description                           namespaces
  NR1H4  ENSG00000012504  ENSMUSG00000047638           1            1         1   Nr1h4  nuclear receptor subfamily 1, group H, member ...  ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE
 TRIP12  ENSG00000153827  ENSMUSG00000026219           2            1         1  Trip12  thyroid hormone receptor interactor 12 [Source...  ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE
    UBC  ENSG00000150991  ENSMUSG00000008348           3            1         1     Ubc      ubiquitin C [Source:MGI Symbol;Acc:MGI:98889]  ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE
  FCRL3  ENSG00000160856                 N/A           4            1         1     N/A                                                N/A  ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE
 PLXNA3  ENSG00000130827  ENSMUSG00000031398           5            1         1  Plxna3       plexin A3 [Source:MGI Symbol;Acc:MGI:107683]             ENTREZGENE,HGNC,WIKIGENE
   GDNF  ENSG00000168621  ENSMUSG00000022144           6            1         1    Gdnf  glial cell line derived neurotrophic factor [S...  ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE
  VPS11  ENSG00000160695  ENSMUSG00000032127           7            1         1   Vps11  VPS11, CORVET/HOPS core subunit [Source:MGI Sy...  ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE

```

`incoming` is the input gene, `converted` is the canonical Ensembl ID for the input gene, 
`ortholog_ensg` is the canonical Ensembl ID for the orthologous gene in the target organism.
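
In the output above, `FCRL3` has `N/A` in the `ortholog_ensg` column. The sketch below separates such genes from those with an ortholog, using the `incoming` and `ortholog_ensg` values copied from that output. It assumes the missing value is the literal string `N/A` as displayed; the helper name `split_by_ortholog` is illustrative and not part of the gprofiler API.

```python
# (incoming, ortholog_ensg) pairs copied from the output above.
orth_rows = [
    ("NR1H4", "ENSMUSG00000047638"),
    ("TRIP12", "ENSMUSG00000026219"),
    ("UBC", "ENSMUSG00000008348"),
    ("FCRL3", "N/A"),
    ("PLXNA3", "ENSMUSG00000031398"),
    ("GDNF", "ENSMUSG00000022144"),
    ("VPS11", "ENSMUSG00000032127"),
]


def split_by_ortholog(rows, missing="N/A"):
    """Return (found, unmatched): orthologs per input gene, and inputs without one."""
    found, unmatched = {}, []
    for incoming, ortholog in rows:
        if ortholog == missing:
            unmatched.append(incoming)
        else:
            found.setdefault(incoming, []).append(ortholog)
    return found, unmatched


found, unmatched = split_by_ortholog(orth_rows)
print(unmatched)  # ['FCRL3']
```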

### g:SNPense (snpense)

```python
from gprofiler import GProfiler

gp = GProfiler(return_dataframe=True)
gp.snpense(query=['rs11734132', 'rs7961894', 'rs4305276', 'rs17396340'])
```
Output:

```
rs_id chromosome strand      start        end              ensgs gene_names                                           variants
rs11734132                           -1         -1                 []         []  {'intron_variant': 0, 'non_coding_transcript_v...
 rs7961894         12      +  121927677  121927677  [ENSG00000158023]    [WDR66]  {'intron_variant': 3, 'non_coding_transcript_v...
 rs4305276          2      +  240555596  240555596  [ENSG00000144504]   [ANKMY1]  {'intron_variant': 57, 'non_coding_transcript_...
rs17396340          1      +   10226118   10226118  [ENSG00000054523]    [KIF1B]  {'intron_variant': 8, 'non_coding_transcript_v...

```
* `rs_id` is the input rs-number.
* `chromosome`, `strand`, `start` and `end` encode the position of the variation.
* `ensgs` and `gene_names` are lists of protein-coding genes associated with the rs-number.
* `variants` are the predicted variant effects.
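
The first row of the output above has no chromosome and a position of `-1`, so results are worth checking before use. The sketch below formats each row as a short location string and flags rows without a position. The values are copied from the output above with the `variants` column omitted. It assumes a missing chromosome is an empty string and that positions are integers; the helper name `describe` is illustrative and not part of the gprofiler API.

```python
# Rows copied from the output above (the truncated `variants` column is omitted).
snp_rows = [
    {"rs_id": "rs11734132", "chromosome": "", "start": -1, "gene_names": []},
    {"rs_id": "rs7961894", "chromosome": "12", "start": 121927677, "gene_names": ["WDR66"]},
    {"rs_id": "rs4305276", "chromosome": "2", "start": 240555596, "gene_names": ["ANKMY1"]},
    {"rs_id": "rs17396340", "chromosome": "1", "start": 10226118, "gene_names": ["KIF1B"]},
]


def describe(row):
    """Return a one-line summary, or a notice when no position is reported."""
    if not row["chromosome"] or row["start"] < 0:
        return f"{row['rs_id']}: no position reported"
    genes = ", ".join(row["gene_names"]) or "no gene"
    return f"{row['rs_id']}: chr{row['chromosome']}:{row['start']} ({genes})"


for row in snp_rows:
    print(describe(row))
```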


            

Raw data

{
    "_id": null,
    "home_page": "https://biit.cs.ut.ee/gprofiler/",
    "name": "gprofiler-official",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "",
    "author": "Uku Raudvere",
    "author_email": "biit.support@ut.ee",
    "download_url": "https://files.pythonhosted.org/packages/ec/c1/d9252620d09a064247d1623ebc4732d624921a2ed80a677f8b9ce61810dd/gprofiler-official-1.0.0.tar.gz",
    "platform": "",
    "bugtrack_url": null,
    "license": "",
    "summary": "Functional enrichment analysis and more via the g:Profiler toolkit",
    "version": "1.0.0",
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "df1b5a87c1a1da8f601c00a0ce4dedb5aab8a5cad6a0f4a5062c4da22a045072",
                "md5": "a31adb48d09059958b1f48cf0d356879",
                "sha256": "c582baf728e5a6cddac964e4085ca385e082c4ef0279e3af1a16a9af07ab5395"
            },
            "downloads": -1,
            "filename": "gprofiler_official-1.0.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "a31adb48d09059958b1f48cf0d356879",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 9277,
            "upload_time": "2019-04-02T10:52:17",
            "upload_time_iso_8601": "2019-04-02T10:52:17.769447Z",
            "url": "https://files.pythonhosted.org/packages/df/1b/5a87c1a1da8f601c00a0ce4dedb5aab8a5cad6a0f4a5062c4da22a045072/gprofiler_official-1.0.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ecc1d9252620d09a064247d1623ebc4732d624921a2ed80a677f8b9ce61810dd",
                "md5": "27af90e2bdce5603262f6b23f97679b0",
                "sha256": "5015b47f10fbdcb59c57e342e815c9c07afbe57cd3984154f75b845ddef2445d"
            },
            "downloads": -1,
            "filename": "gprofiler-official-1.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "27af90e2bdce5603262f6b23f97679b0",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 9584,
            "upload_time": "2019-04-02T10:52:19",
            "upload_time_iso_8601": "2019-04-02T10:52:19.527145Z",
            "url": "https://files.pythonhosted.org/packages/ec/c1/d9252620d09a064247d1623ebc4732d624921a2ed80a677f8b9ce61810dd/gprofiler-official-1.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2019-04-02 10:52:19",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "lcname": "gprofiler-official"
}
        