- Name: gilda
- Version: 1.2.0
- Home page: https://github.com/indralab/gilda
- Summary: Grounding for biomedical entities with contextual disambiguation
- Upload time: 2024-04-30 20:07:00
- Author: Benjamin M. Gyori, Harvard Medical School
- Keywords: nlp, biology

# Gilda: Grounding Integrating Learned Disambiguation
[![License](https://img.shields.io/badge/License-BSD%202--Clause-orange.svg)](https://opensource.org/licenses/BSD-2-Clause)
[![Build](https://github.com/indralab/gilda/actions/workflows/tests.yml/badge.svg)](https://github.com/indralab/gilda/actions)
[![Documentation](https://readthedocs.org/projects/gilda/badge/?version=latest)](https://gilda.readthedocs.io/en/latest/?badge=latest)
[![PyPI version](https://badge.fury.io/py/gilda.svg)](https://badge.fury.io/py/gilda)
[![DOI](https://img.shields.io/badge/DOI-10.1093/bioadv/vbac034-green.svg)](https://doi.org/10.1093/bioadv/vbac034)

Gilda is a Python package and REST service that grounds (i.e., finds
appropriate identifiers in namespaces for) named entities in biomedical text.

Gyori BM, Hoyt CT, Steppi A (2022). Gilda: biomedical entity text normalization with machine-learned disambiguation as a service. Bioinformatics Advances, 2022; vbac034 [https://doi.org/10.1093/bioadv/vbac034](https://doi.org/10.1093/bioadv/vbac034).

## Installation
Gilda is deployed as a web service at http://grounding.indra.bio/ (see the
Usage instructions below). It can also be used locally as a Python
package.

The recommended way to install Gilda is from PyPI:
```bash
pip install gilda
```
Note that Gilda uses a single large resource file for grounding, which is
automatically downloaded into the `~/.data/gilda/<version>` folder during
runtime (see [pystow](https://github.com/cthoyt/pystow#%EF%B8%8F%EF%B8%8F-configuration) for options to
configure the location of this folder).
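
As a rough sketch of the pystow option mentioned above: pystow reads the `PYSTOW_HOME` environment variable to choose its base directory, so setting it before Gilda is first imported should redirect where the resource file is stored. The directory name `gilda_resources` is a hypothetical example, and the resulting `<PYSTOW_HOME>/gilda/<version>` layout is an assumption based on the default path shown above.

```python
import os
from pathlib import Path

# Hypothetical location for downloaded resources; pick any writable directory.
resource_root = Path.home() / 'gilda_resources'

# pystow reads PYSTOW_HOME to decide on its base directory, so this should be
# set before gilda (and therefore pystow) is imported for the first time.
os.environ['PYSTOW_HOME'] = str(resource_root)

# With Gilda installed, using it afterwards would be expected to place the
# resource file under <resource_root>/gilda/<version> (an assumption):
# import gilda
# gilda.ground('kras')
print(os.environ['PYSTOW_HOME'])
```

Setting the variable in the shell before starting Python has the same effect and avoids import-order concerns.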

Given some additional dependencies, the grounding resource file can
also be regenerated locally by running `python -m gilda.generate_terms`.

## Documentation and notebooks
Documentation for Gilda is available [here](https://gilda.readthedocs.io).
We also provide several interactive Jupyter notebooks to help use and customize Gilda:
- [Gilda Introduction](https://github.com/indralab/gilda/blob/master/notebooks/gilda_introduction.ipynb) provides an interactive tutorial for using Gilda.
- [Custom Grounders](https://github.com/indralab/gilda/blob/master/notebooks/custom_grounders.ipynb) shows several examples of how Gilda can be instantiated with custom
grounding resources.
- [Model Training](https://github.com/indralab/gilda/blob/master/models/model_training.ipynb) provides interactive sample code for training
new disambiguation models.

## Usage
Gilda can either be used as a REST web service or used programmatically
via its Python API. An introductory Jupyter notebook for using Gilda
is available at
https://github.com/indralab/gilda/blob/master/notebooks/gilda_introduction.ipynb

### Use as a Python package
To use Gilda as a Python package, see the documentation at
http://gilda.readthedocs.org, which describes each module of Gilda and its
usage in detail. A basic usage example for named entity normalization (NEN),
or _grounding_, is as follows:

```python
import gilda
scored_matches = gilda.ground('ER', context='Calcium is released from the ER.')
```
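
To make the result easier to inspect, the sketch below ranks matches by score and extracts the grounding for each. It assumes each scored match exposes `term.db`, `term.id`, `term.entry_name` and `score`, as described in the Gilda documentation. The helper `summarize_matches` is hypothetical, and stand-in objects replace real Gilda output so the example runs without downloading resources.

```python
from types import SimpleNamespace


def summarize_matches(scored_matches, limit=3):
    """Return (namespace, identifier, name, score) tuples for the top matches.

    Assumes each match exposes `term.db`, `term.id`, `term.entry_name` and
    `score`; this helper is illustrative and not part of Gilda itself.
    """
    ranked = sorted(scored_matches, key=lambda m: m.score, reverse=True)
    return [(m.term.db, m.term.id, m.term.entry_name, round(m.score, 3))
            for m in ranked[:limit]]


# Stand-in objects with made-up identifiers and scores (not real Gilda output).
fake_matches = [
    SimpleNamespace(
        term=SimpleNamespace(db='EX', id='0002', entry_name='second'),
        score=0.4),
    SimpleNamespace(
        term=SimpleNamespace(db='EX', id='0001', entry_name='first'),
        score=0.9),
]
summary = summarize_matches(fake_matches)
print(summary)
```

With Gilda installed, the same helper could be applied to the `scored_matches` list from the example above.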

Gilda also implements a simple dictionary-based named entity recognition (NER)
algorithm that can be used as follows:

```python
import gilda
results = gilda.annotate('Calcium is released from the ER.')
```

### Use as a web service

The REST service accepts POST requests with a JSON body (sent with a
`Content-Type: application/json` header) on the `/ground` endpoint. A public
REST service runs at http://grounding.indra.bio, but the service can also be
run locally with

```bash
python -m gilda.app
```
which, by default, launches the server at `localhost:8001` (for local usage,
replace the URL in the examples below with this address).

Below is an example request using `curl`:

```bash
curl -X POST -H "Content-Type: application/json" -d '{"text": "kras"}' http://grounding.indra.bio/ground
```

The same request using the Python `requests` package would be as follows:

```python
import requests
requests.post('http://grounding.indra.bio/ground', json={'text': 'kras'})
```
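
If adding a third-party dependency is undesirable, the same POST request can be sketched with the standard library alone. The helper names `build_ground_request` and `send` are hypothetical. Passing `context` to `/ground` is an assumption carried over from the `context` field used elsewhere in this README. The request is only constructed here, not sent, so the snippet runs offline.

```python
import json
import urllib.request


def build_ground_request(text, context=None,
                         base_url='http://grounding.indra.bio'):
    """Build (but do not send) a POST request for the /ground endpoint."""
    payload = {'text': text}
    if context is not None:
        # Assumption: /ground accepts an optional 'context' field.
        payload['context'] = context
    return urllib.request.Request(
        url=f'{base_url}/ground',
        data=json.dumps(payload).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


def send(request, timeout=10):
    """Send the request and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode('utf-8'))


request = build_ground_request('kras')
print(request.get_method(), request.full_url)
```

For a locally running service, `build_ground_request('kras', base_url='http://localhost:8001')` would target the default local address mentioned above.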

The web service also supports multiple inputs in a single request on the
`/ground_multi` endpoint, for instance:

```python
import requests
requests.post(
    'http://grounding.indra.bio/ground_multi',
    json=[
        {'text': 'braf'},
        {'text': 'ER', 'context': 'endoplasmic reticulum (ER) is a cellular component'},
    ],
)
```
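
When grounding many strings, it may help to split them into batches before calling the multi-input endpoint. The sketch below only prepares the JSON payloads. The helper `build_multi_payloads` and the default batch size of 100 are assumptions for illustration, not a limit documented by Gilda.

```python
def build_multi_payloads(entries, batch_size=100):
    """Split (text, context) pairs into JSON-ready batches for /ground_multi.

    `context` may be None, in which case the key is omitted, mirroring the
    'braf' entry in the example above.
    """
    if batch_size < 1:
        raise ValueError('batch_size must be at least 1')
    payloads = []
    for start in range(0, len(entries), batch_size):
        batch = []
        for text, context in entries[start:start + batch_size]:
            item = {'text': text}
            if context is not None:
                item['context'] = context
            batch.append(item)
        payloads.append(batch)
    return payloads


entries = [
    ('braf', None),
    ('ER', 'endoplasmic reticulum (ER) is a cellular component'),
    ('kras', None),
]
payloads = build_multi_payloads(entries, batch_size=2)
print(payloads)
```

Each batch could then be posted as the `json` argument of a request like the one shown above.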

## Resource usage
Gilda loads grounding terms into memory when first used. If memory usage
is an issue, the following options are recommended.

1. Run a single instance of Gilda as a local web service that one or more
other processes send requests to.

2. Create a custom Grounder instance that only loads a subset of terms
appropriate for a narrow use case.

3. Gilda also offers an optional sqlite back-end, which significantly decreases
memory usage and results in a minor drop in the number of strings grounded per
unit time. The sqlite back-end database can be built as follows with an
optional `[db_path]` argument, which, if used, should have the `.db` extension.
If not specified, the .db file is generated in Gilda's default resource folder.

```bash
python -m gilda.resources.sqlite_adapter [db_path]
```

A Grounder instance can then be instantiated as follows:

```python
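from gilda.grounder import Grounder
# Example path only: use the path of the .db file built in the previous step
db_path = 'gilda_terms.db'
gr = Grounder(db_path)
matches = gr.ground('kras')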
```
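
The build step for the sqlite back-end can also be scripted. The sketch below only assembles the command line shown above and checks the `.db` extension requirement. The helper `sqlite_build_command` and the file name `terms.db` are hypothetical, and nothing is executed unless the command is passed to `subprocess.run`.

```python
import sys
from pathlib import Path


def sqlite_build_command(db_path=None):
    """Return the argument list for building the sqlite back-end.

    If db_path is given it must end in .db; if omitted, the module is invoked
    without arguments so that the default resource folder is used.
    """
    command = [sys.executable, '-m', 'gilda.resources.sqlite_adapter']
    if db_path is not None:
        path = Path(db_path).expanduser()
        if path.suffix != '.db':
            raise ValueError(f'expected a .db extension, got {path.name!r}')
        command.append(str(path))
    return command


command = sqlite_build_command('terms.db')  # hypothetical file name
print(command)
# With Gilda installed, the build could then be run with:
# import subprocess; subprocess.run(command, check=True)
```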

## Run web service with Docker

After cloning the repository locally, you can build and run a Docker image
of Gilda using the following commands:

```shell
$ docker build -t gilda:latest .
$ docker run -d -p 8001:8001 gilda:latest
```

Alternatively, you can use `docker-compose` to both build the image and
run the container based on the `docker-compose.yml` configuration:

```shell
$ docker-compose up
```

## Default grounding resources

Gilda is customizable with terms coming from different vocabularies. However,
Gilda comes with a default set of resources from which terms are collected
(almost 2 million entries as of v1.1.0), without any additional configuration
needed. These resources include:
- [HGNC](https://bioregistry.io/hgnc) (human genes)
- [UniProt](https://bioregistry.io/uniprot) (human and model organism proteins)
- [FamPlex](https://bioregistry.io/famplex) (human protein families and complexes)
- [ChEBI](https://bioregistry.io/chebi) (small molecules, metabolites, etc.)
- [GO](https://bioregistry.io/go) (biological processes, molecular functions, complexes)
- [DOID](https://bioregistry.io/doid) (diseases)
- [EFO](https://bioregistry.io/efo) (experimental factors: cell lines, cell types, anatomical entities, etc.)
- [HP](https://bioregistry.io/hp) (human phenotypes)
- [MeSH](https://bioregistry.io/mesh) (general: diseases, proteins, small molecules, cell types, etc.)
- [Adeft](https://github.com/gyorilab/adeft) (misc. terms corresponding to ambiguous acronyms)

## Citation

```bibtex
@article{gyori2022gilda,
    author = {Gyori, Benjamin M and Hoyt, Charles Tapley and Steppi, Albert},
    title = "{{Gilda: biomedical entity text normalization with machine-learned disambiguation as a service}}",
    journal = {Bioinformatics Advances},
    year = {2022},
    month = {05},
    issn = {2635-0041},
    doi = {10.1093/bioadv/vbac034},
    url = {https://doi.org/10.1093/bioadv/vbac034},
    note = {vbac034}
}
```

## Funding
The development of Gilda was funded under the DARPA Communicating with Computers
program (ARO grant W911NF-15-1-0544) and the DARPA Young Faculty Award
(ARO grant W911NF-20-1-0255).

            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/indralab/gilda",
    "name": "gilda",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "nlp, biology",
    "author": "Benjamin M. Gyori, Harvard Medical School",
    "author_email": "benjamin_gyori@hms.harvard.edu",
    "download_url": "https://files.pythonhosted.org/packages/dc/83/6bbe9bd88c508f704dd81eb5fc33b4277a5312796f6d9940e4cf2dfa296f/gilda-1.2.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Grounding for biomedical entities with contextual disambiguation",
    "version": "1.2.0",
    "project_urls": {
        "Homepage": "https://github.com/indralab/gilda"
    },
    "split_keywords": [
        "nlp",
        " biology"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "fa88c450be479c2b15aa3f1ac265a773b80974b4c8df09467f8ac62e3460a067",
                "md5": "98d350cccc9baa9f661c74feaeec0db2",
                "sha256": "0ed74847d4f52cea9d48e38aa32c431e787f905676c1a5e111a9f59ae20e9696"
            },
            "downloads": -1,
            "filename": "gilda-1.2.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "98d350cccc9baa9f661c74feaeec0db2",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 181689,
            "upload_time": "2024-04-30T20:06:54",
            "upload_time_iso_8601": "2024-04-30T20:06:54.190004Z",
            "url": "https://files.pythonhosted.org/packages/fa/88/c450be479c2b15aa3f1ac265a773b80974b4c8df09467f8ac62e3460a067/gilda-1.2.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "dc836bbe9bd88c508f704dd81eb5fc33b4277a5312796f6d9940e4cf2dfa296f",
                "md5": "cb129c640823b99ffaad31756893f1b9",
                "sha256": "5ef630d6de7d03704ed57408ce5b8679f1f42e4e72ea3dfc0eaf4c50555686dd"
            },
            "downloads": -1,
            "filename": "gilda-1.2.0.tar.gz",
            "has_sig": false,
            "md5_digest": "cb129c640823b99ffaad31756893f1b9",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 179123,
            "upload_time": "2024-04-30T20:07:00",
            "upload_time_iso_8601": "2024-04-30T20:07:00.635272Z",
            "url": "https://files.pythonhosted.org/packages/dc/83/6bbe9bd88c508f704dd81eb5fc33b4277a5312796f6d9940e4cf2dfa296f/gilda-1.2.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-30 20:07:00",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "indralab",
    "github_project": "gilda",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "tox": true,
    "lcname": "gilda"
}
        