# Gilda: Grounding Integrating Learned Disambiguation
[![License](https://img.shields.io/badge/License-BSD%202--Clause-orange.svg)](https://opensource.org/licenses/BSD-2-Clause)
[![Build](https://github.com/indralab/gilda/actions/workflows/tests.yml/badge.svg)](https://github.com/indralab/gilda/actions)
[![Documentation](https://readthedocs.org/projects/gilda/badge/?version=latest)](https://gilda.readthedocs.io/en/latest/?badge=latest)
[![PyPI version](https://badge.fury.io/py/gilda.svg)](https://badge.fury.io/py/gilda)
[![DOI](https://img.shields.io/badge/DOI-10.1093/bioadv/vbac034-green.svg)](https://doi.org/10.1093/bioadv/vbac034)

Gilda is a Python package and REST service that grounds (i.e., finds
appropriate identifiers in various namespaces for) named entities in biomedical text.

Gyori BM, Hoyt CT, Steppi A (2022). Gilda: biomedical entity text normalization with machine-learned disambiguation as a service. Bioinformatics Advances, 2022; vbac034 [https://doi.org/10.1093/bioadv/vbac034](https://doi.org/10.1093/bioadv/vbac034).

## Installation
Gilda is deployed as a web service at http://grounding.indra.bio/ (see
Usage instructions below); however, it can also be used locally as a Python
package.

The recommended method to install Gilda is through PyPI:
```bash
pip install gilda
```
Note that Gilda uses a single large resource file for grounding, which is
automatically downloaded into the `~/.data/gilda/<version>` folder during
runtime (see [pystow](https://github.com/cthoyt/pystow#%EF%B8%8F%EF%B8%8F-configuration) for options to
configure the location of this folder).
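To check where the resource file is expected to land, the following sketch reconstructs the default path described above using only the standard library. It assumes that pystow's `PYSTOW_HOME` environment variable, when set, replaces the `~/.data` base directory; the helper name `expected_resource_dir` is hypothetical and not part of Gilda's API.

```python
import os
from pathlib import Path


def expected_resource_dir(version, env=None):
    """Return the folder where Gilda's resource file is expected to be stored.

    Hypothetical helper, not part of Gilda: it mirrors the
    ~/.data/gilda/<version> layout and assumes PYSTOW_HOME, when set,
    replaces the ~/.data base directory.
    """
    env = os.environ if env is None else env
    if env.get("PYSTOW_HOME"):
        base = Path(env["PYSTOW_HOME"])
    else:
        base = Path.home() / ".data"
    return base / "gilda" / version


print(expected_resource_dir("1.4.0"))
```

If the printed folder does not exist yet, that is expected: the resource file is only downloaded the first time Gilda needs it.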
Given some additional dependencies, the grounding resource file can
also be regenerated locally by running `python -m gilda.generate_terms`.
## Documentation and notebooks
Documentation for Gilda is available [here](https://gilda.readthedocs.io).
We also provide several interactive Jupyter notebooks to help use and customize Gilda:
- [Gilda Introduction](https://github.com/indralab/gilda/blob/master/notebooks/gilda_introduction.ipynb) provides an interactive tutorial for using Gilda.
- [Custom Grounders](https://github.com/indralab/gilda/blob/master/notebooks/custom_grounders.ipynb) shows several examples of how Gilda can be instantiated with custom
grounding resources.
- [Model Training](https://github.com/indralab/gilda/blob/master/models/model_training.ipynb) provides interactive sample code for training
new disambiguation models.
## Usage
Gilda can be used either as a REST web service or programmatically
via its Python API. An introductory Jupyter notebook for using Gilda
is available at
https://github.com/indralab/gilda/blob/master/notebooks/gilda_introduction.ipynb
### Use as a Python package
To use Gilda as a Python package, see the documentation at
http://gilda.readthedocs.org, which provides detailed descriptions of each
module of Gilda and its usage. A basic usage example for named entity
normalization (NEN), or _grounding_, is as follows:
```python
import gilda
scored_matches = gilda.ground('ER', context='Calcium is released from the ER.')
```
Gilda also implements a simple dictionary-based named entity recognition (NER)
algorithm that can be used as follows:
```python
import gilda
results = gilda.annotate('Calcium is released from the ER.')
```
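To make the idea of dictionary-based NER concrete, here is a toy sketch that scans a sentence for the longest lexicon entry starting at each word. It is an illustration only and does not reproduce Gilda's actual algorithm or output format; the function name `annotate_with_dictionary`, the two-entry lexicon, and the `DEMO:` identifiers are all made up for this example.

```python
import re


def annotate_with_dictionary(text, lexicon):
    """Return (matched_text, identifier, start, end) for lexicon hits in text.

    Toy dictionary-based NER: greedy longest match over word n-grams.
    Illustration only, not Gilda's actual algorithm.
    """
    words = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    max_len = max((len(key.split()) for key in lexicon), default=0)
    annotations = []
    i = 0
    while i < len(words):
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_len, len(words) - i), 0, -1):
            start, end = words[i][1], words[i + n - 1][2]
            span = text[start:end]
            identifier = lexicon.get(span.lower())
            if identifier is not None:
                annotations.append((span, identifier, start, end))
                i += n  # skip past the matched words
                break
        else:
            i += 1  # no entry starts at this word
    return annotations


# Hypothetical lexicon keyed by lower-cased surface form; IDs are placeholders.
lexicon = {"calcium": "DEMO:1", "endoplasmic reticulum": "DEMO:2"}
sentence = "Calcium is released from the endoplasmic reticulum."
annotations = annotate_with_dictionary(sentence, lexicon)
print(annotations)
```

Each annotation carries character offsets, so the matched span can be recovered from the original text with `sentence[start:end]`.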
### Use as a web service
The REST service accepts POST requests with a JSON body (sent with a
`Content-Type: application/json` header) on the `/ground` endpoint. There is a
public REST service running at http://grounding.indra.bio, but the service can
also be run locally as
```bash
python -m gilda.app
```
which, by default, launches the server at `localhost:8001` (for local usage,
replace the URL in the examples below with this address).
Below is an example request using `curl`:
```bash
curl -X POST -H "Content-Type: application/json" -d '{"text": "kras"}' http://grounding.indra.bio/ground
```
The same request using Python's `requests` package would be as follows:
```python
import requests
requests.post('http://grounding.indra.bio/ground', json={'text': 'kras'})
```
The web service also supports multiple inputs in a single request on the
`ground_multi` endpoint, for instance:
```python
import requests
requests.post(
    'http://grounding.indra.bio/ground_multi',
    json=[
        {'text': 'braf'},
        {'text': 'ER', 'context': 'endoplasmic reticulum (ER) is a cellular component'},
    ],
)
```
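When `requests` is not available, the same batch payload can be assembled with the standard library. The sketch below only builds the request object and does not send it; the helper name `build_ground_multi_request` is hypothetical, while the endpoint and the `text`/`context` keys are taken from the examples above.

```python
import json
from urllib.request import Request


def build_ground_multi_request(items, base_url="http://grounding.indra.bio"):
    """Build (without sending) a POST request for the ground_multi endpoint.

    items is a list of (text, context) pairs; context may be None.
    """
    payload = []
    for text, context in items:
        entry = {"text": text}
        if context is not None:
            entry["context"] = context
        payload.append(entry)
    return Request(
        f"{base_url}/ground_multi",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_ground_multi_request([
    ("braf", None),
    ("ER", "endoplasmic reticulum (ER) is a cellular component"),
])
print(request.full_url)
print(request.data.decode("utf-8"))
```

Passing the resulting object to `urllib.request.urlopen` would send it. For a locally running service, pass `base_url="http://localhost:8001"`.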
## Resource usage
Gilda loads grounding terms into memory when first used. If memory usage
is an issue, the following options are recommended.
1. Run a single instance of Gilda as a local web service that one or more
other processes send requests to.
2. Create a custom Grounder instance that only loads a subset of terms
appropriate for a narrow use case.
3. Gilda also offers an optional sqlite back-end, which significantly decreases
memory usage at the cost of a minor drop in the number of strings grounded per
unit time. The sqlite back-end database can be built as follows with an
optional `[db_path]` argument, which, if used, should have the .db extension. If
not specified, the .db file is generated in Gilda's default resource folder.
```bash
python -m gilda.resources.sqlite_adapter [db_path]
```
A Grounder instance can then be instantiated as follows:
```python
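from gilda.grounder import Grounder

# Placeholder: path to the .db file built by the command above
db_path = 'path/to/grounding_terms.db'
gr = Grounder(db_path)
matches = gr.ground('kras')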
```
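The memory trade-off behind a database back-end can be illustrated with a toy table: instead of holding every term in a Python dictionary, each lookup fetches only the rows for one string. This is a generic sketch with the standard `sqlite3` module; the table name, columns, and `DEMO:` identifiers are made up and do not reflect the layout of Gilda's actual .db file.

```python
import sqlite3

# Toy lookup table; the schema and rows are invented for illustration.
# ":memory:" keeps the demo self-contained; a file path would keep data on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE terms (norm_text TEXT, identifier TEXT)")
conn.execute("CREATE INDEX idx_norm_text ON terms (norm_text)")
conn.executemany(
    "INSERT INTO terms VALUES (?, ?)",
    [("kras", "DEMO:1"), ("braf", "DEMO:2"), ("er", "DEMO:3"), ("er", "DEMO:4")],
)


def lookup(norm_text):
    """Fetch only the rows for one normalized string via the index."""
    rows = conn.execute(
        "SELECT identifier FROM terms WHERE norm_text = ? ORDER BY identifier",
        (norm_text,),
    )
    return [row[0] for row in rows]


print(lookup("er"))
```

Each lookup costs a query, which is consistent with the minor throughput drop noted above.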
## Run web service with Docker
After cloning the repository locally, you can build and run a Docker image
of Gilda using the following commands:
```shell
$ docker build -t gilda:latest .
$ docker run -d -p 8001:8001 gilda:latest
```
Alternatively, you can use `docker-compose` to do both the initial build and
run the container based on the `docker-compose.yml` configuration:
```shell
$ docker-compose up
```
## Default grounding resources
Gilda is customizable with terms coming from different vocabularies. However,
Gilda comes with a default set of resources from which terms are collected
(almost 2 million entries as of v1.1.0), without any additional configuration
needed. These resources include:
- [HGNC](https://bioregistry.io/hgnc) (human genes)
- [UniProt](https://bioregistry.io/uniprot) (human and model organism proteins)
- [FamPlex](https://bioregistry.io/famplex) (human protein families and complexes)
- [ChEBI](https://bioregistry.io/chebi) (small molecules, metabolites, etc.)
- [GO](https://bioregistry.io/go) (biological processes, molecular functions, complexes)
- [DOID](https://bioregistry.io/doid) (diseases)
- [EFO](https://bioregistry.io/efo) (experimental factors: cell lines, cell types, anatomical entities, etc.)
- [HP](https://bioregistry.io/hp) (human phenotypes)
- [MeSH](https://bioregistry.io/mesh) (general: diseases, proteins, small molecules, cell types, etc.)
- [Adeft](https://github.com/gyorilab/adeft) (misc. terms corresponding to ambiguous acronyms)
## Citation
```bibtex
@article{gyori2022gilda,
author = {Gyori, Benjamin M and Hoyt, Charles Tapley and Steppi, Albert},
title = "{{Gilda: biomedical entity text normalization with machine-learned disambiguation as a service}}",
journal = {Bioinformatics Advances},
year = {2022},
month = {05},
issn = {2635-0041},
doi = {10.1093/bioadv/vbac034},
url = {https://doi.org/10.1093/bioadv/vbac034},
note = {vbac034}
}
```
## Funding
The development of Gilda was funded under the DARPA Communicating with Computers
program (ARO grant W911NF-15-1-0544) and the DARPA Young Faculty Award
(ARO grant W911NF-20-1-0255).