biolexica


Name: biolexica
Version: 0.0.5
Home page: https://github.com/biopragmatics/biolexica
Summary: Generate and apply coherent biomedical lexica
Upload time: 2024-02-05 17:30:29
Maintainer: Charles Tapley Hoyt
Author: Charles Tapley Hoyt
Requires Python: >=3.8
License: MIT
Keywords: snekpack, cookiecutter
            <!--
<p align="center">
  <img src="https://github.com/biopragmatics/biolexica/raw/main/docs/source/logo.png" height="150">
</p>
-->

<h1 align="center">
  Biolexica
</h1>

<p align="center">
    <a href="https://github.com/biopragmatics/biolexica/actions/workflows/tests.yml">
        <img alt="Tests" src="https://github.com/biopragmatics/biolexica/actions/workflows/tests.yml/badge.svg" /></a>
    <a href="https://pypi.org/project/biolexica">
        <img alt="PyPI" src="https://img.shields.io/pypi/v/biolexica" /></a>
    <a href="https://pypi.org/project/biolexica">
        <img alt="PyPI - Python Version" src="https://img.shields.io/pypi/pyversions/biolexica" /></a>
    <a href="https://github.com/biopragmatics/biolexica/blob/main/LICENSE">
        <img alt="PyPI - License" src="https://img.shields.io/pypi/l/biolexica" /></a>
    <a href='https://biolexica.readthedocs.io/en/latest/?badge=latest'>
        <img src='https://readthedocs.org/projects/biolexica/badge/?version=latest' alt='Documentation Status' /></a>
    <a href="https://codecov.io/gh/biopragmatics/biolexica/branch/main">
        <img src="https://codecov.io/gh/biopragmatics/biolexica/branch/main/graph/badge.svg" alt="Codecov status" /></a>  
    <a href="https://github.com/cthoyt/cookiecutter-python-package">
        <img alt="Cookiecutter template from @cthoyt" src="https://img.shields.io/badge/Cookiecutter-snekpack-blue" /></a>
    <a href='https://github.com/psf/black'>
        <img src='https://img.shields.io/badge/code%20style-black-000000.svg' alt='Code style: black' /></a>
    <a href="https://github.com/biopragmatics/biolexica/blob/main/.github/CODE_OF_CONDUCT.md">
        <img src="https://img.shields.io/badge/Contributor%20Covenant-2.1-4baaaa.svg" alt="Contributor Covenant"/></a>
</p>

`biolexica` helps generate and apply coherent biomedical lexica. It takes care of the following:

1. Getting names and synonyms from a diverse set of inputs (ontologies, databases, custom)
   using [`pyobo`](https://github.com/pyobo/pyobo), [`bioontologies`](https://github.com/biopragmatics/bioontologies), 
   [`biosynonyms`](https://github.com/biopragmatics/biosynonyms), and more.
2. Merging equivalent terms to best take advantage of different synonyms for the same term from different sources
   using [`semra`](https://github.com/biopragmatics/semra).
3. Generating a lexical index and performing named entity recognition (NER) using [Gilda](https://github.com/gyorilab/gilda).
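
To make the merging idea in step 2 more concrete, here is a minimal sketch in plain Python. It does not use `semra` or `biolexica` and is not how either library is implemented. The function `merge_synonyms`, the synonym sets, and the single equivalence mapping are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def merge_synonyms(synonyms, equivalences):
    """Group synonyms under one canonical identifier per equivalence class.

    synonyms: dict mapping an identifier to a set of names
    equivalences: dict mapping a non-canonical identifier to its canonical one
    """
    merged = defaultdict(set)
    for identifier, names in synonyms.items():
        # Fall back to the identifier itself when no mapping is known
        canonical = equivalences.get(identifier, identifier)
        merged[canonical].update(names)
    return dict(merged)


# Illustrative sample data (hypothetical, not exported from any real resource)
synonyms = {
    "doid:10652": {"Alzheimer's disease", "AD"},
    "mesh:D000544": {"Alzheimer Disease", "Alzheimer dementia"},
}
equivalences = {"mesh:D000544": "doid:10652"}

merged = merge_synonyms(synonyms, equivalences)
print(sorted(merged["doid:10652"]))
```

After merging, a single term carries the names from both sources, which is what lets a downstream lexical index match more surface forms to the same identifier.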

Importantly, we pre-define lexica for several entity types in the [`lexica/`](lexica/) folder
that can be readily used with Gilda, including:

1. [Cells and cell lines](lexica/cell)
2. [Diseases, conditions, and other phenotypes](lexica/phenotype)
3. [Anatomical terms, tissues, organ systems, etc.](lexica/anatomy)

## Getting Started

Load a pre-defined grounder like this:

```python
>>> import biolexica
>>> grounder = biolexica.load_grounder("phenotype")

>>> grounder.get_best_match("Alzheimer's disease")
Match(reference=Reference(prefix='doid', identifier='10652'), name="Alzheimer's disease", score=0.7778)

>>> grounder.annotate("Clinical trials for reducing Aβ levels in Alzheimer's disease have been controversial.")
[Annotation(text="Alzheimer's disease", start=42, end=61, match=Match(reference=Reference(prefix='doid', identifier='10652'), name="Alzheimer's disease", score=0.7339))]
```

Note: Biolexica constructs an extended version of `gilda.Grounder` with convenience functions and a
simpler match data model encoded with Pydantic.
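
The `start` and `end` values in the annotation above appear to be character offsets into the input string, with `end` exclusive. The following sketch checks this using only the values shown in the example output. The `Span` class is a hypothetical stand-in that holds just the three fields used here; it is not the Pydantic model that Biolexica provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in holding only the fields used in this sketch."""

    text: str
    start: int
    end: int


sentence = "Clinical trials for reducing Aβ levels in Alzheimer's disease have been controversial."
span = Span(text="Alzheimer's disease", start=42, end=61)

# The offsets index directly into the input string (end is exclusive)
assert sentence[span.start:span.end] == span.text

# Mark the annotated mention in the original sentence
highlighted = (
    sentence[:span.start]
    + "["
    + sentence[span.start:span.end]
    + "]"
    + sentence[span.end:]
)
print(highlighted)
```

Slicing with the offsets is a simple way to highlight mentions or to sanity-check annotations against the source text.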

Search PubMed for abstracts and annotate them using a given grounder with:

```python
import biolexica
from biolexica.literature import annotate_abstracts_from_search

grounder = biolexica.load_grounder("phenotype")
pubmed_query = "alzheimer's disease"
annotations = annotate_abstracts_from_search(pubmed_query, grounder=grounder, limit=30)
```

## 🚀 Installation

The most recent release can be installed from
[PyPI](https://pypi.org/project/biolexica/) with:

```shell
pip install biolexica
```

The most recent code and data can be installed directly from GitHub with:

```shell
pip install git+https://github.com/biopragmatics/biolexica.git
```

## 👐 Contributing

Contributions, whether filing an issue, making a pull request, or forking, are appreciated. See
[CONTRIBUTING.md](https://github.com/biopragmatics/biolexica/blob/main/.github/CONTRIBUTING.md) for more information
on getting involved.

## 👋 Attribution

### ⚖️ License

The code in this package is licensed under the MIT License.

<!--
### 📖 Citation

Citation goes here!
-->

<!--
### 💰 Funding

This project has been supported by the following grants:

| Funding Body                                             | Program                                                                                                                       | Grant           |
|----------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------|-----------------|
| DARPA                                                    | [Automating Scientific Knowledge Extraction (ASKE)](https://www.darpa.mil/program/automating-scientific-knowledge-extraction) | HR00111990009   |
-->

### 🍪 Cookiecutter

This package was created with [@audreyfeldroy](https://github.com/audreyfeldroy)'s
[cookiecutter](https://github.com/cookiecutter/cookiecutter) package using [@cthoyt](https://github.com/cthoyt)'s
[cookiecutter-snekpack](https://github.com/cthoyt/cookiecutter-snekpack) template.

## 🛠️ For Developers

<details>
  <summary>See developer instructions</summary>

This final section of the README is for those who want to get involved by making a code contribution.

### Development Installation

To install in development mode, use the following:

```bash
git clone https://github.com/biopragmatics/biolexica.git
cd biolexica
pip install -e .
```

### 🥼 Testing

After cloning the repository and installing `tox` with `pip install tox`, the unit tests in the `tests/` folder can be
run reproducibly with:

```shell
tox
```

Additionally, these tests are automatically re-run with each commit in a
[GitHub Action](https://github.com/biopragmatics/biolexica/actions?query=workflow%3ATests).

### 📖 Building the Documentation

The documentation can be built locally using the following:

```shell
git clone https://github.com/biopragmatics/biolexica.git
cd biolexica
tox -e docs
open docs/build/html/index.html
``` 

The documentation build automatically installs the package as well as the `docs`
extra specified in the [`setup.cfg`](setup.cfg). Sphinx plugins
like `texext` can be added there. They also need to be added to the
`extensions` list in [`docs/source/conf.py`](docs/source/conf.py).
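
As a rough illustration of the second step, the fragment below shows what the `extensions` list in `docs/source/conf.py` might look like after adding a plugin. It assumes that `sphinx.ext.autodoc` is already enabled and that `texext` registers under its package name; check the actual file and the plugin's documentation before copying it.

```python
# docs/source/conf.py (fragment, not the full configuration)
extensions = [
    "sphinx.ext.autodoc",  # assumed to be enabled already
    "texext",  # must also be listed in the `docs` extra in setup.cfg
]
```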

The documentation can be deployed to [ReadTheDocs](https://readthedocs.io) using
[this guide](https://docs.readthedocs.io/en/stable/intro/import-guide.html).
The [`.readthedocs.yml`](.readthedocs.yml) YAML file contains all the configuration you'll need.
You can also set up continuous integration on GitHub to check not only that
Sphinx can build the documentation in an isolated environment (i.e., with ``tox -e docs-test``)
but also that [ReadTheDocs can build it too](https://docs.readthedocs.io/en/stable/pull-requests.html).

### 📦 Making a Release

After installing the package in development mode and installing
`tox` with `pip install tox`, the commands for making a new release are contained within the `finish` environment
in `tox.ini`. Run the following from the shell:

```shell
tox -e finish
```

This script does the following:

1. Uses [Bump2Version](https://github.com/c4urself/bump2version) to remove the `-dev` suffix from the version number
   in `setup.cfg`, `src/biolexica/version.py`, and [`docs/source/conf.py`](docs/source/conf.py).
2. Packages the code in both a tar archive and a wheel using [`build`](https://github.com/pypa/build).
3. Uploads to PyPI using [`twine`](https://github.com/pypa/twine). Be sure to have a `.pypirc` file
   configured to avoid the need for manual input at this step.
4. Pushes to GitHub. You'll need to make a release for the commit where the version was bumped.
5. Bumps the version to the next patch. If you made big changes and want to bump the minor version instead, you can
   use `tox -e bumpversion -- minor` afterwards.
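
The version changes in steps 1 and 5 can be illustrated with a small sketch. This is not how Bump2Version works internally (it is driven by its own configuration); the helper names `finalize` and `next_patch_dev` and the version `0.0.6-dev` are hypothetical and only show the string arithmetic involved.

```python
def finalize(version: str) -> str:
    """Drop a trailing ``-dev`` suffix, if present (step 1)."""
    suffix = "-dev"
    return version[: -len(suffix)] if version.endswith(suffix) else version


def next_patch_dev(version: str) -> str:
    """Increment the patch number and re-add the ``-dev`` suffix (step 5)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}-dev"


release = finalize("0.0.6-dev")
print(release)  # 0.0.6
print(next_patch_dev(release))  # 0.0.7-dev
```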

</details>

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/biopragmatics/biolexica",
    "name": "biolexica",
    "maintainer": "Charles Tapley Hoyt",
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "cthoyt@gmail.com",
    "keywords": "snekpack,cookiecutter",
    "author": "Charles Tapley Hoyt",
    "author_email": "cthoyt@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/f6/ec/8303f2925e19b56cd0b44c0dcefa445a0b7b18325a0a9ad192445eaf494c/biolexica-0.0.5.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Generate and apply coherent biomedical lexica",
    "version": "0.0.5",
    "project_urls": {
        "Documentation": "https://biolexica.readthedocs.io",
        "Download": "https://github.com/biopragmatics/biolexica/releases",
        "Homepage": "https://github.com/biopragmatics/biolexica",
        "Source": "https://github.com/biopragmatics/biolexica",
        "Tracker": "https://github.com/biopragmatics/biolexica/issues"
    },
    "split_keywords": [
        "snekpack",
        "cookiecutter"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "05aac859c68a9b0848e35fbebcbccd2ebb2586cef23ce254dc9593c9c63402c8",
                "md5": "0d3e9dae43be5ed9bdeede3ceb158892",
                "sha256": "d7bb2af69aa37528869b56475628f6977a9ebbdef8f3d2d7638e54086640eda4"
            },
            "downloads": -1,
            "filename": "biolexica-0.0.5-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "0d3e9dae43be5ed9bdeede3ceb158892",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 18618,
            "upload_time": "2024-02-05T17:30:28",
            "upload_time_iso_8601": "2024-02-05T17:30:28.119983Z",
            "url": "https://files.pythonhosted.org/packages/05/aa/c859c68a9b0848e35fbebcbccd2ebb2586cef23ce254dc9593c9c63402c8/biolexica-0.0.5-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f6ec8303f2925e19b56cd0b44c0dcefa445a0b7b18325a0a9ad192445eaf494c",
                "md5": "e9139a7780b4b8095ca47474c1b2e084",
                "sha256": "1152d1984017eecb137282e365d52bec0cadf563500d291e4b5c27240ec49dfb"
            },
            "downloads": -1,
            "filename": "biolexica-0.0.5.tar.gz",
            "has_sig": false,
            "md5_digest": "e9139a7780b4b8095ca47474c1b2e084",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 24617,
            "upload_time": "2024-02-05T17:30:29",
            "upload_time_iso_8601": "2024-02-05T17:30:29.811967Z",
            "url": "https://files.pythonhosted.org/packages/f6/ec/8303f2925e19b56cd0b44c0dcefa445a0b7b18325a0a9ad192445eaf494c/biolexica-0.0.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-02-05 17:30:29",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "biopragmatics",
    "github_project": "biolexica",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "tox": true,
    "lcname": "biolexica"
}
        