zensols.mednlp


- Name: zensols.mednlp
- Version: 1.7.0
- Home page: https://github.com/plandes/mednlp
- Summary: A natural language medical domain parsing library.
- Upload time: 2024-04-14 20:16:22
- Maintainer: None
- Docs URL: None
- Author: Paul Landes
- Requires Python: None
- License: None
- Keywords: tooling
- Requirements: No requirements were recorded.
- Travis-CI: No Travis.
- Coveralls test coverage: No coveralls.

# Medical natural language parsing and utility library

[![PyPI][pypi-badge]][pypi-link]
[![Python 3.10][python310-badge]][python310-link]
[![Python 3.11][python311-badge]][python311-link]
[![Build Status][build-badge]][build-link]

A natural language medical domain parsing library.  This library:

- Provides an interface to the [UTS] ([UMLS] Terminology Services) RESTful
  service with data caching (NIH login needed).
- Wraps the [MedCAT] library by parsing medical and clinical text into
  first-class Python objects that reflect the structure of the natural
  language, complete with [UMLS] entity linking with [CUIs] and other
  domain-specific features.
- Combines non-medical (such as POS and NER tags) and medical features (such as
  [CUIs]) in one API and resulting data structure and/or as a [Pandas] data
  frame.
- Provides [cui2vec] as a [word embedding model] for either fast indexing and
  access or to use directly as features in a [Zensols Deep NLP embedding layer]
  model.
- Provides access to [cTAKES] using a dictionary-like [Stash] abstraction.
- Includes a command line program to access all of these features without
  having to write any code.


## Documentation

See the [full documentation](https://plandes.github.io/mednlp/index.html).
The [API reference](https://plandes.github.io/mednlp/api.html) is also
available.


## Obtaining

The easiest way to install the command line program is via the `pip` installer.
```bash
pip3 install zensols.mednlp
```

Binaries are also available on [pypi].
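
To confirm that the installation succeeded, the installed version can be read from the package metadata. The following is a minimal sketch that uses only the Python standard library; the helper name `installed_version` is hypothetical and is not part of this package.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "zensols.mednlp") -> Optional[str]:
    """Return the installed version of ``dist_name``, or ``None`` if the
    distribution is not installed (hypothetical helper, not library API)."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("zensols.mednlp is not installed")
    else:
        print(f"zensols.mednlp {version}")
```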


## Usage

To parse text, create features, and extract clinical concept identifiers:
```python
>>> from zensols.mednlp import ApplicationFactory
>>> doc_parser = ApplicationFactory.get_doc_parser()
>>> doc = doc_parser('John was diagnosed with kidney failure')
>>> for tok in doc.tokens: print(tok.norm, tok.pos_, tok.tag_, tok.cui_, tok.detected_name_)
John PROPN NNP -<N>- -<N>-
was AUX VBD -<N>- -<N>-
diagnosed VERB VBN -<N>- -<N>-
with ADP IN -<N>- -<N>-
kidney NOUN NN C0035078 kidney~failure
failure NOUN NN C0035078 kidney~failure
>>> print(doc.entities)
(<John>, <kidney failure>)
```
See the [full example](example/features/simple.py), and for other
functionality, see the [examples](example).
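
As an illustration of working with the token features printed above, the sketch below groups token text by CUI. It runs without the library installed by using stand-in objects that copy the rows of the output above. It assumes that tokens without a concept carry the same `-<N>-` placeholder string that is printed; the helper name `group_by_cui` is hypothetical.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Dict, Iterable, List

# Assumption: tokens without a linked concept hold the placeholder that
# appears in the printed output above.
NO_VALUE = "-<N>-"


def group_by_cui(tokens: Iterable) -> Dict[str, List[str]]:
    """Map each CUI to the normalized text of the tokens linked to it.

    Only the ``norm`` and ``cui_`` attributes are read, so any object that
    provides them works (hypothetical helper, not library API).
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for tok in tokens:
        if tok.cui_ != NO_VALUE:
            groups[tok.cui_].append(tok.norm)
    return dict(groups)


# Stand-in tokens that mirror the rows printed in the example above.
sample_tokens = [
    SimpleNamespace(norm="John", cui_=NO_VALUE),
    SimpleNamespace(norm="was", cui_=NO_VALUE),
    SimpleNamespace(norm="diagnosed", cui_=NO_VALUE),
    SimpleNamespace(norm="with", cui_=NO_VALUE),
    SimpleNamespace(norm="kidney", cui_="C0035078"),
    SimpleNamespace(norm="failure", cui_="C0035078"),
]

if __name__ == "__main__":
    print(group_by_cui(sample_tokens))
```

With the library installed, `doc.tokens` from the example above could be passed in place of `sample_tokens`, provided the placeholder assumption holds.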


## MedCAT Models

By default, this library uses the small MedCAT model used for
[tutorials](https://github.com/CogStack/MedCATtutorials/pull/12), and is not
sufficient for any serious project.  To get the UMLS trained model, the [MedCAT
UMLS request form] must be filled out (see the [MedCAT] repository).

After you obtain access and download the new model, add the following to
`~/.mednlprc`:

```ini
[medcat_status_resource]
url = file:///location/to/the/downloaded/file/umls_sm_wstatus_2021_oct.zip
```
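
A stray quote or a malformed `file://` URL is easy to introduce when editing this entry by hand. The sketch below generates the section text from a local path using only the Python standard library. The section and option names come from the configuration above; the helper name `mednlprc_text` is hypothetical, and the sketch only prints the text rather than writing to `~/.mednlprc`.

```python
import configparser
import io
from pathlib import Path


def mednlprc_text(model_path: str) -> str:
    """Return the ``medcat_status_resource`` section for ``model_path``
    (hypothetical helper, not library API)."""
    # as_uri() needs an absolute path and percent-encodes special characters.
    url = Path(model_path).absolute().as_uri()
    # Interpolation is disabled so a '%' in an encoded URL is stored as-is.
    config = configparser.ConfigParser(interpolation=None)
    config["medcat_status_resource"] = {"url": url}
    buf = io.StringIO()
    config.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(mednlprc_text(
        "/location/to/the/downloaded/file/umls_sm_wstatus_2021_oct.zip"))
```

The printed text can then be pasted into `~/.mednlprc` after checking that the path points at the downloaded model file.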


## Attribution

This API utilizes the following frameworks:

* [MedCAT]: used to extract information from Electronic Health Records (EHRs)
  and link it to biomedical ontologies like SNOMED-CT and UMLS.
* [cTAKES]: a natural language processing system for extraction of information
  from electronic medical record clinical free-text.
* [cui2vec]: a new set of (like word) embeddings for medical concepts learned
  using an extremely large collection of multimodal medical data.
* [Zensols Deep NLP library]: a deep learning utility library for natural
  language processing that aids in feature engineering and embedding layers.
* [ctakes-parser]: parses [cTAKES] output into a [Pandas] data frame.


## Citation

If you use this project in your research, please use the following BibTeX entry:

```bibtex
@inproceedings{landes-etal-2023-deepzensols,
    title = "{D}eep{Z}ensols: A Deep Learning Natural Language Processing Framework for Experimentation and Reproducibility",
    author = "Landes, Paul  and
      Di Eugenio, Barbara  and
      Caragea, Cornelia",
    editor = "Tan, Liling  and
      Milajevs, Dmitrijs  and
      Chauhan, Geeticka  and
      Gwinnup, Jeremy  and
      Rippeth, Elijah",
    booktitle = "Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)",
    month = dec,
    year = "2023",
    address = "Singapore, Singapore",
    publisher = "Empirical Methods in Natural Language Processing",
    url = "https://aclanthology.org/2023.nlposs-1.16",
    pages = "141--146"
}
```


## Community

Please star the project and let me know how and where you use this API.
Contributions as pull requests, feedback, and any other input are welcome.


## Changelog

An extensive changelog is available [here](CHANGELOG.md).


## License

[MIT License](LICENSE.md)

Copyright (c) 2021 - 2023 Paul Landes


<!-- links -->
[pypi]: https://pypi.org/project/zensols.mednlp/
[pypi-link]: https://pypi.python.org/pypi/zensols.mednlp
[pypi-badge]: https://img.shields.io/pypi/v/zensols.mednlp.svg
[python310-badge]: https://img.shields.io/badge/python-3.10-blue.svg
[python310-link]: https://www.python.org/downloads/release/python-3100
[python311-badge]: https://img.shields.io/badge/python-3.11-blue.svg
[python311-link]: https://www.python.org/downloads/release/python-3110
[build-badge]: https://github.com/plandes/mednlp/workflows/CI/badge.svg
[build-link]: https://github.com/plandes/mednlp/actions

[MedCAT]: https://github.com/CogStack/MedCAT
[MedCAT UMLS request form]: https://uts.nlm.nih.gov/uts/login?service=https:%2F%2Fmedcat.rosalind.kcl.ac.uk%2Fauth-callback

[Pandas]: https://pandas.pydata.org
[ctakes-parser]: https://pypi.org/project/ctakes-parser

[UTS]: https://uts.nlm.nih.gov/uts/
[UMLS]: https://www.nlm.nih.gov/research/umls/
[CUIs]: https://www.nlm.nih.gov/research/umls/new_users/online_learning/Meta_005.html
[cui2vec]: https://arxiv.org/abs/1804.01486
[cTAKES]: https://ctakes.apache.org
[word embedding model]: https://plandes.github.io/deepnlp/api/zensols.deepnlp.embed.html#zensols.deepnlp.embed.domain.WordEmbedModel
[Zensols NLP parsing API]: https://plandes.github.io/nlparse/doc/feature-doc.html
[Zensols Deep NLP library]: https://github.com/plandes/deepnlp
[Zensols Deep NLP embedding layer]: https://plandes.github.io/deepnlp/api/zensols.deepnlp.layer.html#zensols.deepnlp.layer.embed.EmbeddingNetworkModule
[Stash]: https://plandes.github.io/util/api/zensols.persist.html#zensols.persist.domain.Stash

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/plandes/mednlp",
    "name": "zensols.mednlp",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "tooling",
    "author": "Paul Landes",
    "author_email": "landes@mailc.net",
    "download_url": "https://github.com/plandes/mednlp/releases/download/v1.7.0/zensols.mednlp-1.7.0-py3-none-any.whl",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A natural language medical domain parsing library.",
    "version": "1.7.0",
    "project_urls": {
        "Download": "https://github.com/plandes/mednlp/releases/download/v1.7.0/zensols.mednlp-1.7.0-py3-none-any.whl",
        "Homepage": "https://github.com/plandes/mednlp"
    },
    "split_keywords": [
        "tooling"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d06dc082ffcfd9e5a07a4c0d3635f5f72c343a10cd253784c351a88494cac0a0",
                "md5": "3d21f9561b266c31e14097f40f170cf2",
                "sha256": "adc737d3d4423f32401199e4786827842e392c2d53c0346c9829f3df5a54206c"
            },
            "downloads": -1,
            "filename": "zensols.mednlp-1.7.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "3d21f9561b266c31e14097f40f170cf2",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 32242,
            "upload_time": "2024-04-14T20:16:22",
            "upload_time_iso_8601": "2024-04-14T20:16:22.484409Z",
            "url": "https://files.pythonhosted.org/packages/d0/6d/c082ffcfd9e5a07a4c0d3635f5f72c343a10cd253784c351a88494cac0a0/zensols.mednlp-1.7.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-14 20:16:22",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "plandes",
    "github_project": "mednlp",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "zensols.mednlp"
}
        