# Medical <img src="https://github.com/CogStack/cogstack-nlp/blob/main/media/cat-logo.png?raw=true" width=45> oncept Annotation Tool (version 2)
**There are a number of breaking changes in MedCAT v2 compared to v1.**
When moving from v1 to v2, please refer to the [migration guide](docs/migration_guide_v2.md).
Details on the breaking changes are outlined [here](docs/breaking_changes.md).
<!-- [](https://pypi.org/project/medcat/) -->
MedCAT can be used to extract information from Electronic Health Records (EHRs) and link it to biomedical ontologies like SNOMED-CT, UMLS, or HPO (and potentially other ontologies).
Original paper for v1 on [arXiv](https://arxiv.org/abs/2010.01165).
**Official Docs [here](https://cogstack-nlp.readthedocs.io/)**
**Discussion Forum [discourse](https://discourse.cogstack.org/)**
## Available Models
As MedCAT v2 is still in Beta, we do not currently have any models publicly available.
You can still use models for v1, however (see the [README](https://github.com/CogStack/cogstack-nlp/blob/main/medcat-v2/README.md) there).
If you wish, you can also convert the v1 models into the v2 format (see tutorial (TODO + link)).
## News
- **MedCAT v2 beta** \[1. April 2025\] MedCATv2 beta 0.1.5 was released 1. April 2025.
<!-- - **Paper** van Es, B., Reteig, L.C., Tan, S.C. et al. [Negation detection in Dutch clinical texts: an evaluation of rule-based and machine learning methods](https://doi.org/10.1186/s12859-022-05130-x). BMC Bioinformatics 24, 10 (2023).
- **New tool in the Cogstack ecosystem \[19. December 2022\]** [Foresight -- Deep Generative Modelling of Patient Timelines using Electronic Health Records](https://arxiv.org/abs/2212.08072)
- **New Paper using MedCAT \[21. October 2022\]**: [A New Public Corpus for Clinical Section Identification: MedSecId.](https://aclanthology.org/2022.coling-1.326.pdf)
- **Major Change to the Permissions of Use \[4. August 2022\]** MedCAT now uses the [Elastic License 2.0](https://github.com/CogStack/MedCAT/pull/271/commits/c9f4e86116ec751a97c618c97dadaa23e1feb6bc). For further information please click [here.](https://www.elastic.co/licensing/elastic-license)
- **New Downloader \[15. March 2022\]**: You can now [download](https://uts.nlm.nih.gov/uts/login?service=https://medcat.rosalind.kcl.ac.uk/auth-callback) the latest SNOMED-CT and UMLS model packs via UMLS user authentication.
- **New Feature and Tutorial \[7. December 2021\]**: [Exploring Electronic Health Records with MedCAT and Neo4j](https://towardsdatascience.com/exploring-electronic-health-records-with-medcat-and-neo4j-f376c03d8eef)
- **New Minor Release \[20. October 2021\]** Introducing model packs, new faster multiprocessing for large datasets (100M+ documents) and improved MetaCAT.
- **New Release \[1. August 2021\]**: Upgraded MedCAT to use spaCy v3, new scispaCy models have to be downloaded - all old CDBs (compatible with MedCAT v1) will work without any changes.
- **New Feature and Tutorial \[8. July 2021\]**: [Integrating 🤗 Transformers with MedCAT for biomedical NER+L](https://towardsdatascience.com/integrating-transformers-with-medcat-for-biomedical-ner-l-8869c76762a)
- **General \[1. April 2021\]**: MedCAT is upgraded to v1; unfortunately, this introduces breaking changes with older models (MedCAT v0.4),
as well as potential problems with all code that used the MedCAT package. MedCAT v0.4 is available on the legacy
branch and will still be supported until 1. July 2021
(with respect to potential bug fixes), after which it will still be available but no longer updated.
- **Paper**: [What’s in a Summary? Laying the Groundwork for Advances in Hospital-Course Summarization](https://www.aclweb.org/anthology/2021.naacl-main.382.pdf)
- ([more...](https://github.com/CogStack/cogstack-nlp/blob/main/medcat-v2/media/news.md)) -->
## Installation
Currently MedCAT v2 is in Beta.
As such, you need to explicitly specify the beta release.
```
pip install medcat~=2.0.0b
```
Do note that **this installs only the core MedCAT v2**.
**It does not install the dependencies necessary for `spacy`-based tokenizing, MetaCATs, or DeID**.
However, all of those are supported as well.
You can install them as follows:
```
pip install "medcat[spacy]~=2.0.0b" # for spacy-based tokenizer
pip install "medcat[meta-cat]~=2.0.0b" # for MetaCAT
pip install "medcat[deid]~=2.0.0b" # for DeID models
pip install "medcat[spacy,meta-cat,deid,rel-cat,dict-ner]~=2.0.0b" # for all optional extras
```
PS:
In the above examples, we're installing the MedCAT v2 BETA version `v0.8.0`.
The README is unlikely to change after every new release.
If another version is available / required, substitute the version tag as appropriate.
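
To confirm which version actually ended up in your environment, a small check like the one below may help. It is only a sketch that uses the Python standard library (`importlib.metadata` and `importlib.util`); the helper names `installed_version` and `is_importable` are made up for this example and are not part of MedCAT, and the idea that the `spacy` extra makes a module called `spacy` importable is an assumption.

```python
from importlib import metadata, util
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located (without importing it)."""
    return util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Prints None if medcat is not installed in the current environment.
    print("medcat version:", installed_version("medcat"))
    # Assumption: installing the [spacy] extra provides the `spacy` module.
    print("spacy available:", is_importable("spacy"))
```

If the reported version is not the one you expected, adjust the version tag in the `pip install` command accordingly.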
## Demo
The MedCAT v2 demo web app is available [here](https://medcatv2.sites.er.kcl.ac.uk/).
## Tutorials
A guide on how to use MedCAT v2 is available at [MedCATv2 Tutorials](https://github.com/CogStack/cogstack-nlp/tree/main/medcat-v2-tutorials).
However, the tutorials are still a work in progress.
## Acknowledgements
Entity extraction was trained on [MedMentions](https://github.com/chanzuckerberg/MedMentions). In total, it has ~35K entities from UMLS.
The vocabulary was compiled from [Wiktionary](https://en.wiktionary.org/wiki/Wiktionary:Main_Page). In total, it has ~800K unique words.
## Powered By
A big thank you goes to [spaCy](https://spacy.io/) and [Hugging Face](https://huggingface.co/) - who made life a million times easier.
<!-- ## Citation
```
@ARTICLE{Kraljevic2021-ln,
title="Multi-domain clinical natural language processing with {MedCAT}: The Medical Concept Annotation Toolkit",
author="Kraljevic, Zeljko and Searle, Thomas and Shek, Anthony and Roguski, Lukasz and Noor, Kawsar and Bean, Daniel and Mascio, Aurelie and Zhu, Leilei and Folarin, Amos A and Roberts, Angus and Bendayan, Rebecca and Richardson, Mark P and Stewart, Robert and Shah, Anoop D and Wong, Wai Keong and Ibrahim, Zina and Teo, James T and Dobson, Richard J B",
journal="Artif. Intell. Med.",
volume=117,
pages="102083",
month=jul,
year=2021,
issn="0933-3657",
doi="10.1016/j.artmed.2021.102083"
}
``` -->
Raw data
{
"_id": null,
"home_page": null,
"name": "medcat",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": "CogStack <contact@cogstack.org>",
"keywords": "ML, NLP, NER+L",
"author": "Z. Kraljevic, A. Shek, T. Searle, X. Bai, M. Ratas",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/ad/b3/cb1e824929a808bc74984453b02df7bde4faa8acb1431f927be66fcd6b0d/medcat-2.0.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Medical Concept Annotation Toolkit (v2)",
"version": "2.0.0",
"project_urls": {
"Bug Reports": "https://discourse.cogstack.org/",
"Homepage": "https://cogstack.org/",
"Source": "https://github.com/CogStack/cogstack-nlp/"
},
"split_keywords": [
"ml",
" nlp",
" ner+l"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "69c2124222fcbf526880c5f211add8a4a0a9aad51359f91076b53a7aa16b30cf",
"md5": "6b78cc650bf8e85026113cf92a42cd9d",
"sha256": "af2f027073b30fbf753b834fc333fd696f1254df533189517ac9f1362f0dc5d8"
},
"downloads": -1,
"filename": "medcat-2.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "6b78cc650bf8e85026113cf92a42cd9d",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 273266,
"upload_time": "2025-08-18T12:24:31",
"upload_time_iso_8601": "2025-08-18T12:24:31.980381Z",
"url": "https://files.pythonhosted.org/packages/69/c2/124222fcbf526880c5f211add8a4a0a9aad51359f91076b53a7aa16b30cf/medcat-2.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "adb3cb1e824929a808bc74984453b02df7bde4faa8acb1431f927be66fcd6b0d",
"md5": "75a6c7ea3efdb015b8ef41b8a02198a3",
"sha256": "a4b7a773781502aef25972a4e131dda6e65d2662120308a705a84e4d820c29b4"
},
"downloads": -1,
"filename": "medcat-2.0.0.tar.gz",
"has_sig": false,
"md5_digest": "75a6c7ea3efdb015b8ef41b8a02198a3",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 235454,
"upload_time": "2025-08-18T12:24:33",
"upload_time_iso_8601": "2025-08-18T12:24:33.773455Z",
"url": "https://files.pythonhosted.org/packages/ad/b3/cb1e824929a808bc74984453b02df7bde4faa8acb1431f927be66fcd6b0d/medcat-2.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-08-18 12:24:33",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "CogStack",
"github_project": "cogstack-nlp",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "medcat"
}