# MIMIC III Corpus Parsing
[![PyPI][pypi-badge]][pypi-link]
[![Python 3.10][python310-badge]][python310-link]
[![Python 3.11][python311-badge]][python311-link]
[![Build Status][build-badge]][build-link]
A utility library for parsing the [MIMIC-III] corpus. This uses [spaCy] and
extends the [zensols.mednlp] to parse the [MIMIC-III] medical note dataset.
Features include:
* Creates both natural language and medical features from medical notes. The
latter is generated using linked entity concepts parsed with [MedCAT] via
[zensols.mednlp].
* Modifies the [spaCy] tokenizer to chunk masked tokens. For example, the
tokens `[`, `**`, `First`, `Name`, `**`, `]` become the single token
`[**First Name**]`.
* Provides a clean, Pythonic, object-oriented representation of MIMIC-III
admissions and medical notes.
* Interfaces MIMIC-III data as a relational database (either PostgreSQL or
SQLite).
* Paragraph chunking using the most common syntax/physician templates provided
in the MIMIC-III dataset.
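
The effect of chunking masked tokens can be illustrated with a small sketch. This is not how the package modifies the [spaCy] tokenizer; it is a plain regular expression, with the hypothetical names `TOKEN_PATTERN` and `tokenize_keep_masks`, that only shows the intended result of keeping each `[**...**]` mask together:

```python
import re

# Try a whole de-identification mask such as [**First Name**] first, then a
# run of word characters, then any single non-space symbol.
TOKEN_PATTERN = re.compile(r"\[\*\*.*?\*\*\]|\w+|[^\w\s]")


def tokenize_keep_masks(text: str) -> list[str]:
    """Split text into tokens, keeping each [**...**] mask as one token."""
    return TOKEN_PATTERN.findall(text)


print(tokenize_keep_masks("Seen by Dr. [**First Name**] on [**2101-3-4**]."))
# ['Seen', 'by', 'Dr', '.', '[**First Name**]', 'on', '[**2101-3-4**]', '.']
```

Without the first alternative in the pattern, the mask would be split into brackets, asterisks and words, which is the behavior the feature above avoids.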
## Documentation
See the [full documentation](https://plandes.github.io/mimic/index.html).
The [API reference](https://plandes.github.io/mimic/api.html) is also
available.
## Obtaining
The easiest way to install the command line program is via the `pip` installer:
```bash
pip3 install zensols.mimic
```
Binaries are also available on [pypi].
## Installation
1. Install the package: `pip3 install zensols.mimic`
2. Install the database (either PostgreSQL or SQLite).
## Configuration
After a database is installed it must be configured in a new file `~/.mimicrc`
that you create. This INI formatted file also specifies where to cache data:
```ini
[default]
# the directory where cached data is stored
data_dir = ~/directory/to/cached/data
```
If `~/.mimicrc` doesn't exist, a configuration file must be specified with the
`--config` option.
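
As a quick sanity check of the file's contents, the cache directory can be read back with Python's standard `configparser`. This is a minimal sketch and not the loader the package itself uses; the names `EXAMPLE_RC` and `read_data_dir` are made up for illustration:

```python
import configparser
from pathlib import Path

# Same content as the example configuration above.
EXAMPLE_RC = """
[default]
# the directory where cached data is stored
data_dir = ~/directory/to/cached/data
"""


def read_data_dir(ini_text: str) -> Path:
    """Return the cache directory from the [default] section, expanding ~."""
    # Interpolation is disabled so values are read back exactly as written.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return Path(parser["default"]["data_dir"]).expanduser()


print(read_data_dir(EXAMPLE_RC))
```

To check a real file, pass its text instead, for example `read_data_dir(Path("~/.mimicrc").expanduser().read_text())`.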
### SQLite
SQLite is the default database used for MIMIC-III access, but it is slower and
not as well tested as the [PostgreSQL](#postgresql) driver. If you need
database access, see the [SQLite database file] project, which uses the
[SQLite instructions] to create the SQLite file from MIMIC-III.
Once you create the file, add the following configuration to the file
specified with `--config` (or to `~/.mimicrc`):
```ini
[mimic_sqlite_conn_manager]
db_file = path: <some directory>/mimic3.sqlite3
```
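
Before pointing the API at the file, it can help to confirm that the SQLite file actually contains the clinical notes. The sketch below uses only the standard `sqlite3` module and assumes the build keeps the standard MIMIC-III table name `NOTEEVENTS` (in any letter case); `list_tables` and `has_note_events` are hypothetical helper names, not part of this package:

```python
import sqlite3


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the lower-cased, sorted names of all tables in the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    return sorted(name.lower() for (name,) in rows)


def has_note_events(conn: sqlite3.Connection) -> bool:
    """Check for the NOTEEVENTS table, which holds the clinical notes."""
    return "noteevents" in list_tables(conn)


# Usage (the path is the placeholder from the configuration above):
#   conn = sqlite3.connect("<some directory>/mimic3.sqlite3")
#   print(has_note_events(conn))
#   conn.close()
```

Note that `sqlite3.connect` creates an empty database when the path does not exist, so a `False` result can also mean the path is wrong.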
### PostgreSQL
PostgreSQL is the preferred way to access MIMIC-III for this API. The MIMIC-III
database can be loaded by following the [PostgreSQL instructions], or consider
the [PostgreSQL Docker image]. Then configure the database by adding the
following to `~/.mimicrc`:
```ini
[mimic_default]
resources_dir = resource(zensols.mimic): resources
sql_resources = ${resources_dir}/postgres
conn_manager = mimic_postgres_conn_manager
[mimic_db]
database = <needs a value>
host = <needs a value>
port = <needs a value>
user = <needs a value>
password = <needs a value>
```
The Python PostgreSQL client package is also needed (it is not needed for
[SQLite](#sqlite) installs) and can be installed with:
```bash
pip3 install zensols.dbpg
```
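
Because every `[mimic_db]` entry in the template above starts out as `<needs a value>`, a small check can catch settings that were left unfilled. This is a sketch using the standard `configparser`, not part of the package; `missing_db_settings` is a hypothetical helper and the values in `example` (`mimic3`, `localhost`, `5432`) are made-up illustrations:

```python
import configparser

REQUIRED_KEYS = ("database", "host", "port", "user", "password")
PLACEHOLDER = "<needs a value>"


def missing_db_settings(ini_text: str) -> list[str]:
    """Return the [mimic_db] keys that are absent or still placeholders."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section("mimic_db"):
        return list(REQUIRED_KEYS)
    section = parser["mimic_db"]
    return [
        key for key in REQUIRED_KEYS
        if section.get(key, PLACEHOLDER).strip() in ("", PLACEHOLDER)
    ]


# Hypothetical, partially filled configuration.
example = """
[mimic_db]
database = mimic3
host = localhost
port = 5432
user = <needs a value>
"""
print(missing_db_settings(example))
# ['user', 'password']
```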
## Usage
The [Corpus] class is the data access object used to read and parse the corpus:
```python
# get the MIMIC-III corpus data access object
>>> from zensols.mimic import ApplicationFactory
>>> corpus = ApplicationFactory.get_corpus()
# get an admission by hadm_id
>>> adm = corpus.hospital_adm_stash['165315']
# get the first discharge note (some admissions have addendums)
>>> from zensols.mimic.regexnote import DischargeSummaryNote
>>> ds = adm.notes_by_category[DischargeSummaryNote.CATEGORY][0]
# dump the note in a human readable, section-by-section format
>>> ds.write()
row_id: 12144
category: Discharge summary
description: Report
annotator: regular_expression
----------------------0:chief-complaint (CHIEF COMPLAINT)-----------------------
Unresponsiveness
-----------1:history-of-present-illness (HISTORY OF PRESENT ILLNESS)------------
The patient is a ...
# get features of the note useful in ML models as a Pandas dataframe
>>> df = ds.feature_dataframe
# get only medical features (CUI, entity, NER and POS tag) for the HPI section
>>> df[(df['section'] == 'history-of-present-illness') & (df['cui_'] != '-<N>-')]['norm cui_ detected_name_ ent_ tag_'.split()]
       norm      cui_           detected_name_     ent_  tag_
15  history  C0455527  history~of~hypertension  concept    NN
```
See the [application example], which shows a more fine-grained way of
configuring the API.
## Medical Note Segmentation
This package uses regular expressions to segment notes. However, the
[zensols.mimicsid] package uses annotations and a model trained by clinical
informatics physicians. Using that package gives the enhanced segmentation
without any API changes.
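
To give a feel for what regular expression based segmentation involves, here is a minimal sketch. The `HEADER` pattern and `segment` function are hypothetical and far simpler than the patterns this package uses; they assume a section header is a line of capitalized words ending in a colon:

```python
import re

# Hypothetical header pattern: a line starting with a capital letter,
# containing only letters and spaces, and ending in a colon.
HEADER = re.compile(r"^([A-Z][A-Za-z ]+):[ \t]*$", re.MULTILINE)


def segment(note: str) -> dict[str, str]:
    """Map a normalized section id to the text that follows its header."""
    sections = {}
    matches = list(HEADER.finditer(note))
    for i, match in enumerate(matches):
        # A section runs until the next header or the end of the note.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(note)
        section_id = match.group(1).strip().lower().replace(" ", "-")
        sections[section_id] = note[match.end():end].strip()
    return sections


note = """Chief Complaint:
Unresponsiveness

History of Present Illness:
The patient is a ...
"""
print(segment(note))
# {'chief-complaint': 'Unresponsiveness',
#  'history-of-present-illness': 'The patient is a ...'}
```

The section ids produced here mirror the `chief-complaint` and `history-of-present-illness` ids shown in the usage output, but a hand-written pattern like this misses headers that physicians format differently, which is the gap a trained model is meant to close.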
## Citation
If you use this project in your research, please cite it with the following
BibTeX entry:
```bibtex
@inproceedings{landes-etal-2023-deepzensols,
title = "{D}eep{Z}ensols: A Deep Learning Natural Language Processing Framework for Experimentation and Reproducibility",
author = "Landes, Paul and
Di Eugenio, Barbara and
Caragea, Cornelia",
editor = "Tan, Liling and
Milajevs, Dmitrijs and
Chauhan, Geeticka and
Gwinnup, Jeremy and
Rippeth, Elijah",
booktitle = "Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)",
month = dec,
year = "2023",
address = "Singapore, Singapore",
publisher = "Empirical Methods in Natural Language Processing",
url = "https://aclanthology.org/2023.nlposs-1.16",
pages = "141--146"
}
```
## Changelog
An extensive changelog is available [here](CHANGELOG.md).
## Community
Please star this repository and let me know how and where you use this API.
Contributions as pull requests, feedback and any input are welcome.
## License
[MIT License](LICENSE.md)
Copyright (c) 2022 - 2024 Paul Landes
<!-- links -->
[pypi]: https://pypi.org/project/zensols.mimic/
[pypi-link]: https://pypi.python.org/pypi/zensols.mimic
[pypi-badge]: https://img.shields.io/pypi/v/zensols.mimic.svg
[python310-badge]: https://img.shields.io/badge/python-3.10-blue.svg
[python310-link]: https://www.python.org/downloads/release/python-3100
[python311-badge]: https://img.shields.io/badge/python-3.11-blue.svg
[python311-link]: https://www.python.org/downloads/release/python-3110
[build-badge]: https://github.com/plandes/mimic/workflows/CI/badge.svg
[build-link]: https://github.com/plandes/mimic/actions
[MIMIC-III]: https://physionet.org/content/mimiciii-demo/1.4/
[MedCAT]: https://github.com/CogStack/MedCAT
[spaCy]: https://spacy.io
[zensols.mednlp]: https://github.com/plandes/mednlp
[SQLite instructions]: https://github.com/MIT-LCP/mimic-code/tree/main/mimic-iii/buildmimic/sqlite
[PostgreSQL instructions]: https://github.com/MIT-LCP/mimic-code/blob/main/mimic-iii/buildmimic/postgres/README.md
[PostgreSQL Docker image]: https://github.com/plandes/mimicdb
[SQLite database file]: https://github.com/plandes/mimicdbsqlite
[Corpus]: https://plandes.github.io/mimic/api/zensols.mimic.html#zensols.mimic.corpus.Corpus
[application example]: https://github.com/plandes/mimic/blob/master/example/shownote.py
[zensols.mimicsid]: https://github.com/plandes/mimicsid