zensols.mimic

Name	zensols.mimic
Version	1.8.0
home_page	https://github.com/plandes/mimic
Summary	MIMIC III Corpus Parsing
upload_time	2025-01-11 21:24:40
maintainer	None
docs_url	None
author	Paul Landes
requires_python	None
license	None
keywords	tooling

# MIMIC III Corpus Parsing

[![PyPI][pypi-badge]][pypi-link]
[![Python 3.11][python311-badge]][python311-link]
[![Build Status][build-badge]][build-link]

A utility library for parsing the [MIMIC-III] corpus.  This uses [spaCy] and
extends the [zensols.mednlp] to parse the [MIMIC-III] medical note dataset.
Features include:

* Creates both natural language and medical features from medical notes.  The
  latter is generated using linked entity concepts parsed with [MedCAT] via
  [zensols.mednlp].
* Modifies the [spaCy] tokenizer to chunk masked tokens.  For example, `[`,
  `**`, `First`, `Name`, `**`, `]` becomes `[**First Name**]`.
* Provides a clean, Pythonic, object-oriented representation of MIMIC-III
  admissions and medical notes.
* Interfaces MIMIC-III data as a relational database (either PostgreSQL or
  SQLite).
* Chunks paragraphs using the most common syntax/physician templates provided
  in the MIMIC-III dataset.
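
The masked token chunking described above can be illustrated without [spaCy].
The following is only a minimal sketch of the idea, not this library's
tokenizer: the `MASK` pattern and the `tokenize_keep_masks` function are
hypothetical names, and the pattern assumes masks always have the form
`[**...**]`.

```python
import re

# Assumed pattern for MIMIC-III de-identification masks such as
# [**First Name**]; the library's own rules may differ.
MASK = re.compile(r"\[\*\*.*?\*\*\]")

# Naive word/punctuation splitter used outside of masks.
WORD_OR_PUNCT = re.compile(r"\w+|[^\w\s]")


def tokenize_keep_masks(text: str) -> list[str]:
    """Split text into tokens, keeping each mask as a single token."""
    tokens = []
    pos = 0
    for match in MASK.finditer(text):
        # tokenize the text before the mask, then add the mask whole
        tokens.extend(WORD_OR_PUNCT.findall(text[pos:match.start()]))
        tokens.append(match.group())
        pos = match.end()
    # tokenize whatever follows the last mask
    tokens.extend(WORD_OR_PUNCT.findall(text[pos:]))
    return tokens


tokens = tokenize_keep_masks("Seen by Dr. [**First Name**] today.")
print(tokens)
```

A naive splitter alone would break the mask into `[`, `**`, `First`, `Name`,
`**` and `]`; handling masks first keeps them intact.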


## Documentation

See the [full documentation](https://plandes.github.io/mimic/index.html).
The [API reference](https://plandes.github.io/mimic/api.html) is also
available.


## Obtaining

The easiest way to install the command line program is via the `pip` installer:
```bash
pip3 install zensols.mimic
```

Binaries are also available on [pypi].


## Installation

1. Install the package: `pip3 install zensols.mimic`
2. Install the database (either PostgreSQL or SQLite).


## Configuration

After a database is installed, it must be configured in a new file,
`~/.mimicrc`, that you create.  This INI-formatted file also specifies where to
cache data:
```ini
[default]
# the directory where cached data is stored
data_dir = ~/directory/to/cached/data
```
If `~/.mimicrc` doesn't exist, a configuration file must be specified with the
`--config` option.
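
To sanity check the file before running the application, the `[default]`
section can be read with Python's standard `configparser` module.  This is only
a sketch: the library uses its own configuration loader, and the
`read_data_dir` function is a hypothetical helper, not part of the API.

```python
import os
from configparser import ConfigParser
from pathlib import Path


def read_data_dir(config_text: str) -> Path:
    """Return the ``data_dir`` entry of ``[default]`` with ``~`` expanded."""
    # interpolation is disabled so that values are read exactly as written
    parser = ConfigParser(interpolation=None)
    parser.read_string(config_text)
    return Path(os.path.expanduser(parser["default"]["data_dir"]))


example = """
[default]
# the directory where cached data is stored
data_dir = ~/directory/to/cached/data
"""
data_dir = read_data_dir(example)
print(data_dir)
```

For a real file, pass `Path('~/.mimicrc').expanduser().read_text()` as the
argument.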


### SQLite

SQLite is the default database used for MIMIC-III access, but it is slower and
not as well tested as the [PostgreSQL](#postgresql) driver.  If you need
database access, see the [SQLite database file] project, which uses the
[SQLite instructions] to create the SQLite file from MIMIC-III.

Once you create the file, configure the API to use it by adding the following
to the file specified with `--config` (or to `~/.mimicrc`):
```ini
[mimic_sqlite_conn_manager]
db_file = path: <some directory>/mimic3.sqlite3
```
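
Before pointing the API at the file, it may be worth confirming that it is a
readable SQLite database that contains tables.  The sketch below uses only the
standard `sqlite3` module; `list_tables` is a hypothetical helper, and the
temporary `demo` database exists only to make the example self-contained.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def list_tables(db_file: str) -> list[str]:
    """Return the table names found in a SQLite database file."""
    # open read-only so that a wrong path fails instead of creating a file
    uri = Path(db_file).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# build a throwaway database to demonstrate the helper
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.sqlite3")
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE demo (id INTEGER)")
    conn.commit()
    conn.close()
    tables = list_tables(path)

print(tables)
```

Run against the real file, the function should list the MIMIC-III tables that
the build created.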

### PostgreSQL

PostgreSQL is the preferred way to access MIMIC-III for this API.  The MIMIC-III
database can be loaded by following the [PostgreSQL instructions], or consider
the [PostgreSQL Docker image].  Then configure the database by adding the
following to `~/.mimicrc`:
```ini
[mimic_default]
resources_dir = resource(zensols.mimic): resources
sql_resources = ${resources_dir}/postgres
conn_manager = mimic_postgres_conn_manager

[mimic_db]
database = <needs a value>
host = <needs a value>
port = <needs a value>
user = <needs a value>
password = <needs a value>
```


The Python PostgreSQL client package is also needed (it is not needed for
[SQLite](#sqlite) installs), and can be installed with:
```bash
pip3 install zensols.dbpg
```
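
A common mistake is leaving a `<needs a value>` placeholder in the `[mimic_db]`
section.  The following sketch checks for that with the standard
`configparser` module.  The `missing_db_settings` function is a hypothetical
helper, and the values `mimic3`, `localhost` and `5432` are made-up examples
rather than defaults of this library.

```python
from configparser import ConfigParser

REQUIRED_KEYS = ("database", "host", "port", "user", "password")
PLACEHOLDER = "<needs a value>"


def missing_db_settings(config_text: str) -> list[str]:
    """Return the ``[mimic_db]`` keys that are absent, empty or placeholders."""
    # interpolation is disabled so values such as ${resources_dir} are left alone
    parser = ConfigParser(interpolation=None)
    parser.read_string(config_text)
    section = parser["mimic_db"] if parser.has_section("mimic_db") else {}
    return [
        key for key in REQUIRED_KEYS
        if not section.get(key) or section.get(key) == PLACEHOLDER
    ]


# example values are illustrative only
example = """
[mimic_db]
database = mimic3
host = localhost
port = 5432
user = <needs a value>
password = <needs a value>
"""
missing = missing_db_settings(example)
print(missing)
```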


## Usage

The [Corpus] class is the data access object used to read and parse the corpus:

```python
# get the MIMIC-III corpus data access object
>>> from zensols.mimic import ApplicationFactory
>>> corpus = ApplicationFactory.get_corpus()

# get an admission by hadm_id
>>> adm = corpus.hospital_adm_stash['165315']

# get the first discharge note (some admissions have addendums)
>>> from zensols.mimic.regexnote import DischargeSummaryNote
>>> ds = adm.notes_by_category[DischargeSummaryNote.CATEGORY][0]

# dump the note in a human readable, section-by-section format
>>> ds.write()
row_id: 12144
category: Discharge summary
description: Report
annotator: regular_expression
----------------------0:chief-complaint (CHIEF COMPLAINT)-----------------------
Unresponsiveness
-----------1:history-of-present-illness (HISTORY OF PRESENT ILLNESS)------------
The patient is a ...

# get features of the note useful in ML models as a Pandas dataframe
>>> df = ds.feature_dataframe

# get only medical features (CUI, entity, NER and POS tag) for the HPI section
>>> df[(df['section'] == 'history-of-present-illness') & (df['cui_'] != '-<N>-')]['norm cui_ detected_name_ ent_ tag_'.split()]
             norm      cui_           detected_name_     ent_ tag_
15        history  C0455527  history~of~hypertension  concept   NN
```

See the [application example], which shows a fine-grained way of configuring
the API.


## Medical Note Segmentation

This package uses regular expressions to segment notes.  However, the
[zensols.mimicsid] package uses annotations and a model trained by clinical
informatics physicians.  Using that package provides the enhanced segmentation
without any API changes.
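
As a rough illustration of regular expression based segmentation, the sketch
below splits a note on section headers.  It is not this package's
implementation: the `HEADER` pattern and the `segment` function are
hypothetical, and the pattern assumes each header is an upper-case title
followed by a colon on its own line.

```python
import re

# Assumed header style: an upper-case title followed by a colon on its own
# line; the package's own patterns are more extensive.
HEADER = re.compile(r"^([A-Z][A-Z ]+):[ \t]*$", re.MULTILINE)


def segment(note: str) -> list[tuple[str, str]]:
    """Return (section id, body) pairs for each header found in the note."""
    matches = list(HEADER.finditer(note))
    sections = []
    for i, match in enumerate(matches):
        # a section body runs until the next header or the end of the note
        end = matches[i + 1].start() if i + 1 < len(matches) else len(note)
        sec_id = match.group(1).strip().lower().replace(" ", "-")
        sections.append((sec_id, note[match.end():end].strip()))
    return sections


note = """CHIEF COMPLAINT:
Unresponsiveness

HISTORY OF PRESENT ILLNESS:
The patient is a ...
"""
sections = segment(note)
for sec_id, body in sections:
    print(f"{sec_id}: {body}")
```

Hand-written patterns like this miss headers that deviate from the expected
template, which is the gap a trained segmentation model is meant to close.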


## Citation

If you use this project in your research please use the following BibTeX entry:

```bibtex
@inproceedings{landes-etal-2023-deepzensols,
    title = "{D}eep{Z}ensols: A Deep Learning Natural Language Processing Framework for Experimentation and Reproducibility",
    author = "Landes, Paul  and
      Di Eugenio, Barbara  and
      Caragea, Cornelia",
    editor = "Tan, Liling  and
      Milajevs, Dmitrijs  and
      Chauhan, Geeticka  and
      Gwinnup, Jeremy  and
      Rippeth, Elijah",
    booktitle = "Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)",
    month = dec,
    year = "2023",
    address = "Singapore, Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.nlposs-1.16",
    pages = "141--146"
}
```


## Changelog

An extensive changelog is available [here](CHANGELOG.md).


## Community

Please star this repository and let me know how and where you use this API.
Contributions as pull requests, feedback and any other input are welcome.


## License

[MIT License](LICENSE.md)

Copyright (c) 2022 - 2025 Paul Landes


<!-- links -->
[pypi]: https://pypi.org/project/zensols.mimic/
[pypi-link]: https://pypi.python.org/pypi/zensols.mimic
[pypi-badge]: https://img.shields.io/pypi/v/zensols.mimic.svg
[python311-badge]: https://img.shields.io/badge/python-3.11-blue.svg
[python311-link]: https://www.python.org/downloads/release/python-3110
[build-badge]: https://github.com/plandes/mimic/workflows/CI/badge.svg
[build-link]: https://github.com/plandes/mimic/actions

[MIMIC-III]: https://physionet.org/content/mimiciii-demo/1.4/
[MedCAT]: https://github.com/CogStack/MedCAT
[spaCy]: https://spacy.io
[zensols.mednlp]: https://github.com/plandes/mednlp

[SQLite instructions]: https://github.com/MIT-LCP/mimic-code/tree/main/mimic-iii/buildmimic/sqlite
[PostgreSQL instructions]: https://github.com/MIT-LCP/mimic-code/blob/main/mimic-iii/buildmimic/postgres/README.md
[PostgreSQL Docker image]: https://github.com/plandes/mimicdb
[SQLite database file]: https://github.com/plandes/mimicdbsqlite
[Corpus]: https://plandes.github.io/mimic/api/zensols.mimic.html#zensols.mimic.corpus.Corpus
[application example]: https://github.com/plandes/mimic/blob/master/example/shownote.py
[zensols.mimicsid]: https://github.com/plandes/mimicsid
