icdcodex


| Field | Value |
| --- | --- |
| Name | icdcodex |
| Version | 0.5.2 |
| Home page | https://github.com/icd-codex/icd-codex |
| Summary | graphical and continuous representations of ICD-9 and ICD-10 codes |
| Upload time | 2025-01-06 07:45:18 |
| Author | Jeremy Fisher |
| License | MIT license |
| Keywords | icdcodex |

[![PyPI version fury.io](https://badge.fury.io/py/icdcodex.svg)](https://pypi.python.org/pypi/icdcodex/)
[![Documentation Status](https://readthedocs.org/projects/icd-codex/badge/?version=latest)](http://icd-codex.readthedocs.io/?badge=latest)
[![Downloads](https://pepy.tech/badge/icdcodex)](https://pepy.tech/project/icdcodex)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4300935.svg)](https://doi.org/10.5281/zenodo.4300935)

```{admonition} Maintenance mode
Thank you for your interest in ICD codex! I (Jeremy) wrote this in 2020 as part of a class project and it has gotten quite a few downloads. However, since then, sequence representation has significantly improved. At this point, **I would not recommend using node2vec to represent ICD codes.** Instead, use a [language model](https://platform.openai.com/docs/guides/embeddings). The node2vec functionality is provided for compatibility with existing projects.

The `networkx` hierarchy should still remain useful.

If there is interest in extending this library for use with modern sequence learning algorithms, please reach out.
```
## What is it?
A Python library for building vector representations of ICD-9 and ICD-10 codes. (2025 comment: the vector representations here are constructed using outdated algorithms.) Because it takes advantage of the hierarchical nature of ICD codes, it also provides these hierarchies in a [`networkx`](https://networkx.github.io) format. (2025 comment: this data structure should still remain useful.)

## Motivation
`icdcodex` was the first prize winner in the Data Driven Healthcare Track of Johns Hopkins' [MedHacks 2020](https://medhacks2020.devpost.com). It was hacked together to address the problem of [ICD](https://en.wikipedia.org/wiki/ICD-10) miscodes, which are a major issue for health insurance in the United States. Indeed, while ICD coding is tedious and labour-intensive, it is not obvious how to automate because the output space is enormous. For example, ICD-10-CM (clinical modification) has over 70,000 codes, and the number is growing.

There are [many strategies](https://maxhalford.github.io/blog/target-encoding/) for target encoding that address these issues. `icdcodex` has two features that make ICD classification more amenable to modeling:
- Access to a `networkx` tree representation of the ICD-9 and ICD-10 hierarchies
- Vector embeddings of ICD codes using the [node2vec](https://arxiv.org/abs/1607.00653) algorithm (including pre-computed embeddings and an interface to create new embeddings)
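
To illustrate why a tree representation helps with a very large output space, here is a minimal sketch that collapses leaf codes to a coarser ancestor, shrinking the number of distinct labels. It uses a hand-written child-to-parent dictionary rather than the `networkx` graph shipped with `icdcodex`; the node names, `PARENT`, `path_to_root` and `coarsen` are all assumptions made for illustration and are not part of the library.

```python
# Toy stand-in for a slice of the ICD-9 hierarchy, stored as child -> parent.
# The node names are illustrative only; the real hierarchy ships with icdcodex.
PARENT = {
    "Infectious And Parasitic Diseases": "root",
    "Intestinal Infectious Diseases": "Infectious And Parasitic Diseases",
    "001": "Intestinal Infectious Diseases",
    "0010": "001",
    "0011": "001",
    "002": "Intestinal Infectious Diseases",
    "0020": "002",
}


def path_to_root(code):
    """Return the nodes from `code` up to the root, inclusive."""
    path = [code]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def coarsen(code, levels_up):
    """Replace a code with its ancestor `levels_up` steps higher, stopping at the root."""
    path = path_to_root(code)
    return path[min(levels_up, len(path) - 1)]


leaf_codes = ["0010", "0011", "0020"]
# Three leaf codes collapse to two labels one level up.
coarse_labels = sorted({coarsen(c, 1) for c in leaf_codes})
print(coarse_labels)  # ['001', '002']
```

Moving further up the tree trades label resolution for a smaller, easier classification problem.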

## Example Code
```python
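from icdcodex import icd2vec, hierarchy

# Fit a 64-dimensional node2vec embedding on the ICD-9 hierarchy
embedder = icd2vec.Icd2Vec(num_embedding_dimensions=64)
embedder.fit(*hierarchy.icd9())

X = get_patient_covariates()  # placeholder: supply your own feature-loading function
y = embedder.to_vec(["0010"])  # Cholera due to vibrio cholerae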
```
In this case, `y` is a 64-dimensional vector that lies close to the vectors of other `Infectious And Parasitic Diseases` codes.
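
When a model is trained to predict such a vector, the prediction has to be mapped back to a code. One common way to do this is a nearest-neighbour lookup by cosine similarity, sketched below with made-up 3-dimensional vectors. The `EMBEDDINGS` table, `cosine_similarity` and `nearest_code` are hypothetical names for illustration; whether `icdcodex` offers its own decoding method is not covered here.

```python
import math

# Made-up 3-dimensional vectors standing in for learned embeddings.
EMBEDDINGS = {
    "0010": [0.9, 0.1, 0.0],  # Cholera due to vibrio cholerae
    "0011": [0.8, 0.2, 0.1],  # a sibling code under 001
    "4280": [0.0, 0.1, 0.9],  # a code from a different chapter
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_code(vector, embeddings=EMBEDDINGS):
    """Return the code whose embedding is most similar to `vector`."""
    return max(embeddings, key=lambda code: cosine_similarity(vector, embeddings[code]))


predicted = [0.85, 0.12, 0.02]  # pretend model output
print(nearest_code(predicted))  # 0010
```

A linear scan like this is fine for a sketch; with tens of thousands of codes, a vectorized or indexed search would likely be preferable.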

## Related Work
- node2vec [Paper](https://cs.stanford.edu/people/jure/pubs/node2vec-kdd16.pdf), [Website](https://snap.stanford.edu/node2vec/), [Code](https://github.com/snap-stanford/snap/tree/master/examples/node2vec), [Alternate Code](https://github.com/eliorc/node2vec)
- Learning Low-Dimensional Representations of Medical Concepts: [Paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5001761/), [Code](https://github.com/clinicalml/embeddings)

## The Hackathon Team
- Jeremy Fisher (Maintainer)
- Alhusain Abdalla
- Natasha Nehra
- Tejas Patel
- Hamrish Saravanakumar

## Documentation

See the full documentation: [https://icd-codex.readthedocs.io/en/latest/](https://icd-codex.readthedocs.io/en/latest/)

## Contributions

[Contributions are always welcome!](https://icd-codex.readthedocs.io/en/latest/contributing.html)


# History

## 0.5.2 (2025-01-05)

- Added 2025 ICD-10-CM
- Bumped deps
- Removed EOL Python versions from CI
- Fixed a bug where 2024 codes were being pulled from 2023
- Fixed some flake8 bugs
- Fixed use of deprecated field in [networkx](https://networkx.org/documentation/networkx-2.8.7/reference/readwrite/generated/networkx.readwrite.json_graph.tree_graph.html)

## 0.4.9, 0.5.0 and 0.5.1 (2024-01-08)

- Added 2024 ICD-10-CM
- Fixed a bug where the node descriptions were being discarded
- Regenerated the hierarchy files with the descriptions

## 0.4.8 (2023-05-02)

- Use the newer scikit-learn PyPI location
- Deprecate the Python 3.6 version
- Add 2023 ICD-10-CM

## 0.4.6 (2022-10-13)

- Add 2022 ICD-10-CM (thank you @keithcallenberg!)

## 0.4.4 and 0.4.5 (2020-10-18)

- Add the code descriptions for ICD-9
- Add usage on how to recapitulate functionality of sirrice/icd9
- Make the hierarchy directed to allow simpler and more intuitive traversal
- Fix issue where edges were not being formed between "Diseases Of The Blood And Blood-Forming Organs" and "Congenital Anomalies" and their children

## 0.4.3 (2020-10-04)

- Fix issue where hierarchy JSON files were not being shipped with the PyPI distribution

## 0.4.2 (2020-10-03)

- Add support for Python <= 3.8 in the `hierarchy` module by using the `importlib.resources` backport

## 0.4.1 (2020-09-11)

- Update PyPI metadata

## 0.4.0 (2020-09-11)

- ICD-10-CM (2019 to 2020) codes are now fully present (whereas hackathon version missed certain codes)
- Versions of the ICD-9 and ICD-10-CM hierarchies are now cached to the `data` module
- Changed the hierarchy API: `hierarchy.icd9hierarchy()` is now `hierarchy.icd9()`. Ditto for ICD-10-CM.

## 0.3.0 (2020-09-05)

- Finesse API, now consistent between documentation and implementation

## 0.1.0 (2020-09-04)

- First release on PyPI, testing the waters during hackathon

            

        