<div align="center" style="margin-bottom: 1em;">
# HealthChain 💫 🏥
<img src="https://raw.githubusercontent.com/dotimplement/HealthChain/main/docs/assets/images/healthchain_logo.png" alt="HealthChain Logo" width=300></img>
![GitHub License](https://img.shields.io/github/license/dotimplement/HealthChain)
![PyPI Version](https://img.shields.io/pypi/v/healthchain) ![Python Versions](https://img.shields.io/pypi/pyversions/healthchain)
![Downloads](https://img.shields.io/pypi/dm/healthchain)
</div>
Simplify developing, testing and validating AI and NLP applications in a healthcare context 💫 🏥.
Building applications that integrate with electronic health record systems (EHRs) is complex, and so is designing reliable, reactive algorithms involving unstructured data. Let's try to change that.
```bash
pip install healthchain
```
First time here? Check out our [Docs](https://dotimplement.github.io/HealthChain/) page!
Came here from NHS RPySOC 2024 ✨? [CDS sandbox walkthrough](https://dotimplement.github.io/HealthChain/cookbook/cds_sandbox/)
## Features
- [x] 🛠️ Build custom pipelines or use [pre-built ones](https://dotimplement.github.io/HealthChain/reference/pipeline/pipeline/#prebuilt) for your healthcare NLP and ML tasks
- [x] 🏗️ Add built-in [CDA and FHIR parsers](https://dotimplement.github.io/HealthChain/reference/utilities/cda_parser/) to connect your pipeline to interoperability standards
- [x] 🧪 Test your pipelines in full healthcare-context aware [sandbox](https://dotimplement.github.io/HealthChain/reference/sandbox/sandbox/) environments
- [x] 🗃️ Generate [synthetic healthcare data](https://dotimplement.github.io/HealthChain/reference/utilities/data_generator/) for testing and development
- [x] 🚀 Deploy sandbox servers locally with [FastAPI](https://fastapi.tiangolo.com/)
## Why use HealthChain?
- **EHR integrations are manual and time-consuming** - HealthChain abstracts away complexities so you can focus on AI development, not EHR configurations.
- **It's difficult to track and evaluate multiple integration instances** - HealthChain provides a framework to test the real-world resilience of your whole system, not just your models.
- [**Most healthcare data is unstructured**](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6372467/) - HealthChain is optimized for real-time AI and NLP applications that deal with realistic healthcare data.
- **Built by health tech developers, for health tech developers** - HealthChain is tech stack agnostic, modular, and easily extensible.
## Pipeline
Pipelines provide a flexible way to build and manage processing workflows for NLP and ML tasks that integrate with complex healthcare systems.
### Building a pipeline
```python
from healthchain.io.containers import Document
from healthchain.pipeline import Pipeline
from healthchain.pipeline.components import TextPreProcessor, SpacyNLP, TextPostProcessor
# Initialize the pipeline
nlp_pipeline = Pipeline[Document]()
# Add TextPreProcessor component
preprocessor = TextPreProcessor(tokenizer="spacy")
nlp_pipeline.add_node(preprocessor)
# Add Model component (assuming we have a pre-trained model)
spacy_nlp = SpacyNLP.from_model_id("en_core_sci_md", source="spacy")
nlp_pipeline.add_node(spacy_nlp)
# Add TextPostProcessor component
postprocessor = TextPostProcessor(
    postcoordination_lookup={
        "heart attack": "myocardial infarction",
        "high blood pressure": "hypertension"
    }
)
nlp_pipeline.add_node(postprocessor)
# Build the pipeline
nlp = nlp_pipeline.build()
# Use the pipeline
result = nlp(Document("Patient has a history of heart attack and high blood pressure."))
print(f"Entities: {result.nlp.spacy_doc.ents}")
```
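To make the add-nodes-then-build flow more concrete, here is a minimal, library-free sketch of the same pattern. `TinyPipeline`, `lowercase`, and `make_lookup` are hypothetical names invented for this illustration; this is not how HealthChain implements `Pipeline` or `TextPostProcessor`, only an assumption about the general shape of the idea: nodes are callables applied in order, and `build()` returns a single callable.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for illustration only: NOT HealthChain's implementation.
Node = Callable[[str], str]


class TinyPipeline:
    def __init__(self) -> None:
        self._nodes: List[Node] = []

    def add_node(self, node: Node) -> None:
        """Register a processing step; steps run in insertion order."""
        self._nodes.append(node)

    def build(self) -> Node:
        """Freeze the current steps into one callable."""
        nodes = tuple(self._nodes)

        def run(text: str) -> str:
            for node in nodes:
                text = node(text)
            return text

        return run


def lowercase(text: str) -> str:
    """Toy preprocessing step."""
    return text.lower()


def make_lookup(table: Dict[str, str]) -> Node:
    """Return a node that replaces each key in `table` with its value."""

    def apply(text: str) -> str:
        for source, target in table.items():
            text = text.replace(source, target)
        return text

    return apply


pipeline = TinyPipeline()
pipeline.add_node(lowercase)
pipeline.add_node(make_lookup({
    "heart attack": "myocardial infarction",
    "high blood pressure": "hypertension",
}))
normalize = pipeline.build()

print(normalize("Patient has a history of Heart Attack and high blood pressure."))
```

Because `build()` copies the node list, nodes added afterwards do not change a callable that has already been built, which keeps a built pipeline predictable.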
#### Adding connectors
Connectors give your pipelines the ability to interface with EHRs.
```python
from healthchain.io import CdaConnector
from healthchain.models import CdaRequest
cda_connector = CdaConnector()
# Assumes `pipeline` is an existing Pipeline instance that has not been built yet
pipeline.add_input(cda_connector)
pipeline.add_output(cda_connector)
pipe = pipeline.build()
cda_data = CdaRequest(document="<CDA XML content>")
output = pipe(cda_data)
```
### Using pre-built pipelines
Pre-built pipelines are use case specific end-to-end workflows that already have connectors and models built-in.
```python
from healthchain.pipeline import MedicalCodingPipeline
from healthchain.models import CdaRequest
# Load from model ID
pipeline = MedicalCodingPipeline.from_model_id(
    model="blaze999/Medical-NER", task="token-classification", source="huggingface"
)
# Or load from local model
pipeline = MedicalCodingPipeline.from_local_model("./path/to/model", source="spacy")
cda_data = CdaRequest(document="<CDA XML content>")
output = pipeline(cda_data)
```
## Sandbox
Sandboxes provide a staging environment for testing and validating your pipeline in a realistic healthcare context.
### Clinical Decision Support (CDS)
[CDS Hooks](https://cds-hooks.org/) is an [HL7](https://cds-hooks.hl7.org) published specification for clinical decision support.
**When is this used?** CDS hooks are triggered at certain events during a clinician's workflow in an electronic health record (EHR), e.g. when a patient record is opened or when an order is selected.
**What information is sent**: the context of the event and [FHIR](https://hl7.org/fhir/) resources that are requested by your service, for example, the patient ID and information on the encounter and conditions they are being seen for.
**What information is returned**: “cards” displaying text, actionable suggestions, or links to launch a [SMART](https://smarthealthit.org/) app from within the workflow.
```python
import healthchain as hc
from healthchain.pipeline import SummarizationPipeline
from healthchain.use_cases import ClinicalDecisionSupport
from healthchain.models import Card, CdsFhirData, CDSRequest
from healthchain.data_generator import CdsDataGenerator
from typing import List
@hc.sandbox
class MyCDS(ClinicalDecisionSupport):
    def __init__(self) -> None:
        self.pipeline = SummarizationPipeline.from_model_id(
            "facebook/bart-large-cnn", source="huggingface"
        )
        self.data_generator = CdsDataGenerator()

    # Sets up an instance of a mock EHR client of the specified workflow
    @hc.ehr(workflow="encounter-discharge")
    def ehr_database_client(self) -> CdsFhirData:
        return self.data_generator.generate()

    # Define your application logic here
    @hc.api
    def my_service(self, data: CDSRequest) -> CDSRequest:
        result = self.pipeline(data)
        return result
```
```
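For readers new to CDS Hooks, the request and card shapes described in this section can be sketched with plain dictionaries. The field names follow the CDS Hooks specification as we understand it; `build_request`, `build_card_response`, and all placeholder values are hypothetical and are not part of HealthChain's `CDSRequest` or `Card` models.

```python
import json
import uuid
from typing import Any, Dict


def build_request(patient_id: str, encounter_id: str) -> Dict[str, Any]:
    """Assemble an illustrative request body for the encounter-discharge hook."""
    return {
        "hook": "encounter-discharge",
        "hookInstance": str(uuid.uuid4()),  # unique ID for this invocation
        "context": {
            "userId": "Practitioner/example",  # placeholder value
            "patientId": patient_id,
            "encounterId": encounter_id,
        },
        # Prefetch carries the FHIR resources the service asked for.
        "prefetch": {
            "patient": {"resourceType": "Patient", "id": patient_id},
        },
    }


def build_card_response(summary: str, detail: str) -> Dict[str, Any]:
    """Assemble an illustrative response holding one informational card."""
    if len(summary) >= 140:
        raise ValueError("card summaries should be under 140 characters")
    return {
        "cards": [
            {
                "summary": summary,
                "indicator": "info",  # one of: info, warning, critical
                "source": {"label": "Example CDS service"},  # placeholder label
                "detail": detail,
            }
        ]
    }


request = build_request("example-patient", "example-encounter")
response = build_card_response(
    "Discharge summary available",
    "A generated summary of the encounter would go here.",
)
print(json.dumps(request["context"], indent=2))
print(json.dumps(response, indent=2))
```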
### Clinical Documentation
The `ClinicalDocumentation` use case implements a real-time Clinical Documentation Improvement (CDI) service. It helps convert free-text medical documentation into coded information that can be used for billing, quality reporting, and clinical decision support.
**When is this used?** Triggered when a clinician opts in to a CDI functionality (e.g. Epic NoteReader) and signs or pends a note after writing it.
**What information is sent**: A [CDA (Clinical Document Architecture)](https://www.hl7.org.uk/standards/hl7-standards/cda-clinical-document-architecture/) document which contains continuity of care data and free-text data, e.g. a patient's problem list and the progress note that the clinician has entered in the EHR.
```python
import healthchain as hc
from healthchain.pipeline import MedicalCodingPipeline
from healthchain.use_cases import ClinicalDocumentation
from healthchain.models import CcdData, CdaRequest, CdaResponse
@hc.sandbox
class NotereaderSandbox(ClinicalDocumentation):
    def __init__(self):
        self.pipeline = MedicalCodingPipeline.from_model_id(
            "en_core_sci_md", source="spacy"
        )

    # Load an existing CDA file
    @hc.ehr(workflow="sign-note-inpatient")
    def load_data_in_client(self) -> CcdData:
        with open("/path/to/cda/data.xml", "r") as file:
            xml_string = file.read()

        return CcdData(cda_xml=xml_string)

    @hc.api
    def my_service(self, data: CdaRequest) -> CdaResponse:
        annotated_ccd = self.pipeline(data)
        return annotated_ccd
```
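To give a feel for what a CDA document with both structured and free-text sections looks like, here is a small standard-library sketch that pulls section titles and text out of a heavily simplified CDA. `SAMPLE_CDA` and `section_texts` are hypothetical and written for this illustration only; real CDA documents are far richer, and this is not how HealthChain's CDA parser works.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# CDA documents use the HL7 v3 XML namespace.
CDA_NS = {"cda": "urn:hl7-org:v3"}

# A heavily simplified, hand-written CDA used only for this illustration.
SAMPLE_CDA = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <component>
    <structuredBody>
      <component>
        <section>
          <title>Problems</title>
          <text>Hypertension</text>
        </section>
      </component>
      <component>
        <section>
          <title>Progress Note</title>
          <text>Patient reports chest pain on exertion.</text>
        </section>
      </component>
    </structuredBody>
  </component>
</ClinicalDocument>"""


def section_texts(cda_xml: str) -> Dict[str, str]:
    """Map each section title to the plain text found in that section."""
    root = ET.fromstring(cda_xml)
    sections: Dict[str, str] = {}
    for section in root.iterfind(".//cda:section", CDA_NS):
        title = section.findtext("cda:title", default="", namespaces=CDA_NS)
        text_el = section.find("cda:text", CDA_NS)
        # itertext() also collects text nested inside child markup.
        text = "".join(text_el.itertext()).strip() if text_el is not None else ""
        sections[title.strip()] = text
    return sections


for title, text in section_texts(SAMPLE_CDA).items():
    print(f"{title}: {text}")
```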
### Running a sandbox
Make sure your `mycds.py` file contains the following lines:
```python
cds = MyCDS()
cds.run_sandbox()
```
This will populate your EHR client with the data generation method you have defined, send requests to your server for processing, and save the data in the `./output` directory.
Then run:
```bash
healthchain run mycds.py
```
By default, the server runs at `http://127.0.0.1:8000`, and you can interact with the exposed endpoints at `/docs`.
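If you would rather inspect the exposed endpoints from a script than from the `/docs` page, one option is to read the OpenAPI schema that FastAPI serves at `/openapi.json` by default. The sketch below is an assumption-laden illustration: `fetch_openapi`, `list_endpoints`, and the `/example-service` route in `SAMPLE_SCHEMA` are hypothetical, and the real routes depend on your sandbox.

```python
import json
from typing import Any, Dict, List
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000"
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def fetch_openapi(base_url: str = BASE_URL, timeout: float = 5.0) -> Dict[str, Any]:
    """Download the OpenAPI schema from a running server."""
    with urlopen(f"{base_url}/openapi.json", timeout=timeout) as response:
        return json.load(response)


def list_endpoints(schema: Dict[str, Any]) -> List[str]:
    """Return 'METHOD /path' strings for every operation in the schema."""
    endpoints: List[str] = []
    for path, operations in sorted(schema.get("paths", {}).items()):
        for method in sorted(operations):
            # Path items may also hold non-operation keys such as "parameters".
            if method in HTTP_METHODS:
                endpoints.append(f"{method.upper()} {path}")
    return endpoints


# Hypothetical schema so the example runs without a server.
SAMPLE_SCHEMA = {"paths": {"/example-service": {"post": {}}}}
print(list_endpoints(SAMPLE_SCHEMA))

# With the sandbox server running, you could instead call:
# print(list_endpoints(fetch_openapi()))
```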
## Road Map
- [ ] 🎛️ Versioning and artifact management for pipelines and sandbox EHR configurations
- [ ] ❓ Testing and evaluation framework for pipelines and use cases
- [ ] 🧠 Multi-modal pipelines that have built-in NLP to utilize unstructured data
- [ ] ✨ Improvements to synthetic data generator methods
- [ ] 👾 Frontend UI for EHR client and visualization features
- [ ] 🚀 Production deployment options
## Contribute
We are always eager to hear feedback and suggestions, especially if you are a developer or researcher working with healthcare systems!
- 💡 Let's chat! [Discord](https://discord.gg/UQC6uAepUz)
- 🛠️ [Contribution Guidelines](CONTRIBUTING.md)
## Acknowledgement
This repository makes use of CDS Hooks developed by Boston Children’s Hospital.
Raw data
{
"_id": null,
"home_page": null,
"name": "healthchain",
"maintainer": null,
"docs_url": null,
"requires_python": "<3.12,>=3.8",
"maintainer_email": null,
"keywords": "nlp, ai, llm, healthcare, ehr, mlops",
"author": "Jennifer Jiang-Kells",
"author_email": "jenniferjiangkells@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/cb/a1/23c5fdff971beda78b6a69f76a893ea46ad38dd8c1bacf86773a7150b9cc/healthchain-0.6.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "Remarkably simple testing and validation of AI/NLP applications in healthcare context.",
"version": "0.6.1",
"project_urls": {
"Documentation": "https://dotimplement.github.io/HealthChain/"
},
"split_keywords": [
"nlp",
" ai",
" llm",
" healthcare",
" ehr",
" mlops"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "3f81149f317c7814df847d8d33405ef5fc2478b8a447661dab40afbf0fcbcaeb",
"md5": "6592dc343640e1623db5594c2ee192a1",
"sha256": "2e6f8a66b8328a17324c4fd4de12e80edc405e861ffe5461306b81b939224b32"
},
"downloads": -1,
"filename": "healthchain-0.6.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "6592dc343640e1623db5594c2ee192a1",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.12,>=3.8",
"size": 151544,
"upload_time": "2024-11-27T15:12:13",
"upload_time_iso_8601": "2024-11-27T15:12:13.124587Z",
"url": "https://files.pythonhosted.org/packages/3f/81/149f317c7814df847d8d33405ef5fc2478b8a447661dab40afbf0fcbcaeb/healthchain-0.6.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "cba123c5fdff971beda78b6a69f76a893ea46ad38dd8c1bacf86773a7150b9cc",
"md5": "31f96ba0f7a50af7de0bbdad074aba21",
"sha256": "2d664066e4205e12e98175e83d4eecce0c685c93e013ee608967121a67b62d4e"
},
"downloads": -1,
"filename": "healthchain-0.6.1.tar.gz",
"has_sig": false,
"md5_digest": "31f96ba0f7a50af7de0bbdad074aba21",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.12,>=3.8",
"size": 107523,
"upload_time": "2024-11-27T15:12:14",
"upload_time_iso_8601": "2024-11-27T15:12:14.681113Z",
"url": "https://files.pythonhosted.org/packages/cb/a1/23c5fdff971beda78b6a69f76a893ea46ad38dd8c1bacf86773a7150b9cc/healthchain-0.6.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-27 15:12:14",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "healthchain"
}