# OpenMed
OpenMed is a Python toolkit for biomedical and clinical NLP. It is built to deliver state-of-the-art models, including large language models (LLMs) for healthcare, that rival and often outperform proprietary enterprise solutions. It unifies model discovery, assertion status detection, de-identification pipelines, extraction and reasoning tools, and one-line orchestration for scripts, services, and notebooks, so teams can deploy production-grade healthcare AI without vendor lock-in.

It also bundles configuration management, model loading, support for medical LLMs, post-processing, and formatting utilities, which makes it easier to integrate clinical AI into existing research workflows.
> **Status:** The package is pre-release and the API may change. Feedback and contributions are
> welcome while the project stabilises.
## Features
- **Curated model registry** with metadata for the OpenMed Hugging Face collection, including
category filters, entity coverage, and confidence guidance.
- **One-line model loading** via `ModelLoader`, with optional pipeline creation,
caching, and authenticated access to private models.
- **Advanced NER post-processing** (`AdvancedNERProcessor`) that applies the filtering and
grouping techniques proven in the OpenMed demos.
- **Text preprocessing & tokenisation helpers** tailored for medical text workflows.
- **Output formatting utilities** that convert raw predictions into dict/JSON/HTML/CSV for
downstream systems.
- **Logging and validation helpers** to keep pipelines observable and inputs safe.
## Installation
### Requirements
- Python 3.10 or newer.
- [`transformers`](https://huggingface.co/docs/transformers/index) and a compatible deep learning
backend such as [PyTorch](https://pytorch.org/get-started/locally/).
- An optional `HF_TOKEN` environment variable if you need to access gated models.
### Install from PyPI
```bash
pip install openmed transformers
# Install a backend (PyTorch shown here; follow the instructions for your platform):
pip install torch --index-url https://download.pytorch.org/whl/cpu
```
If you plan to run on GPU, install the CUDA-enabled PyTorch wheels from the official instructions.
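
Before loading a model, it can help to confirm that the requirements listed above are actually met. The following is a minimal sketch that uses only the standard library; `check_environment` is a hypothetical helper written for this example and is not part of OpenMed.

```python
import importlib.util
import os
import sys


def check_environment(min_python=(3, 10), packages=("transformers", "torch")):
    """Report whether the documented requirements appear to be satisfied.

    Hypothetical helper for illustration; not part of the OpenMed API.
    """
    report = {"python_ok": sys.version_info >= min_python}
    for name in packages:
        # find_spec returns None when a top-level package cannot be imported
        report[name] = importlib.util.find_spec(name) is not None
    # HF_TOKEN is optional and only matters for gated models
    report["hf_token_set"] = bool(os.environ.get("HF_TOKEN"))
    return report


for key, value in check_environment().items():
    print(f"{key}: {value}")
```

A `False` next to `torch` or `transformers` suggests the corresponding `pip install` step has not been run in the active environment.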
## Quick start
```python
from openmed.core import ModelLoader
from openmed.processing import format_predictions

loader = ModelLoader()  # uses the default configuration
ner = loader.create_pipeline(
    "disease_detection_superclinical",  # registry key or full model ID
    aggregation_strategy="simple",      # group sub-token predictions for quick wins
)

text = "Patient diagnosed with acute lymphoblastic leukemia and started on imatinib."
raw_predictions = ner(text)

result = format_predictions(raw_predictions, text, model_name="Disease Detection")
for entity in result.entities:
    print(f"{entity.label:<12} -> {entity.text} (confidence={entity.confidence:.2f})")
```
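
To make the effect of `aggregation_strategy="simple"` more concrete, here is a simplified, dependency-free sketch of the idea: consecutive token predictions that share an entity type are merged into one span, and their scores are averaged. It is an illustration only, not the `transformers` or OpenMed implementation. The `DISEASE` label, the scores, and the helper names are assumptions made up for this example.

```python
from statistics import mean

text = "acute lymphoblastic leukemia and anemia"


def make_token(word, label, score):
    """Build a dict shaped like one raw token-classification prediction."""
    start = text.index(word)
    return {"entity": label, "score": score, "word": word,
            "start": start, "end": start + len(word)}


# Labels and scores are invented for illustration only.
tokens = [
    make_token("acute", "B-DISEASE", 0.9),
    make_token("lymphoblastic", "I-DISEASE", 0.8),
    make_token("leukemia", "I-DISEASE", 0.7),
    make_token("anemia", "B-DISEASE", 0.6),
]


def split_tag(label):
    """Split 'B-DISEASE' into ('B', 'DISEASE'); bare labels are treated as 'I'."""
    if label.startswith(("B-", "I-")):
        return label[0], label[2:]
    return "I", label


def group_tokens(source_text, predictions):
    """Merge consecutive predictions of the same type into entity spans."""
    groups, current = [], []
    for tok in predictions:
        prefix, tag = split_tag(tok["entity"])
        # Start a new group on a B- prefix or when the entity type changes.
        if current and (prefix == "B" or tag != split_tag(current[-1]["entity"])[1]):
            groups.append(current)
            current = []
        current.append(tok)
    if current:
        groups.append(current)
    return [
        {
            "entity_group": split_tag(g[0]["entity"])[1],
            "score": mean(t["score"] for t in g),
            "word": source_text[g[0]["start"]:g[-1]["end"]],
            "start": g[0]["start"],
            "end": g[-1]["end"],
        }
        for g in groups
    ]


grouped = group_tokens(text, tokens)
for span in grouped:
    print(span["entity_group"], span["word"], round(span["score"], 2))
```

Slicing the original text by character offsets, rather than joining token strings, keeps the recovered span identical to what appears in the input.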
Use the convenience helper if you prefer a single call:
```python
from openmed import analyze_text

result = analyze_text(
    "Patient received 75mg clopidogrel for NSTEMI.",
    model_name="pharma_detection_superclinical",
)

for entity in result.entities:
    print(entity)
```
## Command-line usage
Installing the package also installs the `openmed` console command, which provides
quick access to model discovery, text analysis, and configuration management.
```bash
# List models from the bundled registry (add --include-remote for Hugging Face)
openmed models list
openmed models list --include-remote
# Analyse inline text or a file with a specific model
openmed analyze --model disease_detection_superclinical --text "Acute leukemia treated with imatinib."
# Inspect or edit the CLI configuration (defaults to ~/.config/openmed/config.toml)
openmed config show
openmed config set device cuda
# Inspect the model's inferred context window
openmed models info disease_detection_superclinical
```
Provide `--config-path /custom/path.toml` to work with a different configuration
file during automation or testing. Run `openmed --help` to see all options.
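
For automation, the CLI can be driven from Python by building the argument list explicitly. The sketch below only assembles the command; `build_analyze_command` is a hypothetical helper, and placing `--config-path` before the subcommand is an assumption, so check `openmed --help` for the actual position.

```python
import shlex


def build_analyze_command(model, text, config_path=None):
    """Assemble an `openmed analyze` invocation as an argument list.

    Hypothetical helper for illustration; not part of OpenMed.
    """
    cmd = ["openmed"]
    if config_path is not None:
        # Assumption: --config-path is accepted before the subcommand.
        cmd += ["--config-path", str(config_path)]
    cmd += ["analyze", "--model", model, "--text", text]
    return cmd


cmd = build_analyze_command(
    "disease_detection_superclinical",
    "Acute leukemia treated with imatinib.",
)
print(shlex.join(cmd))  # shell-safe rendering, useful for logs

# To execute it, pass the list to subprocess, for example:
#   subprocess.run(cmd, check=True, capture_output=True, text=True)
```

Passing a list rather than a single string avoids shell quoting problems when the clinical text contains quotes or other special characters.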
## Discovering models
```python
from openmed import get_model_max_length
from openmed.core import ModelLoader
from openmed.core.model_registry import list_model_categories, get_models_by_category

loader = ModelLoader()
print(loader.list_available_models()[:5])  # Hugging Face + registry entries

suggestions = loader.get_model_suggestions(
    "Metastatic breast cancer treated with paclitaxel and trastuzumab"
)
for key, info, reason in suggestions:
    print(f"{info.display_name} -> {reason}")

print(list_model_categories())
for info in get_models_by_category("Oncology"):
    print(f"- {info.display_name} ({info.model_id})")

# Inferred context window of the model
print(get_model_max_length("disease_detection_superclinical"))
```
Or use the top-level helper:
```python
from openmed import list_models
print(list_models()[:10])
```
## Advanced NER processing
```python
from openmed.core import ModelLoader
from openmed.processing.advanced_ner import create_advanced_processor

loader = ModelLoader()
# aggregation_strategy=None yields raw token-level predictions for maximum control
ner = loader.create_pipeline("pharma_detection_superclinical", aggregation_strategy=None)

text = "Administered 75mg clopidogrel daily alongside aspirin for secondary stroke prevention."
raw = ner(text)

processor = create_advanced_processor(confidence_threshold=0.65)
entities = processor.process_pipeline_output(text, raw)
summary = processor.create_entity_summary(entities)

for entity in entities:
    print(f"{entity.label}: {entity.text} (score={entity.score:.3f})")

print(summary["by_type"])
```
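
The two ideas in this example, a confidence threshold and a per-type summary, can be sketched without any model. The code below is an illustration under stated assumptions and does not reproduce `AdvancedNERProcessor`. The entity dictionaries, the labels, the scores, and the per-label count used for the summary are all hypothetical, and the real `summary["by_type"]` structure may differ.

```python
from collections import Counter

# Hypothetical grouped entities; labels and scores are made up for illustration.
sample_entities = [
    {"entity_group": "DRUG", "word": "clopidogrel", "score": 0.97},
    {"entity_group": "DRUG", "word": "aspirin", "score": 0.91},
    {"entity_group": "DISEASE", "word": "stroke", "score": 0.40},
]


def filter_by_confidence(entities, threshold=0.65):
    """Keep only entities whose score meets the threshold."""
    return [e for e in entities if e["score"] >= threshold]


def summarize_by_type(entities):
    """Count the remaining entities per label."""
    return dict(Counter(e["entity_group"] for e in entities))


kept = filter_by_confidence(sample_entities, threshold=0.65)
by_type = summarize_by_type(kept)
print(by_type)
```

Raising the threshold trades recall for precision, so it is worth checking how many entities a given value removes before fixing it in a pipeline.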
## Text preprocessing & tokenisation
```python
from openmed.processing import TextProcessor, TokenizationHelper
from openmed.core import ModelLoader
text_processor = TextProcessor(normalize_whitespace=True, lowercase=False)
clean_text = text_processor.clean_text("BP 120/80, HR 88 bpm. Start Metformin 500mg bid.")
print(clean_text)
loader = ModelLoader()
model_data = loader.load_model("anatomy_detection_electramed")
token_helper = TokenizationHelper(model_data["tokenizer"])
encoding = token_helper.tokenize_with_alignment(clean_text)
print(encoding["tokens"][:10])
```
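
Alignment here means that every token can be traced back to a character span in the original text. The sketch below shows that idea with a plain regular-expression tokeniser; it is not what `TokenizationHelper` does internally, and real subword tokenisers split text differently. Hugging Face fast tokenisers expose comparable offsets through `return_offsets_mapping=True`.

```python
import re

# Words and numbers become one token each; punctuation marks stand alone.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize_with_offsets(text):
    """Return tokens together with their character offsets in `text`."""
    return [
        {"token": m.group(), "start": m.start(), "end": m.end()}
        for m in TOKEN_PATTERN.finditer(text)
    ]


sample = "BP 120/80, HR 88 bpm."
aligned = tokenize_with_offsets(sample)
for item in aligned:
    # Each offset pair slices back to exactly the token text.
    print(item["token"], item["start"], item["end"])
```

Keeping offsets alongside tokens is what allows entity predictions to be mapped back onto the original clinical note for highlighting or de-identification.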
## Formatting outputs
```python
# Reuse `raw_predictions` and `text` from the quick start example
from openmed.processing import format_predictions
formatted = format_predictions(
    raw_predictions,
    text,
    model_name="Disease Detection",
    output_format="json",
    include_confidence=True,
    confidence_threshold=0.5,
)
print(formatted) # JSON string ready for logging or storage
```
`format_predictions` can also return CSV rows or rich HTML snippets for dashboards.
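
As a rough picture of what such conversions involve, the following standard-library sketch serialises a list of entities to JSON and CSV with a confidence cut-off. It does not reproduce the output of `format_predictions`; the field names `start` and `end`, the sample entities, and both helper functions are assumptions for illustration.

```python
import csv
import io
import json

FIELDS = ["label", "text", "start", "end", "confidence"]

note = "Patient diagnosed with acute lymphoblastic leukemia and started on imatinib."


def make_entity(label, span, confidence):
    """Build a hypothetical entity record with character offsets."""
    start = note.index(span)
    return {"label": label, "text": span, "start": start,
            "end": start + len(span), "confidence": confidence}


# Labels and confidences are invented for illustration only.
records = [
    make_entity("DISEASE", "acute lymphoblastic leukemia", 0.98),
    make_entity("DRUG", "imatinib", 0.42),
]


def to_json(entities, threshold=0.0):
    """Serialise entities at or above the threshold as a JSON string."""
    kept = [e for e in entities if e["confidence"] >= threshold]
    return json.dumps({"entities": kept}, indent=2)


def to_csv(entities, threshold=0.0):
    """Serialise entities at or above the threshold as CSV text with a header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(e for e in entities if e["confidence"] >= threshold)
    return buffer.getvalue()


json_text = to_json(records, threshold=0.5)
csv_text = to_csv(records, threshold=0.5)
print(json_text)
print(csv_text)
```

Using `csv.DictWriter` rather than manual string joins ensures that entity text containing commas or quotes is escaped correctly.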
## Configuration & logging
```python
from openmed.core import OpenMedConfig, ModelLoader
from openmed.utils import setup_logging
config = OpenMedConfig(
    default_org="OpenMed",
    cache_dir="/tmp/openmed-cache",
    device="cuda",  # "cpu", "cuda", or a specific device index
)
setup_logging(level="INFO")
loader = ModelLoader(config=config)
```
`OpenMedConfig` automatically picks up `HF_TOKEN` from the environment so you can access
private or gated models without storing credentials in code.
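
The pattern behind this, reading a credential from the environment instead of hard-coding it, can be sketched in a few lines. This is not `OpenMedConfig`'s actual lookup logic; `read_hf_token` and `mask_token` are hypothetical helpers for illustration.

```python
import os


def read_hf_token(env=None):
    """Return the HF_TOKEN value, or None when it is unset or blank."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    return token or None


def mask_token(token):
    """Produce a log-safe representation that never reveals the full token."""
    if not token:
        return "<not set>"
    return token[:4] + "***"


token = read_hf_token()
print(f"HF token: {mask_token(token)}")
```

Masking the value before logging keeps credentials out of log files while still confirming whether a token was found.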
## Validation utilities
```python
from openmed.utils.validation import validate_input, validate_model_name

user_supplied_text = "Patient reports chest pain radiating to the left arm."  # placeholder input
text = validate_input(user_supplied_text, max_length=2000)
model = validate_model_name("OpenMed/OpenMed-NER-DiseaseDetect-SuperClinical-434M")
```
Use these helpers to guard API endpoints or batch pipelines against malformed inputs.
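
For readers who want to see the kind of checks such a guard typically performs, here is a minimal stand-alone sketch. It is not the behaviour of `validate_input`, which may truncate, normalise, or raise differently; `check_text` is a hypothetical function written for this example.

```python
def check_text(value, max_length=2000):
    """Validate free-text input before it reaches a model.

    Hypothetical sketch; OpenMed's own validators may behave differently.
    """
    if not isinstance(value, str):
        raise TypeError(f"expected str, got {type(value).__name__}")
    cleaned = value.strip()
    if not cleaned:
        raise ValueError("text is empty")
    if len(cleaned) > max_length:
        # Rejecting instead of truncating avoids silently dropping clinical content.
        raise ValueError(f"text exceeds {max_length} characters")
    return cleaned


print(check_text("  Patient reports chest pain.  "))
```

Rejecting over-long input is one design choice; a service that prefers to process partial text could split it into chunks instead.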
## License
OpenMed is released under the Apache-2.0 License.
## Citing
If you use OpenMed in your research, please cite:
```bibtex
@misc{panahi2025openmedneropensourcedomainadapted,
title={OpenMed NER: Open-Source, Domain-Adapted State-of-the-Art Transformers for Biomedical NER Across 12 Public Datasets},
author={Maziyar Panahi},
year={2025},
eprint={2508.01630},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2508.01630},
}
```
Raw data
{
"_id": null,
"home_page": null,
"name": "openmed",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "LLM, NLP, biomedical, clinical, healthcare, medical, medical LLMs, medical NER, medical NLP, medical de-identification, medical extraction, medical language models, medical reasoning, natural language processing",
"author": "Maziyar Panahi",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/40/2a/09024df8785d11358250e0881b1912d96dda60e488d86cddd648bc9ddd36/openmed-0.1.10.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "OpenMed delivers state-of-the-art biomedical and clinical LLMs that rival proprietary enterprise stacks, unifying model discovery, advanced extractions, and one-line orchestration.",
"version": "0.1.10",
"project_urls": null,
"split_keywords": [
"llm",
" nlp",
" biomedical",
" clinical",
" healthcare",
" medical",
" medical llms",
" medical ner",
" medical nlp",
" medical de-identification",
" medical extraction",
" medical language models",
" medical reasoning",
" natural language processing"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "060f5f7ab41b022e8f0cb5d9480459e9ce2ae35c553a89d18eb9818aaac604de",
"md5": "f2b010b2c5d14dea612c583f0d130ed1",
"sha256": "c30a5eaec9fe83d7ecee2dcbfd6c771539193fb9e8b757c0b17b8a39fc73aac3"
},
"downloads": -1,
"filename": "openmed-0.1.10-py3-none-any.whl",
"has_sig": false,
"md5_digest": "f2b010b2c5d14dea612c583f0d130ed1",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 47129,
"upload_time": "2025-10-17T21:29:51",
"upload_time_iso_8601": "2025-10-17T21:29:51.903081Z",
"url": "https://files.pythonhosted.org/packages/06/0f/5f7ab41b022e8f0cb5d9480459e9ce2ae35c553a89d18eb9818aaac604de/openmed-0.1.10-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "402a09024df8785d11358250e0881b1912d96dda60e488d86cddd648bc9ddd36",
"md5": "9d71a6b2b771be9b6138d04b6bedd315",
"sha256": "ed1343d929d85f2f9a942fbebaed4bda2a23bb6980ff238a35d3a20b49bfc79d"
},
"downloads": -1,
"filename": "openmed-0.1.10.tar.gz",
"has_sig": false,
"md5_digest": "9d71a6b2b771be9b6138d04b6bedd315",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 36337,
"upload_time": "2025-10-17T21:29:50",
"upload_time_iso_8601": "2025-10-17T21:29:50.924260Z",
"url": "https://files.pythonhosted.org/packages/40/2a/09024df8785d11358250e0881b1912d96dda60e488d86cddd648bc9ddd36/openmed-0.1.10.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-10-17 21:29:50",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "openmed"
}