openmed

| Field | Value |
| --- | --- |
| Name | openmed |
| Version | 0.1.10 |
| Home page | None |
| Summary | OpenMed delivers state-of-the-art biomedical and clinical LLMs that rival proprietary enterprise stacks, unifying model discovery, advanced extractions, and one-line orchestration. |
| Upload time | 2025-10-17 21:29:50 |
| Maintainer | None |
| Docs URL | None |
| Author | Maziyar Panahi |
| Requires Python | >=3.10 |
| License | Apache-2.0 |
| Keywords | llm, nlp, biomedical, clinical, healthcare, medical, medical llms, medical ner, medical nlp, medical de-identification, medical extraction, medical language models, medical reasoning, natural language processing |
| Requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| Coveralls test coverage | No coveralls. |

# OpenMed

OpenMed is a Python toolkit for biomedical and clinical NLP. It is built to deliver state-of-the-art models, including large language models (LLMs) for healthcare, that rival and often outperform proprietary enterprise solutions. The toolkit unifies model discovery, assertion status detection, de-identification pipelines, extraction and reasoning tools, and one-line orchestration for scripts, services, or notebooks, so teams can deploy production-grade healthcare AI without vendor lock-in.

It also bundles configuration management, model loading, support for medical LLMs, post-processing, and formatting utilities, which makes it straightforward to integrate clinical AI into existing research and production workflows.

> **Status:** The package is pre-release and the API may change. Feedback and contributions are
> welcome while the project stabilises.

## Features

- **Curated model registry** with metadata for the OpenMed Hugging Face collection, including
  category filters, entity coverage, and confidence guidance.
- **One-line model loading** via `ModelLoader`, with optional pipeline creation,
  caching, and authenticated access to private models.
- **Advanced NER post-processing** (`AdvancedNERProcessor`) that applies the filtering and
  grouping techniques proven in the OpenMed demos.
- **Text preprocessing & tokenisation helpers** tailored for medical text workflows.
- **Output formatting utilities** that convert raw predictions into dict/JSON/HTML/CSV for
  downstream systems.
- **Logging and validation helpers** to keep pipelines observable and inputs safe.

## Installation

### Requirements

- Python 3.10 or newer.
- [`transformers`](https://huggingface.co/docs/transformers/index) and a compatible deep learning
  backend such as [PyTorch](https://pytorch.org/get-started/locally/).
- An optional `HF_TOKEN` environment variable if you need to access gated models.

### Install from PyPI

```bash
pip install openmed transformers
# Install a backend (PyTorch shown here; follow the instructions for your platform):
pip install torch --index-url https://download.pytorch.org/whl/cpu
```

If you plan to run on GPU, install the CUDA-enabled PyTorch wheels from the official instructions.
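
Before loading a model, it can help to confirm that the packages listed above are importable in the active environment. The following is a minimal sketch that uses only the standard library; the helper name `check_packages` is an illustrative assumption and is not part of OpenMed.

```python
import importlib.util


def check_packages(names=("openmed", "transformers", "torch")):
    """Return a dict mapping each top-level package name to True if importable."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Report which of the expected packages are present in this environment.
for name, available in check_packages().items():
    print(f"{name:<14} {'found' if available else 'missing'}")
```

A `missing` entry for `torch` usually means the backend install step has not been run in the same virtual environment as `openmed`.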

## Quick start

```python
from openmed.core import ModelLoader
from openmed.processing import format_predictions

loader = ModelLoader()  # uses the default configuration
ner = loader.create_pipeline(
    "disease_detection_superclinical",  # registry key or full model ID
    aggregation_strategy="simple",      # group sub-token predictions for quick wins
)

text = "Patient diagnosed with acute lymphoblastic leukemia and started on imatinib."
raw_predictions = ner(text)

result = format_predictions(raw_predictions, text, model_name="Disease Detection")
for entity in result.entities:
    print(f"{entity.label:<12} -> {entity.text} (confidence={entity.confidence:.2f})")
```

Use the convenience helper if you prefer a single call:

```python
from openmed import analyze_text

result = analyze_text(
    "Patient received 75mg clopidogrel for NSTEMI.",
    model_name="pharma_detection_superclinical"
)

for entity in result.entities:
    print(entity)
```

## Command-line usage

Installing the package also installs the `openmed` console command, which
provides quick access to model discovery, text analysis, and
configuration management.

```bash
# List models from the bundled registry (add --include-remote for Hugging Face)
openmed models list
openmed models list --include-remote

# Analyse inline text or a file with a specific model
openmed analyze --model disease_detection_superclinical --text "Acute leukemia treated with imatinib."

# Inspect or edit the CLI configuration (defaults to ~/.config/openmed/config.toml)
openmed config show
openmed config set device cuda

# Inspect the model's inferred context window
openmed models info disease_detection_superclinical
```

Provide `--config-path /custom/path.toml` to work with a different configuration
file during automation or testing. Run `openmed --help` to see all options.

## Discovering models

```python
from openmed import get_model_max_length
from openmed.core import ModelLoader
from openmed.core.model_registry import list_model_categories, get_models_by_category

loader = ModelLoader()
print(loader.list_available_models()[:5])  # Hugging Face + registry entries

# Ask for model suggestions based on a sample of the text to analyse
suggestions = loader.get_model_suggestions(
    "Metastatic breast cancer treated with paclitaxel and trastuzumab"
)
for key, info, reason in suggestions:
    print(f"{info.display_name} -> {reason}")

# Browse the registry by category
print(list_model_categories())
for info in get_models_by_category("Oncology"):
    print(f"- {info.display_name} ({info.model_id})")

# Look up the maximum sequence length inferred for a model
print(get_model_max_length("disease_detection_superclinical"))
```

Or use the top-level helper:

```python
from openmed import list_models

print(list_models()[:10])
```

## Advanced NER processing

```python
from openmed.core import ModelLoader
from openmed.processing.advanced_ner import create_advanced_processor

loader = ModelLoader()
# aggregation_strategy=None yields raw token-level predictions for maximum control
ner = loader.create_pipeline("pharma_detection_superclinical", aggregation_strategy=None)

text = "Administered 75mg clopidogrel daily alongside aspirin for secondary stroke prevention."
raw = ner(text)

processor = create_advanced_processor(confidence_threshold=0.65)
entities = processor.process_pipeline_output(text, raw)
summary = processor.create_entity_summary(entities)

for entity in entities:
    print(f"{entity.label}: {entity.text} (score={entity.score:.3f})")

print(summary["by_type"])
```
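
To make the idea of filtering and grouping token-level output more concrete, here is a standalone sketch of the general technique. It is not OpenMed's `AdvancedNERProcessor`; the function name `group_token_predictions`, the `CHEM` label, and the sample scores are assumptions for illustration. The input mimics the list of dicts that a Hugging Face token-classification pipeline returns when `aggregation_strategy=None`.

```python
from statistics import mean


def group_token_predictions(text, tokens, confidence_threshold=0.65):
    """Merge B-/I- tagged tokens into entity spans and drop weak spans."""
    spans = []
    for tok in sorted(tokens, key=lambda t: t["start"]):
        prefix, _, label = tok["entity"].partition("-")
        if not label:
            # Tag without a B-/I- prefix: treat the token as a span start.
            prefix, label = "B", tok["entity"]
        last = spans[-1] if spans else None
        continues_last = (
            last is not None
            and prefix == "I"
            and last["label"] == label
            and tok["start"] - last["end"] <= 1  # adjacent or one space apart
        )
        if continues_last:
            last["end"] = tok["end"]
            last["scores"].append(tok["score"])
        else:
            spans.append({"label": label, "start": tok["start"],
                          "end": tok["end"], "scores": [tok["score"]]})

    entities = []
    for span in spans:
        score = mean(span["scores"])  # average sub-token confidence
        if score >= confidence_threshold:
            entities.append({"label": span["label"],
                             "text": text[span["start"]:span["end"]],
                             "score": score})
    return entities


text = "Administered 75mg clopidogrel daily alongside aspirin for secondary stroke prevention."
# Hand-written sample predictions (hypothetical labels and scores).
raw = [
    {"entity": "B-CHEM", "score": 0.98, "word": "clop", "start": 18, "end": 22},
    {"entity": "I-CHEM", "score": 0.95, "word": "##ido", "start": 22, "end": 25},
    {"entity": "I-CHEM", "score": 0.93, "word": "##grel", "start": 25, "end": 29},
    {"entity": "B-CHEM", "score": 0.60, "word": "aspirin", "start": 46, "end": 53},
]

for entity in group_token_predictions(text, raw):
    print(f"{entity['label']}: {entity['text']} (score={entity['score']:.3f})")
```

Averaging sub-token scores is only one possible choice; taking the minimum or the first token's score would give a stricter or looser filter.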

## Text preprocessing & tokenisation

```python
from openmed.processing import TextProcessor, TokenizationHelper
from openmed.core import ModelLoader

text_processor = TextProcessor(normalize_whitespace=True, lowercase=False)
clean_text = text_processor.clean_text("BP 120/80, HR 88 bpm. Start Metformin 500mg bid.")
print(clean_text)

loader = ModelLoader()
model_data = loader.load_model("anatomy_detection_electramed")
token_helper = TokenizationHelper(model_data["tokenizer"])
encoding = token_helper.tokenize_with_alignment(clean_text)
print(encoding["tokens"][:10])
```
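
The two steps above, cleaning text and keeping tokens aligned with character offsets, can be illustrated without loading a model. The sketch below uses a simple regex word tokenizer, whereas the OpenMed helpers wrap the model's own subword tokenizer, so the token boundaries will differ. The function names are illustrative assumptions.

```python
import re


def normalize_whitespace(text):
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def tokens_with_offsets(text):
    """Split into word and punctuation tokens, keeping (token, start, end)."""
    return [(m.group(), m.start(), m.end())
            for m in re.finditer(r"\w+|[^\w\s]", text)]


clean = normalize_whitespace("BP 120/80,   HR 88 bpm.\nStart Metformin 500mg bid.")
print(clean)

for token, start, end in tokens_with_offsets(clean)[:6]:
    # The offsets always slice back to the exact token text.
    assert clean[start:end] == token
    print(f"{token!r:>8} [{start}:{end}]")
```

Keeping offsets alongside tokens is what allows entity predictions to be mapped back to exact character spans in the original note.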

## Formatting outputs

```python
# Reuse `raw_predictions` and `text` from the quick start example
from openmed.processing import format_predictions

formatted = format_predictions(
    raw_predictions,
    text,
    model_name="Disease Detection",
    output_format="json",
    include_confidence=True,
    confidence_threshold=0.5,
)
print(formatted)  # JSON string ready for logging or storage
```

`format_predictions` can also return CSV rows or rich HTML snippets for dashboards.
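
For readers who want to see what such a conversion involves, the sketch below turns aggregated pipeline-style predictions into JSON and CSV with the standard library. It is an illustration of the idea, not the implementation of `format_predictions`; the helper names, the `DISEASE` label, and the sample scores are assumptions. Scores are cast with `float()` because pipeline outputs may carry NumPy floats, which `json` cannot serialise directly.

```python
import csv
import io
import json

FIELDS = ["entity_group", "word", "score", "start", "end"]


def select_rows(predictions, confidence_threshold=0.5):
    """Keep predictions at or above the threshold, with plain Python floats."""
    rows = []
    for pred in predictions:
        if pred["score"] < confidence_threshold:
            continue
        row = {field: pred[field] for field in FIELDS}
        row["score"] = round(float(row["score"]), 4)
        rows.append(row)
    return rows


def to_json(predictions, confidence_threshold=0.5):
    """Serialise the kept predictions as a JSON string."""
    return json.dumps({"entities": select_rows(predictions, confidence_threshold)}, indent=2)


def to_csv(predictions, confidence_threshold=0.5):
    """Serialise the kept predictions as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(select_rows(predictions, confidence_threshold))
    return buffer.getvalue()


# Hand-written sample in the shape of aggregated pipeline output (hypothetical values).
sample = [
    {"entity_group": "DISEASE", "score": 0.97,
     "word": "acute lymphoblastic leukemia", "start": 23, "end": 51},
    {"entity_group": "DISEASE", "score": 0.31,
     "word": "imatinib", "start": 67, "end": 75},
]

print(to_json(sample))
print(to_csv(sample))
```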

## Configuration & logging

```python
from openmed.core import OpenMedConfig, ModelLoader
from openmed.utils import setup_logging

config = OpenMedConfig(
    default_org="OpenMed",
    cache_dir="/tmp/openmed-cache",
    device="cuda",  # "cpu", "cuda", or a specific device index
)
setup_logging(level="INFO")
loader = ModelLoader(config=config)
```

`OpenMedConfig` automatically picks up `HF_TOKEN` from the environment so you can access
private or gated models without storing credentials in code.
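
The general pattern of reading a token from the environment instead of hard-coding it can be sketched as follows. This is not how `OpenMedConfig` is implemented internally; `resolve_hf_token` and `mask_token` are hypothetical helpers shown only to illustrate the approach.

```python
import os


def resolve_hf_token(explicit_token=None, env=os.environ):
    """Prefer an explicitly passed token, then fall back to HF_TOKEN."""
    token = explicit_token or env.get("HF_TOKEN")
    token = token.strip() if token else ""
    return token or None


def mask_token(token):
    """Return a log-safe representation that never reveals the full token."""
    if not token:
        return "<not set>"
    return token[:4] + "..." if len(token) > 8 else "****"


print("HF token:", mask_token(resolve_hf_token()))
```

Masking the value before printing keeps credentials out of logs while still showing whether a token was picked up.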

## Validation utilities

```python
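from openmed.utils.validation import validate_input, validate_model_name

# Placeholder for text arriving from an API request or batch job
user_supplied_text = "Patient reports chest pain radiating to the left arm."

text = validate_input(user_supplied_text, max_length=2000)
model = validate_model_name("OpenMed/OpenMed-NER-DiseaseDetect-SuperClinical-434M")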
```

Use these helpers to guard API endpoints or batch pipelines against malformed inputs.
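
As a rough picture of what guarding an endpoint looks like, the sketch below performs the kind of checks an input validator typically makes. It is a stand-in written for illustration: `check_text` is a hypothetical function, and the actual behaviour of `validate_input` (for example, whether it raises or truncates on over-long input) may differ.

```python
def check_text(text, max_length=2000):
    """Return stripped text, or raise if it is not usable as model input."""
    if not isinstance(text, str):
        raise TypeError("text must be a string")
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("text must not be empty")
    if len(cleaned) > max_length:
        raise ValueError(f"text exceeds {max_length} characters")
    return cleaned


def handle_request(payload):
    """Toy request handler: validate first, then hand off to the pipeline."""
    try:
        text = check_text(payload.get("text"))
    except (TypeError, ValueError) as exc:
        return {"status": 400, "error": str(exc)}
    return {"status": 200, "text": text}


print(handle_request({"text": "  Patient reports chest pain.  "}))
print(handle_request({"text": ""}))
print(handle_request({}))
```

Rejecting bad input before it reaches the model keeps error handling in one place and avoids spending inference time on requests that cannot succeed.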

## License

OpenMed is released under the Apache-2.0 License.

## Citing

If you use OpenMed in your research, please cite:

```bibtex
@misc{panahi2025openmedneropensourcedomainadapted,
      title={OpenMed NER: Open-Source, Domain-Adapted State-of-the-Art Transformers for Biomedical NER Across 12 Public Datasets},
      author={Maziyar Panahi},
      year={2025},
      eprint={2508.01630},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.01630},
}
```

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "openmed",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "LLM, NLP, biomedical, clinical, healthcare, medical, medical LLMs, medical NER, medical NLP, medical de-identification, medical extraction, medical language models, medical reasoning, natural language processing",
    "author": "Maziyar Panahi",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/40/2a/09024df8785d11358250e0881b1912d96dda60e488d86cddd648bc9ddd36/openmed-0.1.10.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "OpenMed delivers state-of-the-art biomedical and clinical LLMs that rival proprietary enterprise stacks, unifying model discovery, advanced extractions, and one-line orchestration.",
    "version": "0.1.10",
    "project_urls": null,
    "split_keywords": [
        "llm",
        " nlp",
        " biomedical",
        " clinical",
        " healthcare",
        " medical",
        " medical llms",
        " medical ner",
        " medical nlp",
        " medical de-identification",
        " medical extraction",
        " medical language models",
        " medical reasoning",
        " natural language processing"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "060f5f7ab41b022e8f0cb5d9480459e9ce2ae35c553a89d18eb9818aaac604de",
                "md5": "f2b010b2c5d14dea612c583f0d130ed1",
                "sha256": "c30a5eaec9fe83d7ecee2dcbfd6c771539193fb9e8b757c0b17b8a39fc73aac3"
            },
            "downloads": -1,
            "filename": "openmed-0.1.10-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "f2b010b2c5d14dea612c583f0d130ed1",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 47129,
            "upload_time": "2025-10-17T21:29:51",
            "upload_time_iso_8601": "2025-10-17T21:29:51.903081Z",
            "url": "https://files.pythonhosted.org/packages/06/0f/5f7ab41b022e8f0cb5d9480459e9ce2ae35c553a89d18eb9818aaac604de/openmed-0.1.10-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "402a09024df8785d11358250e0881b1912d96dda60e488d86cddd648bc9ddd36",
                "md5": "9d71a6b2b771be9b6138d04b6bedd315",
                "sha256": "ed1343d929d85f2f9a942fbebaed4bda2a23bb6980ff238a35d3a20b49bfc79d"
            },
            "downloads": -1,
            "filename": "openmed-0.1.10.tar.gz",
            "has_sig": false,
            "md5_digest": "9d71a6b2b771be9b6138d04b6bedd315",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 36337,
            "upload_time": "2025-10-17T21:29:50",
            "upload_time_iso_8601": "2025-10-17T21:29:50.924260Z",
            "url": "https://files.pythonhosted.org/packages/40/2a/09024df8785d11358250e0881b1912d96dda60e488d86cddd648bc9ddd36/openmed-0.1.10.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-17 21:29:50",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "openmed"
}