medkit-lib


| Field           | Value                                         |
|-----------------|-----------------------------------------------|
| Name            | medkit-lib                                    |
| Version         | 0.16.0                                        |
| Summary         | A Python library for a learning health system |
| Upload time     | 2024-05-22 08:29:58                           |
| Author          | HeKA Research Team                            |
| Requires Python | >=3.8                                         |
| Keywords        | bert, digital health, ehr, nlp, umls          |

# medkit

![medkit logo](https://github.com/medkit-lib/medkit/blob/main/docs/_static/medkit-logo.png?raw=true)

|         |                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
|---------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| CI      | [![docs status](https://readthedocs.org/projects/medkit/badge/?version=latest)](https://medkit.readthedocs.io/en/latest/) [![pre-commit status](https://github.com/medkit-lib/medkit/actions/workflows/pre-commit.yaml/badge.svg)](https://github.com/medkit-lib/medkit/actions/workflows/pre-commit.yaml) [![test: status](https://github.com/medkit-lib/medkit/actions/workflows/test.yaml/badge.svg)](https://github.com/medkit-lib/medkit/actions/workflows/test.yaml) |
| Package | [![PyPI version](https://img.shields.io/pypi/v/medkit-lib.svg?logo=pypi&label=PyPI&logoColor=gold)](https://pypi.org/project/medkit-lib/) [![PyPI Python versions](https://img.shields.io/pypi/pyversions/medkit-lib.svg?logo=python&label=Python&logoColor=gold)](https://pypi.org/project/medkit-lib/)                                                                                                                                                                   |
| Project | [![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://spdx.org/licenses/MIT.html) [![Formatter: Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff) [![Project: Hatch](https://img.shields.io/badge/%F0%9F%A5%9A-Hatch-4051b5.svg)](https://hatch.pypa.io)                                                                                   |

----

`medkit` is a toolkit for a learning health system, developed by the [HeKA research team](https://team.inria.fr/heka).

This Python library aims to:

1. Facilitate the manipulation of healthcare data of various modalities (e.g., structured, text, audio data)
for the extraction of relevant features.

2. Develop supervised models from these various modalities for decision support in healthcare.

## Installation

To install `medkit` with its basic functionality:

```console
pip install medkit-lib
```

To install `medkit` with all its optional features:

```console
pip install 'medkit-lib[all]'
```
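
To confirm that the installation succeeded, one option is to query the installed distribution's version from Python. The sketch below uses only the standard library; the helper name `installed_version` is hypothetical and is not part of `medkit`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# "medkit-lib" is the distribution name passed to pip above.
# Prints a version string if the package is installed, otherwise None.
print(installed_version("medkit-lib"))
```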

## Example

A basic named-entity recognition pipeline using `medkit`:

```python
# 1. Define individual operations.
from medkit.text.preprocessing import CharReplacer, LIGATURE_RULES, SIGN_RULES
from medkit.text.segmentation import SentenceTokenizer, SyntagmaTokenizer
from medkit.text.context.negation_detector import NegationDetector
from medkit.text.ner.hf_entity_matcher import HFEntityMatcher

# Preprocessing
char_replacer = CharReplacer(rules=LIGATURE_RULES + SIGN_RULES)
# Segmentation
sent_tokenizer = SentenceTokenizer(output_label="sentence")
synt_tokenizer = SyntagmaTokenizer(output_label="syntagma")
# Negation detection
neg_detector = NegationDetector(output_label="is_negated")
# Entity recognition
entity_matcher = HFEntityMatcher(model="my-BERT-model", attrs_to_copy=["is_negated"])

# 2. Combine operations into a pipeline.
from medkit.core.pipeline import Pipeline, PipelineStep

ner_pipeline = Pipeline(
    input_keys=["full_text"],
    output_keys=["entities"],
    steps=[
        PipelineStep(char_replacer, input_keys=["full_text"], output_keys=["clean_text"]),
        PipelineStep(sent_tokenizer, input_keys=["clean_text"], output_keys=["sentences"]),
        PipelineStep(synt_tokenizer, input_keys=["sentences"], output_keys=["syntagmas"]),
        PipelineStep(neg_detector, input_keys=["syntagmas"], output_keys=[]),
        PipelineStep(entity_matcher, input_keys=["syntagmas"], output_keys=["entities"]),
    ],
)

# 3. Run the NER pipeline on a BRAT document.
from medkit.io import BratInputConverter

docs = BratInputConverter().load(path="/path/to/dataset/")
entities = ner_pipeline.run([doc.raw_segment for doc in docs])
```
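
The `input_keys` and `output_keys` in the pipeline above are what connect one step to the next: a step reads the data stored under its input key and stores its result under its output key. The toy model below illustrates that wiring idea in plain Python. It is not `medkit`'s implementation, and `Step`, `run_steps`, `clean`, and `split_sentences` are hypothetical names introduced only for this sketch.

```python
from typing import Callable, Dict, List, NamedTuple


class Step(NamedTuple):
    """A toy pipeline step: a function plus the keys it reads from and writes to."""

    func: Callable[[List[str]], List[str]]
    input_key: str
    output_key: str


def run_steps(steps: List[Step], input_key: str, output_key: str, data: List[str]) -> List[str]:
    """Run steps in order, passing data between them through a key-indexed store."""
    store: Dict[str, List[str]] = {input_key: data}
    for step in steps:
        store[step.output_key] = step.func(store[step.input_key])
    return store[output_key]


def clean(texts: List[str]) -> List[str]:
    # Stand-in for a character-replacement step: expand one ligature.
    return [text.replace("œ", "oe") for text in texts]


def split_sentences(texts: List[str]) -> List[str]:
    # Stand-in for a sentence tokenizer: naive split on periods.
    return [part.strip() for text in texts for part in text.split(".") if part.strip()]


steps = [
    Step(clean, input_key="full_text", output_key="clean_text"),
    Step(split_sentences, input_key="clean_text", output_key="sentences"),
]
sentences = run_steps(steps, "full_text", "sentences", ["No œdema. Dry cough."])
print(sentences)  # ['No oedema', 'Dry cough']
```

Because steps communicate only through named keys, a step can be swapped or inserted without touching the others, as long as the keys still line up.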

## Getting started

To get started with `medkit`, please check out our [documentation](https://medkit.readthedocs.io/).

This documentation also contains tutorials and examples showcasing the use of `medkit` for different tasks.

## Contributing

Thank you for your interest in medkit!

We would be happy to get your input!

If your problem has not been reported by another user, please open an
[issue](https://github.com/medkit-lib/medkit/issues), whether it's for:

* reporting a bug, 
* discussing the current state of the code, 
* submitting a fix, 
* proposing new features, 
* or contributing to documentation, ...

If you want to propose a pull request, please read [CONTRIBUTING.md](./CONTRIBUTING.md) first.

## Contact

Feel free to contact us by sending an email to [medkit-maintainers@inria.fr](mailto:medkit-maintainers@inria.fr).

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "medkit-lib",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": "medkit maintainers <medkit-maintainers@inria.fr>",
    "keywords": "bert, digital health, ehr, nlp, umls",
    "author": "HeKA Research Team",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/18/e8/8a2b4d2c4a0d5e0b065c1a533793bb687967e7c54295e7d685f3ebc1aea5/medkit_lib-0.16.0.tar.gz",
    "platform": null,
    "description": "# medkit\n\n![medkit logo](https://github.com/medkit-lib/medkit/blob/main/docs/_static/medkit-logo.png?raw=true)\n\n|         |                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |\n|---------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| CI      | [![docs status](https://readthedocs.org/projects/medkit/badge/?version=latest)](https://medkit.readthedocs.io/en/latest/) [![pre-commit status](https://github.com/medkit-lib/medkit/actions/workflows/pre-commit.yaml/badge.svg)](https://github.com/medkit-lib/medkit/actions/workflows/pre-commit.yaml) [![test: status](https://github.com/medkit-lib/medkit/actions/workflows/test.yaml/badge.svg)](https://github.com/medkit-lib/medkit/actions/workflows/test.yaml) |\n| Package | [![PyPI version](https://img.shields.io/pypi/v/medkit-lib.svg?logo=pypi&label=PyPI&logoColor=gold)](https://pypi.org/project/medkit-lib/) [![PyPI Python versions](https://img.shields.io/pypi/pyversions/medkit-lib.svg?logo=python&label=Python&logoColor=gold)](https://pypi.org/project/medkit-lib/)                                                                                                                                      
                             |\n| Project | [![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://spdx.org/licenses/MIT.html) [![Formatter: Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff) [![Project: Hatch](https://img.shields.io/badge/%F0%9F%A5%9A-Hatch-4051b5.svg)](https://hatch.pypa.io)                                                                                   |\n\n----\n\n`medkit` is a toolkit for a learning health system, developed by the [HeKA research team](https://team.inria.fr/heka).\n\nThis python library aims at:\n\n1. Facilitating the manipulation of healthcare data of various modalities (e.g., structured, text, audio data)\nfor the extraction of relevant features.\n\n2. Developing supervised models from these various modalities for decision support in healthcare.\n\n## Installation\n\nTo install `medkit` with basic functionalities:\n\n```console\npip install medkit-lib\n```\n\nTo install `medkit` with all its optional features:\n\n```console\npip install 'medkit-lib[all]'\n```\n\n## Example\n\nA basic named-entity recognition pipeline using `medkit`:\n\n```python\n# 1. Define individual operations.\nfrom medkit.text.preprocessing import CharReplacer, LIGATURE_RULES, SIGN_RULES\nfrom medkit.text.segmentation import SentenceTokenizer, SyntagmaTokenizer\nfrom medkit.text.context.negation_detector import NegationDetector\nfrom medkit.text.ner.hf_entity_matcher import HFEntityMatcher\n\n# Preprocessing\nchar_replacer = CharReplacer(rules=LIGATURE_RULES + SIGN_RULES)\n# Segmentation\nsent_tokenizer = SentenceTokenizer(output_label=\"sentence\")\nsynt_tokenizer = SyntagmaTokenizer(output_label=\"syntagma\")\n# Negation detection\nneg_detector = NegationDetector(output_label=\"is_negated\")\n# Entity recognition\nentity_matcher = HFEntityMatcher(model=\"my-BERT-model\", attrs_to_copy=[\"is_negated\"])\n\n# 2. 
Combine operations into a pipeline.\nfrom medkit.core.pipeline import Pipeline, PipelineStep\n\nner_pipeline = Pipeline(\n    input_keys=[\"full_text\"],\n    output_keys=[\"entities\"],\n    steps=[\n        PipelineStep(char_replacer, input_keys=[\"full_text\"], output_keys=[\"clean_text\"]),\n        PipelineStep(sent_tokenizer, input_keys=[\"clean_text\"], output_keys=[\"sentences\"]),\n        PipelineStep(synt_tokenizer, input_keys=[\"sentences\"], output_keys=[\"syntagmas\"]),\n        PipelineStep(neg_detector, input_keys=[\"syntagmas\"], output_keys=[]),\n        PipelineStep(entity_matcher, input_keys=[\"syntagmas\"], output_keys=[\"entities\"]),\n    ],\n)\n\n# 3. Run the NER pipeline on a BRAT document.\nfrom medkit.io import BratInputConverter\n\ndocs = BratInputConverter().load(path=\"/path/to/dataset/\")\nentities = ner_pipeline.run([doc.raw_segment for doc in docs])\n```\n\n## Getting started\n\nTo get started with `medkit`, please checkout our [documentation](https://medkit.readthedocs.io/).\n\nThis documentation also contains tutorials and examples showcasing the use of `medkit` for different tasks.\n\n## Contributing\n\nThank you for your interest into medkit !\n\nWe'll be happy to get your inputs !\n\nIf your problem has not been reported by another user, please open an\n[issue](https://github.com/medkit-lib/medkit/issues), whether it's for:\n\n* reporting a bug, \n* discussing the current state of the code, \n* submitting a fix, \n* proposing new features, \n* or contributing to documentation, ...\n\nIf you want to propose a pull request, you can read [CONTRIBUTING.md](./CONTRIBUTING.md).\n\n## Contact\n\nFeel free to contact us by sending an email to [medkit-maintainers@inria.fr](mailto:medkit-maintainers@inria.fr).\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "A Python library for a learning health system",
    "version": "0.16.0",
    "project_urls": {
        "Changelog": "https://medkit.readthedocs.io/en/stable/changelog.html",
        "Documentation": "https://medkit.readthedocs.io",
        "Issues": "https://github.com/medkit-lib/medkit/issues",
        "Source": "https://github.com/medkit-lib/medkit"
    },
    "split_keywords": [
        "bert",
        " digital health",
        " ehr",
        " nlp",
        " umls"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "61e96e003355047e7a3168ec08cc021efd5f6a626b45ceb1886ff8a127118429",
                "md5": "3d3f84c45334ab979cf7ff85593c4881",
                "sha256": "568d40f3fa95faebb2caf5154410a43cb2c5e655dbad60fc2eb1d7fe51a3b5e8"
            },
            "downloads": -1,
            "filename": "medkit_lib-0.16.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "3d3f84c45334ab979cf7ff85593c4881",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 285956,
            "upload_time": "2024-05-22T08:29:56",
            "upload_time_iso_8601": "2024-05-22T08:29:56.130969Z",
            "url": "https://files.pythonhosted.org/packages/61/e9/6e003355047e7a3168ec08cc021efd5f6a626b45ceb1886ff8a127118429/medkit_lib-0.16.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "18e88a2b4d2c4a0d5e0b065c1a533793bb687967e7c54295e7d685f3ebc1aea5",
                "md5": "28aa43cec81d1eb951d6ee63a68aa7e4",
                "sha256": "c7462cd376cf0a682c64082e7d64a9645060ed8e468cbe5039a46397213a18bf"
            },
            "downloads": -1,
            "filename": "medkit_lib-0.16.0.tar.gz",
            "has_sig": false,
            "md5_digest": "28aa43cec81d1eb951d6ee63a68aa7e4",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 681232,
            "upload_time": "2024-05-22T08:29:58",
            "upload_time_iso_8601": "2024-05-22T08:29:58.270704Z",
            "url": "https://files.pythonhosted.org/packages/18/e8/8a2b4d2c4a0d5e0b065c1a533793bb687967e7c54295e7d685f3ebc1aea5/medkit_lib-0.16.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-05-22 08:29:58",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "medkit-lib",
    "github_project": "medkit",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "medkit-lib"
}
        