[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/DmitryPogrebnoy/MedSpellChecker/blob/main/LICENSE)
[![Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.](https://www.repostatus.org/badges/latest/wip.svg)](https://www.repostatus.org/#wip)
[![Test](https://github.com/DmitryPogrebnoy/MedSpellChecker/actions/workflows/python-test.yml/badge.svg?branch=main)](https://github.com/DmitryPogrebnoy/MedSpellChecker/actions/workflows/python-test.yml)
# MedSpellChecker
A fast and effective tool for correcting spelling errors in Russian medical texts.
The tool takes the raw medical text and returns the corrected text in lemmatized form.
This project is under active development and is gradually improving.
## Demo
Here is an example of how to correct a spelling mistake with MedSpellChecker.
![Demo](https://github.com/DmitryPogrebnoy/MedSpellChecker/blob/main/presentation_materials/readme/demo/demo_correct_message.gif)
Steps for reproducing the demo:
1. Clone the project
2. Install all requirements
3. Go to the `demo` folder
4. Run the demo Flask server
5. Open the demo website and enjoy!
## Supported errors
**MedSpellChecker** supports fixing the following types of errors.
![Supported errors](https://github.com/DmitryPogrebnoy/MedSpellChecker/blob/main/presentation_materials/figures/misspelling_types.drawio.png)
## Internals
**MedSpellChecker** uses the SymDel algorithm to speed up the generation of correction candidates,
and a fine-tuned BERT-based machine learning model to rank candidates and select the best fit.
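The candidate-generation idea can be illustrated with a minimal sketch of a symmetric-delete index, assuming SymDel refers to the symmetric delete technique. This is not the project's implementation: the function names `deletes`, `build_index`, and `candidates`, the `max_distance` parameter, and the sample words are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def deletes(word: str, max_distance: int) -> set[str]:
    """Return the word itself plus every string made by deleting up to max_distance characters."""
    variants = {word}
    for k in range(1, min(max_distance, len(word)) + 1):
        # Choose which character positions to keep; the rest are deleted.
        for kept in combinations(range(len(word)), len(word) - k):
            variants.add("".join(word[i] for i in kept))
    return variants


def build_index(dictionary, max_distance=1):
    """Map each delete variant to the dictionary words that produce it (hypothetical helper)."""
    index = defaultdict(set)
    for word in dictionary:
        for variant in deletes(word, max_distance):
            index[variant].add(word)
    return index


def candidates(token, index, max_distance=1):
    """Look up dictionary words that share a delete variant with the token."""
    found = set()
    for variant in deletes(token, max_distance):
        found |= index.get(variant, set())
    # A fuller implementation would also verify the true edit distance here,
    # because two words sharing a delete variant can be up to 2 * max_distance apart.
    return found


index = build_index(["пациент", "диагноз"], max_distance=1)
print(candidates("пацент", index))  # the misspelling has lost the letter "и"
```

The speed-up comes from doing the expensive work once: delete variants of dictionary words are precomputed, so correcting a token only requires generating its own deletes and performing hash lookups.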
The architecture of the **MedSpellChecker** tool is shown below.
![Arch](https://github.com/DmitryPogrebnoy/MedSpellChecker/blob/main/presentation_materials/figures/arch.png)
This architecture allows each component to be developed almost independently and
the correction process to be implemented flexibly.
* **Spellchecker Manager** - responsible for coordinating other components and implementing high-level logic.
* **Preprocessor** and **PostProcessor** - responsible for splitting the incoming text and assembling the result.
* **Dictionary** - contains a dictionary of correct words, used to check whether a word is spelled correctly.
* **Edit Distance Index** - optimizes and speeds up the edit distance calculation required to generate
candidates for fixing an incorrect word.
* **Error Model** - responsible for generating candidates for fixing incorrect words.
* **Language Model** - based on the fine-tuned RuRoberta model, ranks candidates for fixing and selects the most
suitable word for correction.
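How the components listed above might cooperate can be shown with a rough sketch. It is an assumption-laden illustration, not the tool's actual code: `correct_text` and its parameters are hypothetical names, whitespace splitting stands in for the Preprocessor and PostProcessor, plain callables stand in for the Error Model and the Language Model, and lemmatization is omitted.

```python
from typing import Callable, Iterable


def correct_text(
    text: str,
    dictionary: set[str],
    generate_candidates: Callable[[str], Iterable[str]],
    rank: Callable[[list[str], int, list[str]], str],
) -> str:
    """Hypothetical manager loop: split, check, generate candidates, rank, reassemble."""
    tokens = text.split()  # stand-in for the Preprocessor
    result = []
    for position, token in enumerate(tokens):
        if token in dictionary:  # Dictionary: the word is already correct
            result.append(token)
            continue
        # Error Model stand-in: keep only candidates that are valid dictionary words.
        options = [c for c in generate_candidates(token) if c in dictionary]
        # Language Model stand-in: pick the best candidate given the surrounding tokens.
        result.append(rank(tokens, position, options) if options else token)
    return " ".join(result)  # stand-in for the PostProcessor


# Toy stand-ins for the two models, for demonstration only.
toy_candidates = lambda token: ["горле"] if token == "горлле" else []
toy_rank = lambda tokens, position, options: sorted(options)[0]

print(correct_text("боль в горлле", {"боль", "в", "горле"}, toy_candidates, toy_rank))
```

Passing the candidate generator and the ranker in as callables mirrors the point made above: each component can be swapped or developed separately without touching the coordinating logic.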
## More information
This project is part of a master's thesis. The current state is the result of the first year of work.
More details about **MedSpellChecker** can be found in the
[term report](https://github.com/DmitryPogrebnoy/MedSpellChecker/blob/main/presentation_materials/summer-report/Dmitry_Pogrebnoy_term_work.pdf).
Raw data
{
"_id": null,
"home_page": "https://github.com/DmitryPogrebnoy/MedSpellChecker",
"name": "medspellchecker",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": "",
"keywords": "spellchecker,nlp,medical,text correction",
"author": "Dmitry Pogrebnoy",
"author_email": "pogrebnoy.inc@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/db/22/fc2d23d64e5963156849cd292e295d829614f351fb7be8e0411790dde9b9/medspellchecker-0.0.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache License 2.0",
"summary": "Fast and effective spellchecker for Russian medical texts",
"version": "0.0.2",
"split_keywords": [
"spellchecker",
"nlp",
"medical",
"text correction"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "4037542db2a1db190db93e7056ce76a404006ad19b714abb2c61f1470f8abeea",
"md5": "a78bf2f49a21df5ecf06ceca2a75ecb8",
"sha256": "16b2b952cc6634d588bcdfa90f34d55ade78443917aaff06cb81724622aa7332"
},
"downloads": -1,
"filename": "medspellchecker-0.0.2-py3-none-any.whl",
"has_sig": false,
"md5_digest": "a78bf2f49a21df5ecf06ceca2a75ecb8",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 1414996,
"upload_time": "2023-04-22T11:11:40",
"upload_time_iso_8601": "2023-04-22T11:11:40.677803Z",
"url": "https://files.pythonhosted.org/packages/40/37/542db2a1db190db93e7056ce76a404006ad19b714abb2c61f1470f8abeea/medspellchecker-0.0.2-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "db22fc2d23d64e5963156849cd292e295d829614f351fb7be8e0411790dde9b9",
"md5": "09d67b8bc2073c07dcb42528aaf0d55d",
"sha256": "d00f32b85af5f1cfb7df88078ff056f375bac10324853021d5a7dde3efbdeb9d"
},
"downloads": -1,
"filename": "medspellchecker-0.0.2.tar.gz",
"has_sig": false,
"md5_digest": "09d67b8bc2073c07dcb42528aaf0d55d",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 1372016,
"upload_time": "2023-04-22T11:11:43",
"upload_time_iso_8601": "2023-04-22T11:11:43.672369Z",
"url": "https://files.pythonhosted.org/packages/db/22/fc2d23d64e5963156849cd292e295d829614f351fb7be8e0411790dde9b9/medspellchecker-0.0.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-04-22 11:11:43",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "DmitryPogrebnoy",
"github_project": "MedSpellChecker",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "medspellchecker"
}