# Text-to-AthenaCode
This library takes an input text and matches terms in it against a dictionary that maps each term to its Athena code.
For instance, given the following term-to-code dictionary:
```json
{
  "Male": 8507,
  "Female": 8532,
  "Angiomyxoma": 4239956
}
```
and the input text
- "Query for all female patients diagnosed with angiomyxoma"
the library returns:
- "Query for all 8532 patients diagnosed with 4239956"
The dictionary terms can come from any source; in this case they are taken from the IDEA4RC project's data model.
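The replacement shown in the example above can be sketched as follows. This is a minimal illustration and not the library's implementation: `replace_terms_with_codes` is a hypothetical helper, and it uses exact, case-insensitive whole-word matching from the standard library instead of fuzzy matching.

```python
import re


def replace_terms_with_codes(text, term_to_code):
    """Hypothetical helper: swap each known term in `text` for its code."""
    if not term_to_code:
        return text
    # Try longer terms first so multi-word terms win over shorter ones.
    terms = sorted(term_to_code, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    # Case-insensitive lookup table from lowercased term to code.
    lookup = {term.lower(): code for term, code in term_to_code.items()}
    return pattern.sub(lambda m: str(lookup[m.group(0).lower()]), text)


term_to_code = {"Male": 8507, "Female": 8532, "Angiomyxoma": 4239956}
result = replace_terms_with_codes(
    "Query for all female patients diagnosed with angiomyxoma", term_to_code
)
print(result)
```

The word boundaries keep "Male" from matching inside "female".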
## Methods
Fuzzy string matching.
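To give an idea of what fuzzy string matching means here, the sketch below scores each word of the input against each dictionary term and keeps the terms whose similarity reaches a threshold. It is an assumption-laden illustration, not the library's code: `fuzzy_match_terms` is a hypothetical function, and the standard library's `difflib.SequenceMatcher` stands in for whatever scorer the package actually uses. Scores are scaled to 0-100 to mirror the `threshold` argument of `match_terms`.

```python
import re
from difflib import SequenceMatcher


def fuzzy_match_terms(text, term_to_code, threshold=80):
    """Hypothetical sketch: return (codes, terms) whose best word score >= threshold."""
    words = re.findall(r"[A-Za-z0-9]+", text.lower())
    matched_codes, matched_terms = [], []
    for term, code in term_to_code.items():
        # Best similarity (0-100) between this term and any single input word.
        best = max(
            (SequenceMatcher(None, word, term.lower()).ratio() * 100 for word in words),
            default=0.0,
        )
        if best >= threshold:
            matched_codes.append(code)
            matched_terms.append(term)
    return matched_codes, matched_terms


term_to_code = {"Male": 8507, "Female": 8532, "Angiomyxoma": 4239956}
# "angiomixoma" is deliberately misspelled to show the fuzzy behaviour.
codes, terms = fuzzy_match_terms("patient with angiomixoma", term_to_code)
print(codes, terms)
```

A lower threshold accepts looser matches, which returns more candidate terms at the cost of more false positives.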
## How to use it
- Install the requirements (see the end of this file).
- Run `DM_codes_extraction` to generate `term_to_code.json`. This step can be skipped if you supply any other term-to-code dictionary.
- Run `demo.py` for a demo:
```python
from term_matcher import load_term_to_code, match_terms
# Load term-to-code mappings
term_to_code = load_term_to_code("dictionaries/term_to_code.json") # if working with term to code
# Input text
text = "The patient with angiomyxoma and carcinoma was diagnosed."
# Match terms to codes
matched_codes, matched_terms = match_terms(text, term_to_code, threshold=50)
# Output matched codes
print("Matched Codes:", matched_codes)
# Optionally, output matched terms for debugging
print("Matched Terms:", matched_terms)
```
Output:
```
Matched Codes: [4239956, 4233949, 4175678, 4164740, 4206785, 4224593, 37156145, 4241843, 4029680, 4022895]
Matched Terms: ['Angiomyxoma', 'Verrucous carcinoma', 'Giant cell carcinoma', 'Acinar cell carcinoma', 'Schneiderian carcinoma', 'Juvenile carcinoma of the breast', 'Squamous cell carcinoma', 'Adenosquamous carcinoma', 'Myoepithelial carcinoma', 'Adenoid cystic carcinoma']
```
## Improvements
1. Synonym support
2. Multi-term matching
3. Elasticsearch for larger dictionaries (essentially the full set of Athena concepts).
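As one possible direction for multi-term matching, the input could be split into word n-grams so that multi-word terms such as "Squamous cell carcinoma" are compared as whole phrases instead of word by word. The sketch below only generates the candidate phrases; `candidate_phrases` and its `max_words` parameter are hypothetical and are not part of the current package.

```python
import re


def candidate_phrases(text, max_words=3):
    """Hypothetical helper: all lowercased word n-grams of 1..max_words words."""
    words = re.findall(r"[A-Za-z0-9]+", text.lower())
    phrases = []
    for n in range(1, max_words + 1):
        # Slide a window of n consecutive words across the text.
        for start in range(len(words) - n + 1):
            phrases.append(" ".join(words[start:start + n]))
    return phrases


phrases = candidate_phrases("Squamous cell carcinoma was diagnosed")
print(phrases)
```

Each candidate phrase could then be scored against the dictionary terms in the same way single words are.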
## Installation as package
```
pip install -e .
```
Raw data
{
"_id": null,
"home_page": "https://github.com/unai-zulaika/term_matcher",
"name": "term-matcher",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": null,
"keywords": "fuzzy matching terms codes",
"author": "Unai Zulaika",
"author_email": "unai.zulaika@deusto.es",
"download_url": "https://files.pythonhosted.org/packages/45/b1/c83f81d96f9604feb95049d53b95d07c3a43fab9d297dba53f6551a5a2c9/term_matcher-0.21.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "A library for fuzzy matching terms in text with corresponding codes.",
"version": "0.21",
"project_urls": {
"Homepage": "https://github.com/unai-zulaika/term_matcher"
},
"split_keywords": [
"fuzzy",
"matching",
"terms",
"codes"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "9f033b4b462b23262202682955f9e17e211867471defe5a0054fcaa73ef8cc4a",
"md5": "2192f8d8d6bd3ddce22d8bc41e2a8a56",
"sha256": "a57c9ef92401b3c09f36ab703fed5862f5d083a5214a4aaa1f6bbbf6cf78db7d"
},
"downloads": -1,
"filename": "term_matcher-0.21-py3-none-any.whl",
"has_sig": false,
"md5_digest": "2192f8d8d6bd3ddce22d8bc41e2a8a56",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.6",
"size": 5222,
"upload_time": "2024-10-28T16:41:04",
"upload_time_iso_8601": "2024-10-28T16:41:04.965391Z",
"url": "https://files.pythonhosted.org/packages/9f/03/3b4b462b23262202682955f9e17e211867471defe5a0054fcaa73ef8cc4a/term_matcher-0.21-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "45b1c83f81d96f9604feb95049d53b95d07c3a43fab9d297dba53f6551a5a2c9",
"md5": "59944fd7ad90230d20ecbd2587539053",
"sha256": "b1aec28cfa1d49b714bf9f202787026138a6766bd8068fefeb520379a3a08c1c"
},
"downloads": -1,
"filename": "term_matcher-0.21.tar.gz",
"has_sig": false,
"md5_digest": "59944fd7ad90230d20ecbd2587539053",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 5422,
"upload_time": "2024-10-28T16:41:06",
"upload_time_iso_8601": "2024-10-28T16:41:06.376351Z",
"url": "https://files.pythonhosted.org/packages/45/b1/c83f81d96f9604feb95049d53b95d07c3a43fab9d297dba53f6551a5a2c9/term_matcher-0.21.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-10-28 16:41:06",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "unai-zulaika",
"github_project": "term_matcher",
"github_not_found": true,
"lcname": "term-matcher"
}