term-matcher


- Name: term-matcher
- Version: 0.21
- Home page: https://github.com/unai-zulaika/term_matcher
- Summary: A library for fuzzy matching terms in text with corresponding codes.
- Upload time: 2024-10-28 16:41:06
- Maintainer: None
- Docs URL: None
- Author: Unai Zulaika
- Requires Python: >=3.6
- License: None
- Keywords: fuzzy, matching, terms, codes
- Requirements: No requirements were recorded.
            
# Text-to-AthenaCode 

This library takes an input text and matches the terms it contains against a dictionary that maps each term to its associated Athena code. For instance:

Given the term-to-code dictionary:

```json
{
  "Male": 8507,
  "Female": 8532,
  "Angiomyxoma": 4239956
}
```
and the input text,
- "Query for all female patients diagnosed with angiomyxoma"


the library returns:

- "Query for all 8532 patients diagnosed with 4239956"


The dictionary terms can come from any source. In this case, they are taken from the IDEA4RC project's data model.
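The substitution in the example above can be sketched with the standard library alone. This is a minimal illustration that uses exact, case-insensitive, whole-word matching. It is not the library's implementation, and `replace_terms` is a hypothetical helper name that does not exist in `term_matcher`.

```python
import re

# Example dictionary from the README; a flat term -> code mapping is assumed.
term_to_code = {"Male": 8507, "Female": 8532, "Angiomyxoma": 4239956}


def replace_terms(text, term_to_code):
    """Replace every dictionary term found in `text` with its code.

    Hypothetical helper: exact, case-insensitive, whole-word matching only.
    """
    # Longer terms go first so that a longer term wins over a shorter one
    # starting at the same position.
    terms = sorted(term_to_code, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    # Lower-cased lookup table so the match can be mapped back to its code.
    lookup = {term.lower(): code for term, code in term_to_code.items()}
    return pattern.sub(lambda m: str(lookup[m.group(0).lower()]), text)


result = replace_terms(
    "Query for all female patients diagnosed with angiomyxoma", term_to_code
)
print(result)
```

The word boundaries (`\b`) keep "Male" from matching inside "female". Exact matching does not tolerate misspellings, which is where fuzzy matching comes in.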

## Methods

Fuzzy string matching.
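To illustrate the idea, here is a minimal sketch of threshold-based fuzzy matching. It uses `difflib.SequenceMatcher` from the standard library as a stand-in, because the README does not say which algorithm or scoring function the library uses. The names `similarity` and `fuzzy_match` and the 0-100 score scale are assumptions made for this example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score between two strings, ignoring case."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_match(text, term_to_code, threshold=85):
    """Return (codes, terms) for dictionary terms that resemble a word in `text`.

    Hypothetical sketch: compares single words only, so multi-word
    dictionary terms are not handled well.
    """
    # Split on whitespace and strip common punctuation from each word.
    words = [w.strip(".,;:!?") for w in text.split()]
    codes, terms = [], []
    for term, code in term_to_code.items():
        # Score the term against every word and keep the best score.
        best = max((similarity(w, term) for w in words), default=0)
        if best >= threshold:
            codes.append(code)
            terms.append(term)
    return codes, terms


term_to_code = {"Male": 8507, "Female": 8532, "Angiomyxoma": 4239956}

# "angiomixoma" is misspelled on purpose.
codes, terms = fuzzy_match("female patient with angiomixoma", term_to_code)
print(codes, terms)
```

With this scoring, "Male" reaches a score of 80 against the word "female", so any threshold of 80 or lower would also report it. The threshold therefore trades tolerance for misspellings against false positives.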

## How to use it

- Install the requirements (see the end of this file).
- Run `DM_codes_extraction` to generate `term_to_code.json`. This step can be skipped if you supply a dictionary from another source.
- Run `demo.py` for a demo.

```python
from term_matcher import load_term_to_code, match_terms

# Load term-to-code mappings
term_to_code = load_term_to_code("dictionaries/term_to_code.json") # if working with term to code

# Input text
text = "The patient with angiomyxoma and carcinoma was diagnosed."

# Match terms to codes
matched_codes, matched_terms = match_terms(text, term_to_code, threshold=50)

# Output matched codes
print("Matched Codes:", matched_codes)
# Optionally, output matched terms for debugging
print("Matched Terms:", matched_terms)
```

Output:

```
Matched Codes: [4239956, 4233949, 4175678, 4164740, 4206785, 4224593, 37156145, 4241843, 4029680, 4022895]
Matched Terms: ['Angiomyxoma', 'Verrucous carcinoma', 'Giant cell carcinoma', 'Acinar cell carcinoma', 'Schneiderian carcinoma', 'Juvenile carcinoma of the breast', 'Squamous cell carcinoma', 'Adenosquamous carcinoma', 'Myoepithelial carcinoma', 'Adenoid cystic carcinoma']
```
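Since the dictionary can come from any source, a custom one can be written by hand instead of being generated. The sketch below assumes that `term_to_code.json` is a flat JSON object mapping each term to its code, as in the example at the top of this README. The README does not document the file format, so treat this layout as an assumption. The temporary directory is used for illustration only.

```python
import json
import os
import tempfile

# Hypothetical custom dictionary; the terms and codes are from the README example.
custom_terms = {"Male": 8507, "Female": 8532, "Angiomyxoma": 4239956}

# Write the dictionary to a JSON file (a temporary location for this example).
path = os.path.join(tempfile.mkdtemp(), "term_to_code.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(custom_terms, f, indent=2, ensure_ascii=False)

# Read it back to confirm the round trip preserves terms and codes.
with open(path, encoding="utf-8") as f:
    reloaded = json.load(f)

print(reloaded)
```

If the assumed layout is right, the resulting path can be passed to `load_term_to_code` in place of `dictionaries/term_to_code.json`.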

## Improvements

1. Synonyms
2. Multi-term matching
3. Elasticsearch for larger dictionaries (essentially the full set of Athena concepts).

## Installation as package

```
pip install -e .
```

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/unai-zulaika/term_matcher",
    "name": "term-matcher",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": "fuzzy matching terms codes",
    "author": "Unai Zulaika",
    "author_email": "unai.zulaika@deusto.es",
    "download_url": "https://files.pythonhosted.org/packages/45/b1/c83f81d96f9604feb95049d53b95d07c3a43fab9d297dba53f6551a5a2c9/term_matcher-0.21.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A library for fuzzy matching terms in text with corresponding codes.",
    "version": "0.21",
    "project_urls": {
        "Homepage": "https://github.com/unai-zulaika/term_matcher"
    },
    "split_keywords": [
        "fuzzy",
        "matching",
        "terms",
        "codes"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "9f033b4b462b23262202682955f9e17e211867471defe5a0054fcaa73ef8cc4a",
                "md5": "2192f8d8d6bd3ddce22d8bc41e2a8a56",
                "sha256": "a57c9ef92401b3c09f36ab703fed5862f5d083a5214a4aaa1f6bbbf6cf78db7d"
            },
            "downloads": -1,
            "filename": "term_matcher-0.21-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "2192f8d8d6bd3ddce22d8bc41e2a8a56",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 5222,
            "upload_time": "2024-10-28T16:41:04",
            "upload_time_iso_8601": "2024-10-28T16:41:04.965391Z",
            "url": "https://files.pythonhosted.org/packages/9f/03/3b4b462b23262202682955f9e17e211867471defe5a0054fcaa73ef8cc4a/term_matcher-0.21-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "45b1c83f81d96f9604feb95049d53b95d07c3a43fab9d297dba53f6551a5a2c9",
                "md5": "59944fd7ad90230d20ecbd2587539053",
                "sha256": "b1aec28cfa1d49b714bf9f202787026138a6766bd8068fefeb520379a3a08c1c"
            },
            "downloads": -1,
            "filename": "term_matcher-0.21.tar.gz",
            "has_sig": false,
            "md5_digest": "59944fd7ad90230d20ecbd2587539053",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 5422,
            "upload_time": "2024-10-28T16:41:06",
            "upload_time_iso_8601": "2024-10-28T16:41:06.376351Z",
            "url": "https://files.pythonhosted.org/packages/45/b1/c83f81d96f9604feb95049d53b95d07c3a43fab9d297dba53f6551a5a2c9/term_matcher-0.21.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-28 16:41:06",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "unai-zulaika",
    "github_project": "term_matcher",
    "github_not_found": true,
    "lcname": "term-matcher"
}
        