iamsystem


| Field | Value |
|---|---|
| Name | iamsystem |
| Version | 0.6.1 |
| home_page | None |
| Summary | A python implementation of IAMsystem algorithm |
| upload_time | 2024-04-30 07:37:43 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.7 |
| license | None |
| keywords | nlp, semantic annotation, entity linking |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# iamsystem
![test](https://github.com/scossin/iamsystem_python/actions/workflows/tests.yml/badge.svg)
[![PyPI version fury.io](https://badge.fury.io/py/iamsystem.svg)](https://pypi.org/project/iamsystem/)
[![PyPI license](https://img.shields.io/pypi/l/iamsystem.svg)](https://pypi.python.org/pypi/iamsystem/)
[![PyPI pyversions](https://img.shields.io/pypi/pyversions/iamsystem.svg)](https://pypi.python.org/pypi/iamsystem/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/python/black)

A Python implementation of the IAMsystem algorithm, a fast dictionary-based approach to semantic annotation (a.k.a. entity linking).


## Installation

```bash
pip install iamsystem
```

## Usage
You provide a list of keywords to detect in a document.
You can add and combine abbreviations, normalization methods (lemmatization, stemming) and approximate string matching algorithms;
the IAMsystem algorithm then performs the semantic annotation.

See the [documentation](https://iamsystem-python.readthedocs.io/en/latest/) for the configuration details.
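
To make "approximate string matching" concrete, here is a minimal pure-Python sketch of a Levenshtein (edit) distance lookup. It is only an illustration: the package itself delegates string distances to external libraries, and the function names `levenshtein` and `fuzzy_candidates`, the toy vocabulary and the threshold of 1 are assumptions made for this example.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,  # delete char_a
                    current[j - 1] + 1,  # insert char_b
                    previous[j - 1] + cost,  # substitute (or keep)
                )
            )
        previous = current
    return previous[-1]


def fuzzy_candidates(token: str, vocabulary: List[str], max_distance: int = 1) -> List[str]:
    """Return the vocabulary words within max_distance edits of the token."""
    token = token.lower()
    return [w for w in vocabulary if levenshtein(token, w.lower()) <= max_distance]


# Hypothetical vocabulary, chosen for illustration only.
vocabulary = ["north", "south", "america"]
print(fuzzy_candidates("Northh", vocabulary))  # ['north']
```

With a maximum distance of 1, the misspelled token "Northh" is one deletion away from "north", so it is accepted as a candidate, while "south" and "america" are rejected.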

### Quick example

```python
from iamsystem import Matcher

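matcher = Matcher.build(
    # Dictionary terms to detect.
    keywords=["North America", "South America"],
    # Tokens ignored during matching.
    stopwords=["and"],
    # (short form, long form) pairs.
    abbreviations=[("amer", "America")],
    # Fuzzy matching via spellwise; one edit tolerates "Northh".
    spellwise=[dict(measure="Levenshtein", max_distance=1)],
    # Window size: 2 lets "Northh" and "Amer" match across "south".
    w=2,
)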
annots = matcher.annot_text(text="Northh and south Amer.")
for annot in annots:
    print(annot)
# Northh Amer	0 6;17 21	North America
# south Amer	11 21	South America
```
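
The printed lines above appear to hold three tab-separated fields: the matched tokens, the character offsets, and the keyword. Discontinuous spans seem to be separated by `;`. The sketch below parses such a line under that assumption. The format is inferred from the example output only, and `parse_annotation_line` is a hypothetical helper, not part of the package.

```python
from typing import List, Tuple


def parse_annotation_line(line: str) -> Tuple[str, List[Tuple[int, int]], str]:
    """Split a printed annotation into (matched text, spans, keyword).

    Assumes three tab-separated fields and 'start end' pairs joined by ';'.
    """
    matched, offsets, keyword = line.split("\t")
    spans = []
    for part in offsets.split(";"):
        start, end = part.split()
        spans.append((int(start), int(end)))
    return matched, spans, keyword


text = "Northh and south Amer."
line = "Northh Amer\t0 6;17 21\tNorth America"

matched, spans, keyword = parse_annotation_line(line)
# Slice the original text with each (start, end) pair.
fragments = [text[start:end] for start, end in spans]
print(spans)  # [(0, 6), (17, 21)]
print(fragments)  # ['Northh', 'Amer']
```

Slicing the input text with these offsets recovers the matched tokens, which is a quick way to check that the offsets are character positions with an exclusive end.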


## Algorithm
The algorithm was developed as part of a [PhD thesis](https://theses.hal.science/tel-03857962/).
It offers a way to quickly annotate documents using a large dictionary (> 300K keywords) and fuzzy matching algorithms.
This package implements no string distance algorithm itself; it imports and leverages external libraries such as [spellwise](https://github.com/chinnichaitanya/spellwise),
[pysimstring](https://github.com/percevalw/pysimstring) and [nltk](https://github.com/nltk/nltk).
Its algorithmic complexity is *O(n log(m))*, where n is the number of tokens in a document and m is the size of the dictionary.
The algorithm is formalized in this [paper](https://ceur-ws.org/Vol-3202/livingner-paper11.pdf).
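
As a rough intuition for dictionary-based annotation with a token window, the sketch below stores keywords in a token trie and walks the document once, keeping partial matches alive while they stay within the window. It is a simplified illustration and does not reproduce the package's internals or its stated complexity. The names `build_trie` and `annotate`, the dict-based abbreviation table and the whitespace tokenizer are assumptions made for this example, and fuzzy matching is left out.

```python
END = object()  # sentinel key marking the end of a complete keyword


def build_trie(keywords, stopwords=()):
    """Store each keyword as a path of lowercase tokens in nested dicts."""
    stop = {s.lower() for s in stopwords}
    root = {}
    for keyword in keywords:
        node = root
        for token in keyword.lower().split():
            if token not in stop:
                node = node.setdefault(token, {})
        node[END] = keyword
    return root


def annotate(text, trie, stopwords=(), abbreviations=None, w=1):
    """Return (matched tokens, keyword) pairs found in the text.

    w is the window: a partial match may continue with a token at most
    w positions after its last matched token (w=1 means contiguous).
    """
    stop = {s.lower() for s in stopwords}
    abbreviations = abbreviations or {}
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    tokens = [abbreviations.get(t, t) for t in tokens if t and t not in stop]

    found = []
    states = []  # each state: (trie node, index of last matched token, path)
    for i, token in enumerate(tokens):
        # Drop partial matches whose last token is outside the window.
        alive = [s for s in states if i - s[1] <= w]
        advanced = []
        # Try to extend every live state, plus a fresh start at the root.
        for node, _, path in alive + [(trie, i - 1, [])]:
            child = node.get(token)
            if child is not None:
                new_path = path + [token]
                advanced.append((child, i, new_path))
                if END in child:
                    found.append((" ".join(new_path), child[END]))
        states = alive + advanced
    return found


trie = build_trie(["North America", "South America"], stopwords=["and"])
result = annotate(
    "North and south Amer.",
    trie,
    stopwords=["and"],
    abbreviations={"amer": "america"},  # hypothetical lowercase mapping
    w=2,
)
print(result)
# [('north america', 'North America'), ('south america', 'South America')]
```

With `w=1` the same call only finds "South America", because "north" and "america" are no longer within the window once "south" sits between them.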

The algorithm was initially developed in Java (https://github.com/scossin/IAMsystem).
It has taken part in several semantic annotation competitions in the medical field, where it obtained satisfactory results,
including the best results in the [Codiesp shared task](https://temu.bsc.es/codiesp/index.php/2019/09/19/awards/).
A dictionary-based model can perform close to a transformer-based model when the task is simple or the training set is small.
Its main advantage is speed, which allows a baseline to be generated quickly.

### Citation
```
@article{cossin_iam_2018,
	title = {{IAM} at {CLEF} {eHealth} 2018: {Concept} {Annotation} and {Coding} in {French} {Death} {Certificates}},
	shorttitle = {{IAM} at {CLEF} {eHealth} 2018},
	url = {http://arxiv.org/abs/1807.03674},
	urldate = {2018-07-11},
	journal = {arXiv:1807.03674 [cs]},
	author = {Cossin, Sébastien and Jouhet, Vianney and Mougin, Fleur and Diallo, Gayo and Thiessard, Frantz},
	month = jul,
	year = {2018},
	note = {arXiv: 1807.03674},
	keywords = {Computer Science - Computation and Language},
}
```

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "iamsystem",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": null,
    "keywords": "NLP, semantic annotation, entity linking",
    "author": null,
    "author_email": "Sebastien Cossin <cossin.sebastien@gmail.com>",
    "download_url": "https://files.pythonhosted.org/packages/55/d0/c30ede1487a9218c80cc709c138ea96f9367c4f4b2205eea46903128f08d/iamsystem-0.6.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "A python implementation of IAMsystem algorithm",
    "version": "0.6.1",
    "project_urls": {
        "Bug Tracker": "https://github.com/scossin/iamsystem_python/issues",
        "Homepage": "https://github.com/scossin/iamsystem_python"
    },
    "split_keywords": [
        "nlp",
        " semantic annotation",
        " entity linking"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7da34598df7318de97d46e84639299c8c6342a3badf5f0a3eae2b99daadfaf3e",
                "md5": "408f350582e49e121c7673ce59f8e627",
                "sha256": "6576c21d860d954e8be39fb7e9c7bc540293df51f01f7f52d483abab0c7aa173"
            },
            "downloads": -1,
            "filename": "iamsystem-0.6.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "408f350582e49e121c7673ce59f8e627",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 56070,
            "upload_time": "2024-04-30T07:37:41",
            "upload_time_iso_8601": "2024-04-30T07:37:41.976824Z",
            "url": "https://files.pythonhosted.org/packages/7d/a3/4598df7318de97d46e84639299c8c6342a3badf5f0a3eae2b99daadfaf3e/iamsystem-0.6.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "55d0c30ede1487a9218c80cc709c138ea96f9367c4f4b2205eea46903128f08d",
                "md5": "d05e90ac8693960cdac6c6d53af2239b",
                "sha256": "f5200c6969984c3d286fee021c22d327afee0444c6702eba0e14e459e18f8221"
            },
            "downloads": -1,
            "filename": "iamsystem-0.6.1.tar.gz",
            "has_sig": false,
            "md5_digest": "d05e90ac8693960cdac6c6d53af2239b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 68191,
            "upload_time": "2024-04-30T07:37:43",
            "upload_time_iso_8601": "2024-04-30T07:37:43.922167Z",
            "url": "https://files.pythonhosted.org/packages/55/d0/c30ede1487a9218c80cc709c138ea96f9367c4f4b2205eea46903128f08d/iamsystem-0.6.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-30 07:37:43",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "scossin",
    "github_project": "iamsystem_python",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "iamsystem"
}
        