[![Downloads](https://pepy.tech/badge/merpy)](https://pepy.tech/project/merpy)
Use MER scripts inside Python.
(from the MER repository)
MER is a Named-Entity Recognition tool that, given any lexicon and any input text, returns the list of
terms recognized in the text, including their exact location (annotations).
Given an ontology (OWL file), MER can also link the entities to their classes.
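To make the idea of lexicon-based recognition with exact locations concrete, here is a minimal pure-Python sketch. It is not MER's implementation (MER relies on shell scripts, awk and grep); the function name `find_terms` and the whole-word, case-insensitive matching rule are assumptions chosen only for illustration.

```python
import re

def find_terms(text, lexicon):
    """Return [start, end, matched_text] for each whole-word, case-insensitive match.

    Hypothetical helper for illustration only; MER's own matching rules may differ.
    """
    matches = []
    for term in lexicon:
        # Require that the term is not glued to other word characters.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            matches.append([m.start(), m.end(), m.group(0)])
    return sorted(matches)

text = "High fever and a runny nose are common flu symptoms."
print(find_terms(text, ["fever", "runny nose", "nose"]))
# [[5, 10, 'fever'], [17, 27, 'runny nose'], [23, 27, 'nose']]
```

As in the merpy output shown under Basic Usage, each match carries a start offset, an end offset, and the matched text; here the offsets are integers rather than strings.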
More information about MER can be found in:
- MER: a Shell Script and Annotation Server for Minimal Named Entity Recognition and Linking, F. Couto and A. Lamurias, Journal of Cheminformatics, 10:58, 2018
[https://doi.org/10.1186/s13321-018-0312-9]
- MER: a Minimal Named-Entity Recognition Tagger and Annotation Server, F. Couto, L. Campos, and A. Lamurias, in BioCreative V.5 Challenge Evaluation, 2017
[https://www.researchgate.net/publication/316545534_MER_a_Minimal_Named-Entity_Recognition_Tagger_and_Annotation_Server]
**NEW**
- Package lexicons202302.tgz is available
- Docker image available: https://hub.docker.com/r/fjmc/merpy-image
- Package lexicons202103.tgz is available
- Multilingual lexicons using DeCS
## Documentation
https://merpy.readthedocs.io/en/latest/
## Dependencies
### awk
MER was developed and tested using GNU awk (gawk) and grep. If your machine has a different awk interpreter, there is no guarantee that the program will work.
For example, to install GNU awk on Ubuntu:
```bash
sudo apt-get install gawk
```
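Before running merpy, it can help to check which awk is installed. The sketch below uses only the Python standard library; the helper names are hypothetical, and whether MER invokes `awk` or `gawk` on your system is not something this check can tell you.

```python
import shutil
import subprocess

def gawk_available():
    """Return True if a `gawk` executable is found on PATH."""
    return shutil.which("gawk") is not None

def awk_is_gnu():
    """Return True if the default `awk` identifies itself as GNU Awk."""
    awk = shutil.which("awk")
    if awk is None:
        return False
    try:
        result = subprocess.run(
            [awk, "--version"],
            stdin=subprocess.DEVNULL,  # never wait for input
            capture_output=True,
            text=True,
            timeout=5,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return "GNU Awk" in result.stdout

print("gawk on PATH:", gawk_available())
print("default awk is GNU awk:", awk_is_gnu())
```

If both checks print `False`, installing gawk as shown above is the safest option.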
### ssmpy
ssmpy is needed to calculate similarities between the recognized entities:
```bash
pip install ssmpy
```
## Installation
```bash
pip install merpy
```
or, from a source checkout:
```bash
python setup.py install
```
Then you might want to update the MER scripts and download preprocessed data:
```python
>>> import merpy
>>> merpy.download_mer()
>>> merpy.download_lexicons()
```
## Basic Usage
```python
>>> import merpy
>>> merpy.download_lexicons()
>>> lexicons = merpy.get_lexicons()
>>> merpy.show_lexicons()
lexicons preloaded:
['lexicon', 'bireme_decs_por2020', 'bireme_decs_spa2020', 'wordnet-hyponym', 'radlex', 'doid', 'bireme_decs_eng2020', 'go', 'hp', 'chebi_lite']
lexicons loaded ready to use:
['bireme_decs_por2020', 'chebi_lite', 'hp', 'bireme_decs_spa2020', 'wordnet-hyponym', 'doid', 'lexicon', 'radlex', 'go', 'bireme_decs_eng2020']
lexicons with linked concepts:
['bireme_decs_eng2020', 'doid', 'hp', 'go', 'lexicon', 'bireme_decs_spa2020', 'bireme_decs_por2020', 'radlex', 'chebi_lite']
>>> document = 'Influenza, commonly known as "the flu", is an infectious disease caused by an influenza virus. Symptoms can be mild to severe. The most common symptoms include: a high fever, runny nose, sore throat, muscle pains, headache, coughing, and feeling tired ... Acetylcysteine for reducing the oxygen transport and caffeine to stimulate ... fever, tachypnea ... fiebre, taquipnea ... febre, taquipneia'
>>> entities = merpy.get_entities(document, "hp") # get_entities_mp uses multiprocessing (set n_cores param)
>>> print(entities)
[['111', '115', 'mild', 'http://purl.obolibrary.org/obo/HP_0012825'], ['119', '125', 'severe', 'http://purl.obolibrary.org/obo/HP_0012828'], ['168', '173', 'fever', 'http://purl.obolibrary.org/obo/HP_0001945'], ['181', '185', 'nose', 'http://purl.obolibrary.org/obo/UBERON_0000004'], ['200', '206', 'muscle', 'http://purl.obolibrary.org/obo/UBERON_0005090'], ['214', '222', 'headache', 'http://purl.obolibrary.org/obo/HP_0002315'], ['224', '232', 'coughing', 'http://purl.obolibrary.org/obo/HP_0012735'], ['246', '251', 'tired', 'http://purl.obolibrary.org/obo/HP_0012378'], ['288', '294', 'oxygen', 'http://purl.obolibrary.org/obo/CHEBI_15379'], ['295', '304', 'transport', 'http://purl.obolibrary.org/obo/GO_0006810'], ['335', '340', 'fever', 'http://purl.obolibrary.org/obo/HP_0001945'], ['342', '351', 'tachypnea', 'http://purl.obolibrary.org/obo/HP_0002789'], ['175', '185', 'runny nose', 'http://purl.obolibrary.org/obo/HP_0031417'], ['187', '198', 'sore throat', 'http://purl.obolibrary.org/obo/HP_0033050']]
>>> entities = merpy.get_entities(document, "bireme_decs_por2020")
>>> print(entities)
[['0', '9', 'Influenza', 'https://decs.bvsalud.org/ths/?filter=ths_regid&q=D007251'], ['78', '87', 'influenza', 'https://decs.bvsalud.org/ths/?filter=ths_regid&q=D007251'], ['378', '383', 'febre', 'https://decs.bvsalud.org/ths/?filter=ths_regid&q=D005334'], ['385', '395', 'taquipneia', 'https://decs.bvsalud.org/ths/?filter=ths_regid&q=D059246']]
>>> merpy.create_lexicon(["gene1", "gene2", "gene3"], "genelist")
wrote genelist lexicon
>>> merpy.process_lexicon("genelist")
>>> merpy.delete_lexicon("genelist")
deleted genelist lexicon
>>> merpy.download_lexicon("https://github.com/lasigeBioTM/MER/raw/biocreative2017/data/ChEBI.txt", "chebi")
wrote chebi lexicon
>>> merpy.process_lexicon("chebi")
```
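The output above contains overlapping annotations, for example `nose` inside `runny nose`, and the offsets are returned as strings. The following sketch shows one possible way to post-process such a list in plain Python. The helpers `to_records` and `drop_nested` are hypothetical and not part of merpy, and the sample rows are copied from the output above.

```python
def to_records(entities):
    """Convert rows [start, end, text, uri] into dicts with integer offsets."""
    records = []
    for row in entities:
        if len(row) < 3:
            continue  # skip empty or malformed rows
        records.append({
            "start": int(row[0]),
            "end": int(row[1]),
            "text": row[2],
            "uri": row[3] if len(row) > 3 else None,
        })
    # Sort by start offset; for equal starts, the longest span comes first.
    return sorted(records, key=lambda r: (r["start"], -r["end"]))

def drop_nested(records):
    """Drop spans fully contained in an already kept span (including exact duplicates)."""
    kept = []
    for rec in records:
        if any(k["start"] <= rec["start"] and rec["end"] <= k["end"] for k in kept):
            continue
        kept.append(rec)
    return kept

sample = [
    ['168', '173', 'fever', 'http://purl.obolibrary.org/obo/HP_0001945'],
    ['181', '185', 'nose', 'http://purl.obolibrary.org/obo/UBERON_0000004'],
    ['175', '185', 'runny nose', 'http://purl.obolibrary.org/obo/HP_0031417'],
]
print([r["text"] for r in drop_nested(to_records(sample))])
# ['fever', 'runny nose']
```

Whether nested matches should be dropped depends on the task; keeping them may be preferable when both the broader and the narrower concept are of interest.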
## Semantic Similarities
```bash
wget http://labs.rd.ciencias.ulisboa.pt/dishin/chebi202104.db.gz
gunzip -N chebi202104.db.gz
```
```python
>>> import merpy
>>> merpy.process_lexicon("lexicon")
>>> document = "α-maltose and nicotinic acid was found, but not nicotinic acid D-ribonucleotide"
>>> entities = merpy.get_entities(document, "lexicon")
>>> merpy.get_similarities(entities, 'chebi.db')
[['0', '9', 'α-maltose', 'http://purl.obolibrary.org/obo/CHEBI_18167', 0.02834388514184269], ['14', '28', 'nicotinic acid', 'http://purl.obolibrary.org/obo/CHEBI_15940', 0.07402224403263755], ['48', '62', 'nicotinic acid', 'http://purl.obolibrary.org/obo/CHEBI_15940', 0.07402224403263755], ['48', '79', 'nicotinic acid D-ribonucleotide', 'http://purl.obolibrary.org/obo/CHEBI_15763', 0.07402224403263755]]
```
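Each row returned above carries a similarity score as its fifth element. As a rough sketch of how such rows might be summarized, the code below keeps one entry per distinct URI and orders them by score. The helper `rank_by_similarity` is hypothetical and not part of merpy or ssmpy, and the sample rows are copied from the output above.

```python
def rank_by_similarity(rows):
    """Return (uri, text, score) tuples, one per distinct URI, highest score first."""
    best = {}
    for row in rows:
        if len(row) < 5:
            continue  # row has no similarity score
        text, uri, score = row[2], row[3], row[4]
        if uri not in best or score > best[uri][1]:
            best[uri] = (text, score)
    ranked = [(uri, text, score) for uri, (text, score) in best.items()]
    # sorted() is stable, so URIs with equal scores keep their first-seen order.
    return sorted(ranked, key=lambda item: item[2], reverse=True)

rows = [
    ['0', '9', 'α-maltose', 'http://purl.obolibrary.org/obo/CHEBI_18167', 0.02834388514184269],
    ['14', '28', 'nicotinic acid', 'http://purl.obolibrary.org/obo/CHEBI_15940', 0.07402224403263755],
    ['48', '62', 'nicotinic acid', 'http://purl.obolibrary.org/obo/CHEBI_15940', 0.07402224403263755],
    ['48', '79', 'nicotinic acid D-ribonucleotide', 'http://purl.obolibrary.org/obo/CHEBI_15763', 0.07402224403263755],
]
for uri, text, score in rank_by_similarity(rows):
    print(f"{score:.4f}  {text}  {uri}")
```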
Raw data
{
"_id": null,
"home_page": "https://github.com/lasigeBioTM/merpy",
"name": "merpy",
"maintainer": "Francisco M Couto",
"docs_url": null,
"requires_python": "",
"maintainer_email": "fjcouto@edu.ulisboa.pt",
"keywords": "ner,named-entity recognition,entity linking,ontologies",
"author": "Andre Lamurias",
"author_email": "alamurias@lasige.di.fc.ul.pt",
"download_url": "https://files.pythonhosted.org/packages/2a/b1/5c5fb66e25430ae126bb92b5fdc5641dce7cf199f116116cfa84a55935dd/merpy-1.7.12.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "Use Minimal Named-Entity Recognizer (MER) inside python",
"version": "1.7.12",
"split_keywords": [
"ner",
"named-entity recognition",
"entity linking",
"ontologies"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "ef94cdb0b52639db5ded500c227c4a1cfb23841914e6c5185f8a17d43ac9641c",
"md5": "f612fb76740e00031f7e377775ea0c5e",
"sha256": "4d7f705ea99d03f4a310e13ac023631a081939d194c0c4ded53379fe5688be12"
},
"downloads": -1,
"filename": "merpy-1.7.12-py3-none-any.whl",
"has_sig": false,
"md5_digest": "f612fb76740e00031f7e377775ea0c5e",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 28808,
"upload_time": "2023-02-06T18:22:50",
"upload_time_iso_8601": "2023-02-06T18:22:50.401521Z",
"url": "https://files.pythonhosted.org/packages/ef/94/cdb0b52639db5ded500c227c4a1cfb23841914e6c5185f8a17d43ac9641c/merpy-1.7.12-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "2ab15c5fb66e25430ae126bb92b5fdc5641dce7cf199f116116cfa84a55935dd",
"md5": "e9c7c5cc1719c3d0bf655c3897287062",
"sha256": "16ccdd3cea48907a103a035ce19bb07c13d5080dcba57bc9a8c3d2c945641028"
},
"downloads": -1,
"filename": "merpy-1.7.12.tar.gz",
"has_sig": false,
"md5_digest": "e9c7c5cc1719c3d0bf655c3897287062",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 25722,
"upload_time": "2023-02-06T18:22:54",
"upload_time_iso_8601": "2023-02-06T18:22:54.975012Z",
"url": "https://files.pythonhosted.org/packages/2a/b1/5c5fb66e25430ae126bb92b5fdc5641dce7cf199f116116cfa84a55935dd/merpy-1.7.12.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-02-06 18:22:54",
"github": true,
"gitlab": false,
"bitbucket": false,
"github_user": "lasigeBioTM",
"github_project": "merpy",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "merpy"
}