## Named Entity Recognition Toolkit
`ner-kit` provides a toolkit for rapidly extracting useful entities from text, built on several Python NLP packages, including [Stanza](https://stanfordnlp.github.io/stanza/index.html).
### Features
We bring the often complicated use of existing NLP toolkits down to earth by keeping the APIs as simple as possible while following best practices.
### Installation
```bash
pip install ner-kit
```
### Examples
Example 1: Word segmentation
```python
from nerkit.StanzaApi import StanzaWrapper
if __name__ == "__main__":
    sw = StanzaWrapper()
    sw.download(lang="en")  # one-time download of the English model
    text = 'This is a test sentence for stanza. This is another sentence.'
    result1 = sw.tokenize(text)
    sw.print_result(result1)
```
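For comparison, the same tokenization written directly against the upstream Stanza API takes a few more steps. This sketch uses only documented Stanza calls and is independent of `ner-kit`:
```python
import stanza

stanza.download("en")  # one-time model download
nlp = stanza.Pipeline("en", processors="tokenize")
doc = nlp("This is a test sentence for stanza. This is another sentence.")
for sentence in doc.sentences:
    # each sentence holds its tokens; print the surface forms
    print([token.text for token in sentence.tokens])
```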
Example 2: Chinese word segmentation
```python
from nerkit.StanzaApi import StanzaWrapper
if __name__ == "__main__":
    sw = StanzaWrapper()
    sw.download(lang="zh")  # one-time download of the Chinese model
    text = '我在北京吃苹果!'  # "I eat apples in Beijing!"
    result1 = sw.tokenize(text, lang='zh')
    sw.print_result(result1)
```
Example 3: Multi-Word Token (MWT) Expansion
```python
from nerkit.StanzaApi import StanzaWrapper
if __name__ == "__main__":
    sw = StanzaWrapper()
    sw.download(lang="fr")  # one-time download of the French model
    text = 'Nous avons atteint la fin du sentier.'  # "We reached the end of the trail."
    result1 = sw.mwt_expand(text, lang='fr')
    sw.print_result(result1)
```
Example 4: POS tagging
```python
from nerkit.StanzaApi import StanzaWrapper
if __name__ == "__main__":
    sw = StanzaWrapper()
    sw.download(lang='en')
    text = 'I like apple'
    result1 = sw.tag(text)
    sw.print_result(result1)

    sw.download_chinese_model()
    text = '我喜欢苹果'  # "I like apples"
    result2 = sw.tag_chinese(text, lang='zh')
    sw.print_result(result2)
```
Example 5: Named Entity Recognition
```python
from nerkit.StanzaApi import StanzaWrapper
if __name__ == "__main__":
    sw = StanzaWrapper()
    sw.download(lang='en')
    sw.download_chinese_model()

    text_en = 'I like Beijing!'
    result1 = sw.ner(text_en)
    sw.print_result(result1)

    text = '我喜欢北京!'  # "I like Beijing!"
    result2 = sw.ner_chinese(text)
    sw.print_result(result2)
```
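If you prefer the raw entity objects over a printed summary, the upstream Stanza pipeline exposes them directly. A sketch using Stanza's documented API; `ner-kit`'s own return format may differ:
```python
import stanza

stanza.download("en")
nlp = stanza.Pipeline("en", processors="tokenize,ner")
doc = nlp("I like Beijing!")
for ent in doc.ents:
    # each entity span carries its surface text and entity type
    print(ent.text, ent.type)  # e.g. Beijing -> GPE
```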
Example 6: Sentiment Analysis
```python
from nerkit.StanzaApi import StanzaWrapper
if __name__ == "__main__":
    sw = StanzaWrapper()
    text_en = 'I like Beijing!'
    result1 = sw.sentiment(text_en)
    sw.print_result(result1)

    text_zh = '我讨厌苹果!'  # "I hate apples!"
    result2 = sw.sentiment_chinese(text_zh)
    sw.print_result(result2)
```
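Under the hood, Stanza's sentiment processor assigns each sentence an integer score. A minimal sketch with the upstream API:
```python
import stanza

stanza.download("en")
nlp = stanza.Pipeline("en", processors="tokenize,sentiment")
doc = nlp("I like Beijing!")
for sentence in doc.sentences:
    # 0 = negative, 1 = neutral, 2 = positive
    print(sentence.text, sentence.sentiment)
```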
Example 7: Language detection from text
```python
from nerkit.StanzaApi import StanzaWrapper
if __name__ == "__main__":
    sw = StanzaWrapper()
    list_text = ['I like Beijing!', '我喜欢北京!', "Bonjour le monde!"]
    result1 = sw.lang(list_text)
    sw.print_result(result1)
```
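Stanza implements this with its multilingual `langid` processor. A minimal upstream sketch, assuming the multilingual model has been downloaded:
```python
import stanza
from stanza.models.common.doc import Document

stanza.download(lang="multilingual")  # one-time download of the langid model
nlp = stanza.Pipeline(lang="multilingual", processors="langid")
docs = [Document([], text=t) for t in ['I like Beijing!', '我喜欢北京!', "Bonjour le monde!"]]
nlp(docs)  # each Document gets a .lang attribute set in place
for doc in docs:
    print(doc.text, doc.lang)
```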
Example 8: Language detection from text with a user-defined processing function
```python
from nerkit.StanzaApi import StanzaWrapper
if __name__ == "__main__":
    sw = StanzaWrapper()
    list_text = ['I like Beijing!', '我喜欢北京!', "Bonjour le monde!"]

    def process(model):  # plug in your own processing logic here
        doc = model["doc"]
        print(f"{doc.sentences[0].dependencies_string()}")

    result1 = sw.lang_multi(list_text, func_process=process, download_lang='en,zh,fr')
    print(result1)
    sw.print_result(result1)
```
Example 9: NER with the Java-based Stanford CoreNLP (legacy client)
```python
from nerkit.StanzaApi import StanfordCoreNLPClient
corenlp_root_path = "stanfordcorenlp/stanford-corenlp-latest/stanford-corenlp-4.3.2"
corenlp = StanfordCoreNLPClient(corenlp_root_path=corenlp_root_path, language='zh')
text = "我喜欢游览广东孙中山故居景点!"  # "I enjoy visiting the Sun Yat-sen Residence sights in Guangdong!"
list_token = corenlp.get_entity_list(text)
for token in list_token:
    print(f"{token['value']}\t{token['pos']}\t{token['ner']}")
```
Example 10: Stanford CoreNLP (unofficial wrapper)
```python
import os
from nerkit.StanfordCoreNLP import get_entity_list
text = "我喜欢游览广东孙中山故居景点!"  # same sentence as in Example 9
current_path = os.path.dirname(os.path.realpath(__file__))
res = get_entity_list(text, resource_path=f"{current_path}/stanfordcorenlp/stanford-corenlp-latest/stanford-corenlp-4.3.2")
print(res)
for w, tag in res:
    if tag in ['PERSON', 'ORGANIZATION', 'LOCATION']:
        print(w, tag)
```
Example 11: Open Information Extraction (Open IE)
```python
from nerkit.StanzaApi import StanfordCoreNLPClient
corenlp_root_path = "stanfordcorenlp/stanford-corenlp-latest/stanford-corenlp-4.3.2"
text = "Barack Obama was born in Hawaii. Richard Manning wrote this sentence."
corenlp = StanfordCoreNLPClient(corenlp_root_path=corenlp_root_path)
list_result = corenlp.open_ie(text)
for model in list_result:
    print(model["subject"], '--', model['relation'], '-->', model["object"])
out = corenlp.force_close_server()  # force-closes the server port (useful on Windows)
print(out)
```
Example 12: Generate triples from files
```python
from nerkit.triples.text import generate_triples_from_files
input_folder = 'data'
output_folder = 'output'
list_result_all = generate_triples_from_files(input_folder=input_folder,
                                              output_folder=output_folder,
                                              return_all_results=True,
                                              ltp_data_folder='../ltp_data')
print(list_result_all)
```
Example 13: Generate a list of triples
```python
from nerkit.triples.ltp import get_ltp_triple_instance, get_ltp_triple_list

# read the input text
with open("data/test.txt", 'r', encoding='utf-8') as f:
    text = f.read()

extractor = get_ltp_triple_instance(ltp_data_folder='D:/UIBEResearch/ltp_data')  # path to your local LTP model folder
list_event = get_ltp_triple_list(extractor=extractor, text=text)
for event in list_event:
    print(event)
```
### Credits & References
- [Stanza](https://stanfordnlp.github.io/stanza/index.html)
- [Stanford CoreNLP](https://stanfordnlp.github.io/CoreNLP/)
### License
The `ner-kit` project is released under the MIT License and is provided by [Donghua Chen](https://github.com/dhchenx).