ner-kit

Name: ner-kit
Version: 0.0.5
Summary: Named Entity Recognition Toolkit
Home page: https://github.com/dhchenx/ner-kit
Author: Donghua Chen
License: MIT
Requires Python: >=3.6, <4
Keywords: named entity recognition, NER, text analysis
Upload time: 2023-05-04 14:10:50
## Named Entity Recognition Toolkit

`ner-kit` provides a toolkit for rapidly extracting useful entities from text using various Python packages, including [Stanza](https://stanfordnlp.github.io/stanza/index.html).

### Features
We aim to make the often complicated use of existing NLP toolkits approachable by keeping the APIs as simple as possible while following best practices.

### Installation
```bash
pip install ner-kit
```
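
After installation, a quick smoke test confirms the package is importable (a minimal sketch, assuming only the `nerkit` module used throughout the examples below):

```python
# Smoke test: verify that ner-kit and its main entry point are importable.
from nerkit.StanzaApi import StanzaWrapper

sw = StanzaWrapper()
print(type(sw).__name__)  # expected output: StanzaWrapper
```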

### Examples

Example 1: Word segmentation
```python
from nerkit.StanzaApi import StanzaWrapper

if __name__ == "__main__":
    sw = StanzaWrapper()
    sw.download(lang="en")  # download the English model on first use
    text = 'This is a test sentence for stanza. This is another sentence.'
    result1 = sw.tokenize(text)
    sw.print_result(result1)
```
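
For reference, here is a minimal sketch of the same tokenization using Stanza's own API directly (standard Stanza calls, independent of `ner-kit`), which is roughly what the wrapper automates:

```python
import stanza

stanza.download("en")  # fetch the English model once
nlp = stanza.Pipeline("en", processors="tokenize")
doc = nlp("This is a test sentence for stanza. This is another sentence.")
for i, sentence in enumerate(doc.sentences):
    # Each sentence holds a list of Token objects.
    print(f"Sentence {i}:", [token.text for token in sentence.tokens])
```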

Example 2: Chinese word segmentation
```python
from nerkit.StanzaApi import StanzaWrapper

if __name__ == "__main__":
    sw = StanzaWrapper()
    sw.download(lang="zh")  # download the Chinese model
    text = '我在北京吃苹果!'  # "I eat apples in Beijing!"
    result1 = sw.tokenize(text, lang='zh')
    sw.print_result(result1)
```

Example 3: Multi-Word Token (MWT) Expansion
```python
from nerkit.StanzaApi import StanzaWrapper

if __name__ == "__main__":
    sw = StanzaWrapper()
    sw.download(lang="fr")  # French models include multi-word tokens
    text = 'Nous avons atteint la fin du sentier.'  # "We have reached the end of the trail."
    result1 = sw.mwt_expand(text, lang='fr')
    sw.print_result(result1)
```

Example 4: POS tagging
```python
from nerkit.StanzaApi import StanzaWrapper

if __name__ == "__main__":
    sw = StanzaWrapper()
    sw.download(lang='en')
    text = 'I like apple'
    result1 = sw.tag(text)  # POS-tag the English text
    sw.print_result(result1)

    sw.download_chinese_model()
    text = '我喜欢苹果'  # "I like apples"
    result2 = sw.tag_chinese(text, lang='zh')  # POS-tag the Chinese text
    sw.print_result(result2)
```
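
If you prefer calling Stanza directly, the equivalent POS tagging looks like this (standard Stanza API; the wrapper presumably builds a similar pipeline internally):

```python
import stanza

stanza.download("en")
nlp = stanza.Pipeline("en", processors="tokenize,pos")
doc = nlp("I like apple")
for sentence in doc.sentences:
    for word in sentence.words:
        print(word.text, word.upos)  # token text and universal POS tag
```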

Example 5: Named Entity Recognition
```python
from nerkit.StanzaApi import StanzaWrapper

if __name__ == "__main__":
    sw = StanzaWrapper()

    sw.download(lang='en')
    sw.download_chinese_model()

    text_en = 'I like Beijing!'
    result1 = sw.ner(text_en)  # English NER
    sw.print_result(result1)

    text = '我喜欢北京!'  # "I like Beijing!"
    result2 = sw.ner_chinese(text)  # Chinese NER
    sw.print_result(result2)
```
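
The direct Stanza equivalent for English NER is sketched below (standard Stanza API); `doc.ents` collects the recognized entity spans:

```python
import stanza

stanza.download("en")
nlp = stanza.Pipeline("en", processors="tokenize,ner")
doc = nlp("I like Beijing!")
for ent in doc.ents:
    print(ent.text, ent.type)  # e.g. Beijing tagged as a GPE
```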

Example 6: Sentiment Analysis
```python
from nerkit.StanzaApi import StanzaWrapper

if __name__ == "__main__":
    sw = StanzaWrapper()

    text_en = 'I like Beijing!'
    result1 = sw.sentiment(text_en)  # English sentiment
    sw.print_result(result1)

    text_zh = '我讨厌苹果!'  # "I hate apples!"
    result2 = sw.sentiment_chinese(text_zh)  # Chinese sentiment
    sw.print_result(result2)
```
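
For comparison, English sentiment analysis with Stanza's own API (standard Stanza calls; the sentiment processor scores each sentence):

```python
import stanza

stanza.download("en")
nlp = stanza.Pipeline("en", processors="tokenize,sentiment")
doc = nlp("I like Beijing!")
for sentence in doc.sentences:
    # Stanza encodes sentiment as 0 = negative, 1 = neutral, 2 = positive.
    print(sentence.text, sentence.sentiment)
```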

Example 7: Language detection from text
```python
from nerkit.StanzaApi import StanzaWrapper

if __name__ == "__main__":
    sw = StanzaWrapper()
    list_text = ['I like Beijing!', '我喜欢北京!', "Bonjour le monde!"]
    result1 = sw.lang(list_text)  # detect the language of each text
    sw.print_result(result1)
```
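
This likely builds on Stanza's multilingual language-identification pipeline; a direct sketch using the standard Stanza API:

```python
import stanza
from stanza.models.common.doc import Document

nlp = stanza.Pipeline(lang="multilingual", processors="langid")
texts = ['I like Beijing!', '我喜欢北京!', "Bonjour le monde!"]
docs = [Document([], text=t) for t in texts]
nlp(docs)  # annotates each document with a detected language code
print([doc.lang for doc in docs])
```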

Example 8: Language detection from text with a user-defined processing function
```python
from nerkit.StanzaApi import StanzaWrapper

if __name__ == "__main__":
    sw = StanzaWrapper()
    list_text = ['I like Beijing!', '我喜欢北京!', "Bonjour le monde!"]

    def process(model):  # plug in your own per-document processing here
        doc = model["doc"]
        print(f"{doc.sentences[0].dependencies_string()}")

    result1 = sw.lang_multi(list_text, func_process=process, download_lang='en,zh,fr')
    print(result1)
    sw.print_result(result1)
```

Example 9: NER via the Java-based Stanford CoreNLP (legacy)
```python
from nerkit.StanzaApi import StanfordCoreNLPClient

# Point this at your local Stanford CoreNLP installation.
corenlp_root_path = "stanfordcorenlp/stanford-corenlp-latest/stanford-corenlp-4.3.2"
corenlp = StanfordCoreNLPClient(corenlp_root_path=corenlp_root_path, language='zh')
text = "我喜欢游览广东孙中山故居景点!"  # "I like visiting the Sun Yat-sen Residence in Guangdong!"
list_token = corenlp.get_entity_list(text)
for token in list_token:
    print(f"{token['value']}\t{token['pos']}\t{token['ner']}")
```

Example 10: Stanford CoreNLP (unofficial wrapper)
```python
import os

from nerkit.StanfordCoreNLP import get_entity_list

text = "我喜欢游览广东孙中山故居景点!"  # "I like visiting the Sun Yat-sen Residence in Guangdong!"
current_path = os.path.dirname(os.path.realpath(__file__))
res = get_entity_list(text, resource_path=f"{current_path}/stanfordcorenlp/stanford-corenlp-latest/stanford-corenlp-4.3.2")
print(res)
for w, tag in res:
    if tag in ['PERSON', 'ORGANIZATION', 'LOCATION']:
        print(w, tag)
```

Example 11: Open IE
```python
from nerkit.StanzaApi import StanfordCoreNLPClient

corenlp_root_path = "stanfordcorenlp/stanford-corenlp-latest/stanford-corenlp-4.3.2"
text = "Barack Obama was born in Hawaii. Richard Manning wrote this sentence."
corenlp = StanfordCoreNLPClient(corenlp_root_path=corenlp_root_path)
list_result = corenlp.open_ie(text)
for model in list_result:
    print(model["subject"], '--', model['relation'], '-->', model["object"])
out = corenlp.force_close_server()  # supports force-closing the server port on Windows
print(out)
```

Example 12: Generate triples from files
```python
from nerkit.triples.text import generate_triples_from_files

input_folder = 'data'
output_folder = 'output'
list_result_all = generate_triples_from_files(input_folder=input_folder,
                                              output_folder=output_folder,
                                              return_all_results=True,
                                              ltp_data_folder='../ltp_data')
print(list_result_all)
```

Example 13: Generate a list of triples
```python
from nerkit.triples.ltp import *

# Adjust the paths to your own input file and LTP model folder.
text = open("data/test.txt", 'r', encoding='utf-8').read()
extractor = get_ltp_triple_instance(ltp_data_folder='D:/UIBEResearch/ltp_data')
list_event = get_ltp_triple_list(extractor=extractor, text=text)
for event in list_event:
    print(event)
```

### Credits & References

- [Stanza](https://stanfordnlp.github.io/stanza/index.html)
- [Stanford CoreNLP](https://stanfordnlp.github.io/CoreNLP/)

### License
The `ner-kit` project is provided by [Donghua Chen](https://github.com/dhchenx) under the MIT license.
