| Field | Value |
|---|---|
| Name | litner |
| Version | 0.0.1 |
| home_page | https://github.com/xusenlinzy/lit-ner |
| Summary | Pytorch-lightning Code Blocks for NER |
| upload_time | 2023-06-12 10:00:15 |
| maintainer | |
| docs_url | None |
| author | xusenlin |
| requires_python | >=3.7 |
| license | |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# Lit-NER
<p align="center">
<a href="https://github.com/xusenlinzy/lit-ner"><img src="https://img.shields.io/github/license/xusenlinzy/lit-ner"></a>
<a href=""><img src="https://img.shields.io/badge/python-3.8+-aff.svg"></a>
<a href=""><img src="https://img.shields.io/badge/pytorch-%3E=1.12-red?logo=pytorch"></a>
<a href="https://github.com/xusenlinzy/lit-ner"><img src="https://img.shields.io/github/last-commit/xusenlinzy/lit-ner"></a>
<a href="https://github.com/xusenlinzy/lit-ner"><img src="https://img.shields.io/github/issues/xusenlinzy/lit-ner?color=9cc"></a>
<a href="https://github.com/xusenlinzy/lit-ner"><img src="https://img.shields.io/github/stars/xusenlinzy/lit-ner?color=ccf"></a>
<a href="https://github.com/xusenlinzy/lit-ner"><img src="https://img.shields.io/badge/langurage-py-brightgreen?style=flat&color=blue"></a>
</p>
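This project provides a unified framework for training open-source **named entity recognition** (NER) models and running inference with them. Features:
+ ✨ Supports a variety of open-source entity extraction models
+ 🚀 A unified framework for training and inference
## 📢 News
+ [2023.6.12] Initial code release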
---
## 🔨 Installation
1. Install `pytorch`
```bash
conda create -n pytorch python=3.8
conda activate pytorch
conda install pytorch cudatoolkit -c pytorch
```
2. Install `litner`
```bash
pip install litner
```
## 🐼 Models
The following open-source entity extraction models are supported:

| Model | Paper | Notes |
|---|---|---|
| [softmax](litner/nn/ner/crf.py) | | Sequence labeling with a fully connected layer, decoded with `BIO` tags |
| [crf](litner/nn/ner/crf.py) | [Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data](https://repository.upenn.edu/cgi/viewcontent.cgi?article=1162&context=cis_papers) | Fully connected layer plus a conditional random field, decoded with `BIO` tags |
| [cascade-crf](litner/nn/ner/crf.py) | | Predicts entities first, then predicts their types |
| [span](litner/nn/ner/span.py) | | Uses two pointer networks to predict entity start and end positions |
| [global-pointer](litner/nn/ner/global_pointer.py) | | [GlobalPointer: handling nested and non-nested NER in a unified way](https://spaces.ac.cn/archives/8373), [Efficient GlobalPointer: fewer parameters, better results](https://spaces.ac.cn/archives/8877) |
| [mrc](litner/nn/ner/mrc.py) | [A Unified MRC Framework for Named Entity Recognition.](https://aclanthology.org/2020.acl-main.519.pdf) | Casts NER as a reading comprehension problem: the input is an entity-type template plus the sentence, and the model predicts the start and end positions of the matching entities |
| [tplinker](litner/nn/ner/tplinker.py) | [TPLinker: Single-stage Joint Extraction of Entities and Relations Through Token Pair Linking.](https://aclanthology.org/2020.coling-main.138.pdf) | Casts NER as a table-filling problem |
| [lear](litner/nn/ner/lear.py) | [Enhanced Language Representation with Label Knowledge for Span Extraction.](https://aclanthology.org/2021.emnlp-main.379.pdf) | Addresses the efficiency problem of the `MRC` approach with a label fusion mechanism |
| [w2ner](litner/nn/ner/w2ner.py) | [Unified Named Entity Recognition as Word-Word Relation Classification.](https://arxiv.org/pdf/2112.10070.pdf) | Handles nested and discontinuous entity extraction in a single framework |
| [cnn](litner/nn/ner/cnn.py) | [An Embarrassingly Easy but Strong Baseline for Nested Named Entity Recognition.](https://arxiv.org/abs/2208.04534) | Improves on `W2NER` by using a convolutional network to capture relations between tokens inside an entity |
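
The `softmax` and `crf` entries in the table rely on `BIO` decoding. As a rough illustration of what that step does, here is a minimal sketch in plain Python. The function name `decode_bio` is hypothetical and this is not taken from litner's source; it only assumes the common convention that `B-X` opens an entity of type `X`, `I-X` continues it, and `O` marks a token outside any entity.

```python
from typing import List, Tuple


def decode_bio(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert a BIO tag sequence into (label, start, end) spans, end exclusive."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # An "I-" tag continues the open span only if its label matches.
        if tag.startswith("I-") and label == tag[2:]:
            continue
        # Any other tag closes the span that is currently open, if any.
        if label is not None:
            spans.append((label, start, i))
            start, label = None, None
        # A "B-" tag opens a new span; "O" and stray "I-" tags open nothing.
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    # Close a span that runs to the end of the sequence.
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


tags = ["O", "B-organization", "I-organization", "O", "B-organization"]
print(decode_bio(tags))
# [('organization', 1, 3), ('organization', 4, 5)]
```

In this sketch, stray `I-` tags with no matching open span are simply dropped; other decoders treat them as the start of a new entity, so the choice here is an assumption rather than a rule.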
## 📚 Data
Convert your dataset to the following `json` format:
```json
{
  "text": "结果上周六他们主场0:3惨败给了中游球队瓦拉多利德,近7个多月以来西甲首次输球。",
  "entities": [
    {
      "id": 0,
      "entity": "瓦拉多利德",
      "start_offset": 20,
      "end_offset": 25,
      "label": "organization"
    },
    {
      "id": 1,
      "entity": "西甲",
      "start_offset": 33,
      "end_offset": 35,
      "label": "organization"
    }
  ]
}
```
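Field meanings:
+ `text`: the text content
+ `entities`: all entities contained in the text
    + `id`: entity `id`
    + `entity`: entity text
    + `start_offset`: start position of the entity
    + `end_offset`: the position immediately after the end of the entity (exclusive)
    + `label`: entity type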
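
Because `end_offset` points one past the last character, `text[start_offset:end_offset]` should reproduce the `entity` string exactly. The following is a minimal sketch of a sanity check built on that property. The helper name `check_offsets` is hypothetical and is not part of `litner`; it may be useful to run over a dataset before training.

```python
from typing import Dict, List


def check_offsets(sample: Dict) -> List[str]:
    """Return a list of problems found in one sample; an empty list means none."""
    problems = []
    text = sample["text"]
    for ent in sample["entities"]:
        start, end = ent["start_offset"], ent["end_offset"]
        if not 0 <= start < end <= len(text):
            problems.append(f"entity {ent['id']}: offsets out of range")
        elif text[start:end] != ent["entity"]:
            # The slice uses an exclusive end, matching the format above.
            problems.append(
                f"entity {ent['id']}: text[{start}:{end}] is "
                f"{text[start:end]!r}, expected {ent['entity']!r}"
            )
    return problems


sample = {
    "text": "结果上周六他们主场0:3惨败给了中游球队瓦拉多利德,近7个多月以来西甲首次输球。",
    "entities": [
        {"id": 0, "entity": "瓦拉多利德", "start_offset": 20,
         "end_offset": 25, "label": "organization"},
        {"id": 1, "entity": "西甲", "start_offset": 33,
         "end_offset": 35, "label": "organization"},
    ],
}
print(check_offsets(sample))  # an empty list means every offset matches
```

Offsets here are counted in Python string characters (code points), which is an assumption about how the dataset was annotated.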
## 🚀 Model training
```python
import os
import sys

from transformers import HfArgumentParser

from litner.arguments import (
    DataTrainingArguments,
    ModelArguments,
    TrainingArguments,
)
from litner.models import AutoNerModel

# Silence advisory warnings from transformers
os.environ['TRANSFORMERS_NO_ADVISORY_WARNINGS'] = 'true'

parser = HfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments))
if len(sys.argv) == 2 and sys.argv[1].endswith(".json"):
    # A single .json argument is read as a config file holding all arguments
    model_args, data_args, training_args = parser.parse_json_file(json_file=os.path.abspath(sys.argv[1]))
else:
    # Otherwise parse the arguments from the command line
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()

# 1. create model
model = AutoNerModel.create(model_args=model_args, training_args=training_args)

# 2. finetune model
model.finetune(data_args)
```
See [scripts](./scripts) for the training scripts.
## 📊 Model inference
```python
from litner.pipelines import NerPipeline
task_model = "crf"
model_name_or_path = "path of crf model"
pipeline = NerPipeline(task_model, model_name_or_path=model_name_or_path)
print(pipeline("结果上周六他们主场0:3惨败给了中游球队瓦拉多利德,近7个多月以来西甲首次输球。"))
```
## 📜 License
This project is licensed under the `Apache 2.0` license. See the [LICENSE](LICENSE) file for details.
Raw data
{
"_id": null,
"home_page": "https://github.com/xusenlinzy/lit-ner",
"name": "litner",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": "",
"keywords": "",
"author": "xusenlin",
"author_email": "1659821119@qq.com",
"download_url": "",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "Pytorch-lightning Code Blocks for NER",
"version": "0.0.1",
"project_urls": {
"Homepage": "https://github.com/xusenlinzy/lit-ner"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "a9b695e6f5d4f8ce4097296362e91898e620b854b3fe7f886530727432e392ab",
"md5": "7b55848ed93fd12d1320ba2262493caf",
"sha256": "de13dced98fe164e2628f2cde50a165d07b5d09ece65d81e13108a940a91bfc7"
},
"downloads": -1,
"filename": "litner-0.0.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "7b55848ed93fd12d1320ba2262493caf",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.7",
"size": 132780,
"upload_time": "2023-06-12T10:00:15",
"upload_time_iso_8601": "2023-06-12T10:00:15.331498Z",
"url": "https://files.pythonhosted.org/packages/a9/b6/95e6f5d4f8ce4097296362e91898e620b854b3fe7f886530727432e392ab/litner-0.0.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-06-12 10:00:15",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "xusenlinzy",
"github_project": "lit-ner",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"requirements": [],
"lcname": "litner"
}