litner

Name: litner
Version: 0.0.1
Home page: https://github.com/xusenlinzy/lit-ner
Summary: Pytorch-lightning Code Blocks for NER
Upload time: 2023-06-12 10:00:15
Author: xusenlin
Requires Python: >=3.7
Requirements: No requirements were recorded.

# Lit-NER

<p align="center">
    <a href="https://github.com/xusenlinzy/lit-ner"><img src="https://img.shields.io/github/license/xusenlinzy/lit-ner"></a>
    <a href=""><img src="https://img.shields.io/badge/python-3.8+-aff.svg"></a>
    <a href=""><img src="https://img.shields.io/badge/pytorch-%3E=1.12-red?logo=pytorch"></a>
    <a href="https://github.com/xusenlinzy/lit-ner"><img src="https://img.shields.io/github/last-commit/xusenlinzy/lit-ner"></a>
    <a href="https://github.com/xusenlinzy/lit-ner"><img src="https://img.shields.io/github/issues/xusenlinzy/lit-ner?color=9cc"></a>
    <a href="https://github.com/xusenlinzy/lit-ner"><img src="https://img.shields.io/github/stars/xusenlinzy/lit-ner?color=ccf"></a>
    <a href="https://github.com/xusenlinzy/lit-ner"><img src="https://img.shields.io/badge/langurage-py-brightgreen?style=flat&color=blue"></a>
</p>

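This project provides a unified framework for training and running inference with open-source **named entity recognition** models. It offers the following features:


+ ✨ Supports a variety of open-source entity extraction models


+ 🚀 A unified framework for training and inference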


## 📢 News 

+ [2023.6.12] Initial code release


---

## 🔨 Installation

1. Install `pytorch`

```bash
conda create -n pytorch python=3.8
conda activate pytorch
conda install pytorch cudatoolkit -c pytorch
```

2. Install `litner`

```bash
pip install litner
```
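
After both steps, it can be useful to confirm that the packages are importable from the active environment. The snippet below is a minimal sketch using only the standard library; the helper name `is_importable` is hypothetical, and the import names `torch` and `litner` are assumed from the install commands and the import statements in this README.

```python
from importlib.util import find_spec
from typing import Dict, Iterable


def is_importable(names: Iterable[str]) -> Dict[str, bool]:
    """Report, for each top-level module name, whether Python can locate it."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # "torch" and "litner" are the import names assumed for the two packages.
    for name, found in is_importable(["torch", "litner"]).items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If a package is reported as missing, check that the `pytorch` conda environment is activated before running `pip install litner`.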


## 🐼 Models

The following open-source entity extraction models are supported:

| Model | Paper | Notes |
|---|---|---|
| [softmax](litner/nn/ner/crf.py) | | Fully connected layer for sequence labeling, decoded with `BIO` tags |
| [crf](litner/nn/ner/crf.py) | [Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data](https://repository.upenn.edu/cgi/viewcontent.cgi?article=1162&context=cis_papers) | Fully connected layer plus a conditional random field, decoded with `BIO` tags |
| [cascade-crf](litner/nn/ner/crf.py) | | Predicts entity spans first, then entity types |
| [span](litner/nn/ner/span.py) | | Uses two pointer networks to predict entity start and end positions |
| [global-pointer](litner/nn/ner/global_pointer.py) | | [GlobalPointer: Handling Nested and Non-Nested NER in a Unified Way](https://spaces.ac.cn/archives/8373), [Efficient GlobalPointer: Fewer Parameters, Better Results](https://spaces.ac.cn/archives/8877) |
| [mrc](litner/nn/ner/mrc.py) | [A Unified MRC Framework for Named Entity Recognition.](https://aclanthology.org/2020.acl-main.519.pdf) | Casts NER as a reading comprehension problem: the input is an entity-type template plus the sentence, and the model predicts the start and end positions of the matching entities |
| [tplinker](litner/nn/ner/tplinker.py) | [TPLinker: Single-stage Joint Extraction of Entities and Relations Through Token Pair Linking.](https://aclanthology.org/2020.coling-main.138.pdf) | Casts NER as a table-filling problem |
| [lear](litner/nn/ner/lear.py) | [Enhanced Language Representation with Label Knowledge for Span Extraction.](https://aclanthology.org/2021.emnlp-main.379.pdf) | Addresses the efficiency problem of the `MRC` approach with a label fusion mechanism |
| [w2ner](litner/nn/ner/w2ner.py) | [Unified Named Entity Recognition as Word-Word Relation Classification.](https://arxiv.org/pdf/2112.10070.pdf) | Handles nested and discontinuous entities in a unified way |
| [cnn](litner/nn/ner/cnn.py) | [An Embarrassingly Easy but Strong Baseline for Nested Named Entity Recognition.](https://arxiv.org/abs/2208.04534) | Improves on `W2NER`, using a convolutional network to model relations between tokens inside an entity |
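
Several of the models above emit one `BIO` tag per character and then decode the tag sequence into entities. The following is a minimal sketch of that decoding step, not the library's own implementation: the function name `decode_bio` is hypothetical, and the output fields simply mirror the `json` data format used in this README (with an exclusive `end_offset`).

```python
from typing import Dict, List, Optional


def decode_bio(text: str, tags: List[str]) -> List[Dict]:
    """Turn per-character BIO tags into entity dicts with exclusive end offsets."""
    entities: List[Dict] = []
    start: Optional[int] = None
    label: Optional[str] = None

    def close(end: int) -> None:
        """Emit the currently open entity (if any), ending just before `end`."""
        nonlocal start, label
        if start is not None:
            entities.append({
                "id": len(entities),
                "entity": text[start:end],
                "start_offset": start,
                "end_offset": end,
                "label": label,
            })
        start, label = None, None

    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            close(i)                      # a new entity ends any open one
            start, label = i, tag[2:]
        elif tag.startswith("I-") and start is not None and tag[2:] == label:
            continue                      # the open entity keeps growing
        else:
            close(i)                      # "O" or a stray "I-" tag ends the entity
    close(len(tags))                      # flush an entity that runs to the end
    return entities


if __name__ == "__main__":
    text = "西甲球队瓦拉多利德"
    tags = ["B-organization", "I-organization", "O", "O",
            "B-organization", "I-organization", "I-organization",
            "I-organization", "I-organization"]
    for entity in decode_bio(text, tags):
        print(entity)
```

In this sketch a stray `I-` tag with no preceding `B-` tag is dropped rather than promoted to a new entity; other decoders make the opposite choice.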


## 📚 Data

Convert your dataset to the following `json` format:

```json
{
  "text": "结果上周六他们主场0:3惨败给了中游球队瓦拉多利德,近7个多月以来西甲首次输球。", 
  "entities": [
    {
      "id": 0, 
      "entity": "瓦拉多利德", 
      "start_offset": 20, 
      "end_offset": 25, 
      "label": "organization"
    }, 
    {
      "id": 1, 
      "entity": "西甲", 
      "start_offset": 33, 
      "end_offset": 35, 
      "label": "organization"
    }
  ]
}
```

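Field meanings:

+ `text`: the text content


+ `entities`: all entities contained in the text

    + `id`: the entity `id`

    + `entity`: the entity's surface text
  
    + `start_offset`: the position where the entity starts

    + `end_offset`: the position immediately after the entity's last character (exclusive end)

    + `label`: the entity type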
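
Because `end_offset` is exclusive, the offsets behave like a Python slice: `text[start_offset:end_offset]` should reproduce `entity`. The sketch below checks a record against that rule before training; the helper name `check_record` is hypothetical and is not part of `litner`.

```python
from typing import Dict, List


def check_record(record: Dict) -> List[str]:
    """Return a list of problems found in one record; an empty list means it is consistent."""
    problems: List[str] = []
    text = record["text"]
    for ent in record["entities"]:
        start, end = ent["start_offset"], ent["end_offset"]
        if not 0 <= start < end <= len(text):
            problems.append(f"entity {ent['id']}: offsets ({start}, {end}) out of range")
        elif text[start:end] != ent["entity"]:
            problems.append(
                f"entity {ent['id']}: slice {text[start:end]!r} != {ent['entity']!r}"
            )
    return problems


if __name__ == "__main__":
    record = {
        "text": "结果上周六他们主场0:3惨败给了中游球队瓦拉多利德,近7个多月以来西甲首次输球。",
        "entities": [
            {"id": 0, "entity": "瓦拉多利德", "start_offset": 20,
             "end_offset": 25, "label": "organization"},
            {"id": 1, "entity": "西甲", "start_offset": 33,
             "end_offset": 35, "label": "organization"},
        ],
    }
    print(check_record(record))  # an empty list: both slices match their entities
```

Offsets here count Python string characters (Unicode code points), which is what the example record above uses.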


## 🚀 Model Training

```python
import os
import sys

from transformers import HfArgumentParser

from litner.arguments import (
    DataTrainingArguments,
    ModelArguments,
    TrainingArguments,
)
from litner.models import AutoNerModel

os.environ['TRANSFORMERS_NO_ADVISORY_WARNINGS'] = 'true'


parser = HfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments))
if len(sys.argv) == 2 and sys.argv[1].endswith(".json"):
    model_args, data_args, training_args = parser.parse_json_file(json_file=os.path.abspath(sys.argv[1]))
else:
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()

# 1. create model
model = AutoNerModel.create(model_args=model_args, training_args=training_args)

# 2. finetune model
model.finetune(data_args)
```

See [scripts](./scripts) for the training scripts.


## 📊 Model Inference


```python
from litner.pipelines import NerPipeline

task_model = "crf"
model_name_or_path = "path of crf model"
pipeline = NerPipeline(task_model, model_name_or_path=model_name_or_path)

print(pipeline("结果上周六他们主场0:3惨败给了中游球队瓦拉多利德,近7个多月以来西甲首次输球。"))
```
  

## 📜 License

This project is licensed under the `Apache 2.0` license. See the [LICENSE](LICENSE) file for details.



            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/xusenlinzy/lit-ner",
    "name": "litner",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "",
    "keywords": "",
    "author": "xusenlin",
    "author_email": "1659821119@qq.com",
    "download_url": "",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "Pytorch-lightning Code Blocks for NER",
    "version": "0.0.1",
    "project_urls": {
        "Homepage": "https://github.com/xusenlinzy/lit-ner"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a9b695e6f5d4f8ce4097296362e91898e620b854b3fe7f886530727432e392ab",
                "md5": "7b55848ed93fd12d1320ba2262493caf",
                "sha256": "de13dced98fe164e2628f2cde50a165d07b5d09ece65d81e13108a940a91bfc7"
            },
            "downloads": -1,
            "filename": "litner-0.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "7b55848ed93fd12d1320ba2262493caf",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 132780,
            "upload_time": "2023-06-12T10:00:15",
            "upload_time_iso_8601": "2023-06-12T10:00:15.331498Z",
            "url": "https://files.pythonhosted.org/packages/a9/b6/95e6f5d4f8ce4097296362e91898e620b854b3fe7f886530727432e392ab/litner-0.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-06-12 10:00:15",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "xusenlinzy",
    "github_project": "lit-ner",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "litner"
}
        