# envText
[English](README-en.md)
The **first** text-analysis toolkit for the Chinese environmental domain.
Features:
1. :one: Supports **envBert**, a large-scale pre-trained model for Chinese environmental text!
2. :two: Supports large-scale pre-trained **word vectors** for Chinese environmental text!
3. :three: Supports an expert-curated **vocabulary** for the Chinese environmental domain!
4. :four: **Everything is designed around domain-expert research**:
    - Streamlined neural-network interfaces that expose only essential parameters such as batch_size and learning_rate
    - Refined huggingface transformers input/output interfaces, supporting 20+ dataset formats
    - One-line model usage, so domain experts can focus on the analysis itself
5. :five: Built on the transformers interface, so custom models are easy to define
If you find this project useful or helpful, please give it a star :star: in the upper-right corner. Your support is our biggest motivation for maintaining the project :metal:!
# Quick Start
## 1. Installation
```bash
pip install envtext
```
## 2. Inference (without training)
Supported pre-trained models:
```python
from envtext import Config
print(Config.pretrained_models)
```
| Task | backbone | model name | number of labels | description |
| ---- | ---- | ---- | ---- | ---- |
| Masked language model | env-bert | celtics1863/env-bert-chinese | --- | [link](https://huggingface.co/celtics1863/env-bert-chinese) |
| News classification | env-bert | celtics1863/env-news-cls-bert | 8 classes | [link](https://huggingface.co/celtics1863/env-news-cls-bert) |
| Paper classification | env-bert | celtics1863/env-news-cls-bert | 10 classes | [link](https://huggingface.co/celtics1863/env-news-cls-bert) |
| Policy classification | env-bert | celtics1863/env-news-cls-bert | 15 classes | [link](https://huggingface.co/celtics1863/env-news-cls-bert) |
| Topic classification | env-bert | celtics1863/env-topic | 63 classes | [link](https://huggingface.co/celtics1863/env-topic) |
| POS/entity/term recognition | env-bert | celtics1863/pos-bert | 41 classes | [link](https://huggingface.co/celtics1863/pos-bert) |
| Masked language model | env-albert | celtics1863/env-albert-chinese | --- | [link](https://huggingface.co/celtics1863/env-albert-chinese) |
| News classification | env-albert | celtics1863/env-news-cls-albert | 8 classes | [link](https://huggingface.co/celtics1863/env-news-cls-albert) |
| Paper classification | env-albert | celtics1863/env-paper-cls-albert | 10 classes | [link](https://huggingface.co/celtics1863/env-paper-cls-albert) |
| Policy classification | env-albert | celtics1863/env-policy-cls-albert | 15 classes | [link](https://huggingface.co/celtics1863/env-policy-cls-albert) |
| Topic classification | env-albert | celtics1863/env-topic-albert | 63 classes | [link](https://huggingface.co/celtics1863/env-topic-albert) |
| POS/entity/term recognition | env-albert | celtics1863/pos-ner-albert | 41 classes | [link](https://huggingface.co/celtics1863/pos-ner-albert) |
| Word vectors | word2vec | word2vec | ---- | [link](https://links.jianshu.com/go?to=https%3A%2F%2Farxiv.org%2Fabs%2F1301.3781v3) |
| Word vectors | env-bert | bert2vec | ---- | [link](https://huggingface.co/celtics1863/env-bert-chinese) |
#### 2.1 Environmental topic classification
```python
from envtext import AlbertCLS,Config
model = AlbertCLS(Config.albert.topic_cls)
model("在全球气候大会上,气候变化是各国政府都关心的话题")
```
![](./fig/topic_albert.png)
#### 2.2 Environmental news classification
```python
from envtext import AlbertCLS,Config
model = AlbertCLS(Config.albert.news_cls)
model("清洁能源基地建设对国家能源安全具有战略支撑作用。打造高质量的清洁能源基地的同时,也面临着一系列挑战,比如如何持续降低光储系统的度电成本、如何通过数字化的手段进一步提升运营与运维效率,如何更有效地提升光储系统的安全防护水平、如何在高比例新能源条件下实现稳定并网与消纳等。")
```
![](./fig/news_albert.png)
#### 2.3 Environmental policy classification
```python
from envtext import AlbertCLS,Config
model = AlbertCLS(Config.albert.policy_cls) # policy_cls assumed to be the key for the policy classifier listed in the table above
model("两个《办法》适用于行政主管部门在依法行使监督管理职责中,对建设用地和农用地土壤污染责任人不明确或者存在争议的情况下,开展的土壤污染责任人认定活动。这是当前土壤污染责任人认定工作的重点。涉及民事纠纷的责任人认定应当依据民事法律予以确定,不适用本《办法》。")
```
![](./fig/policy_albert.png)
#### 2.4 Environmental term/entity/POS recognition
```python
from envtext import AlbertNER,Config
model = AlbertNER(Config.albert.pos_ner)
model("在全球气候大会上,气候变化是各国政府都关心的话题")
```
![](./fig/pos_albert.png)
#### 2.5 word2vec word vectors
Load the model:
```python
from envtext.models import load_word2vec
model = load_word2vec()
```
Get a vector:
```python
model.get_vector('环境保护')
```
results:
```bash
array([-13.304651 , -3.1560812 , 6.4074125 , -3.6906316 ,
-1.4232658 , 4.7912726 , -0.8003967 , 4.0756955 ,
-2.7932549 , 4.029449 , -1.9410586 , -6.844793 ,
-8.859059 , -0.93295586, 6.1359916 , 1.9588425 ,
2.625194 , -4.3848248 , -6.4393744 , 6.0373173 ,
-6.155831 , -6.4436955 , 5.107795 , -11.209849 ,
0.04123919, 1.286314 , -11.320914 , -6.475419 ,
0.8528328 , -6.1932034 , 2.0541244 , -3.3850324 ,
4.284287 , -7.197888 , -2.6205683 , 0.31572345,
5.227246 , 3.903521 , -2.5171268 , 2.4655945 ,
-5.5421305 , 5.5044537 , 6.984615 , -7.6862364 ,
0.87583727, 0.03240405, 2.3616972 , -0.9396556 ,
3.9617348 , 0.6690969 , -10.708663 , -2.8534212 ,
-0.8638448 , 12.048176 , 5.5968127 , -6.834452 ,
6.9515004 , 3.948555 , -4.527055 , 4.389503 ,
-0.47533572, 6.79178 , -0.8689579 , -2.7712438 ],
dtype=float32)
```
Compute similarity:
```python
model.most_similar('环境保护')
```
results:
```bash
[('环保', 0.8425659537315369),
('生态环境保护', 0.7966809868812561),
('土壤环境保护', 0.7429764270782471),
('环境污染防治', 0.7383896708488464),
('生态保护', 0.6929160952568054),
('大气环境保护', 0.6914916634559631),
('应对气候变化', 0.6642681956291199),
('水污染防治', 0.6642411947250366),
('大气污染防治', 0.6606612801551819),
('环境管理', 0.6518533825874329)]
```
#### 2.6 env-bert word vectors
Load the model:
```python
from envtext import Bert2Vec,Config
model = Bert2Vec(Config.bert.bert_mlm)
```
Get vectors:
```python
#get a word vector
model.get_vector('环境保护')
#get a sentence vector; the input is a sentence that has already been word-segmented
model.get_vector(["环境保护","人人有责"])
```
results:
```bash
array([ 1.4521e+00, -3.4131e-01, 6.8420e-02, -6.1371e-02, 2.9004e-01,
1.8872e-01, -4.0405e-01, 4.1138e-01, -5.0000e-01, 5.2344e-01,
5.9814e-01, -3.1396e-01, 3.0029e-01, 3.2959e-02, 1.6553e+00,
-4.4800e-01, 1.0195e+00, -6.4697e-01, 3.0200e-01, 5.7080e-01,
7.6599e-02, 3.4155e-01, 1.2805e-01, -2.1863e-01, -3.3398e-01,
6.9092e-01, 4.2725e-01, -4.8364e-01, 7.8760e-01, 3.8940e-01,
4.9927e-02, -7.1106e-02, -5.3271e-01, -4.8486e-01, 3.1665e-01,
5.1367e-01, 8.8477e-01, -2.2302e-01, 1.9943e-02, 7.3047e-01,
-1.5417e-01, -1.4206e-02, -5.2881e-01, 4.0674e-01, 2.7466e-01,
-1.3940e-01, 5.2490e-01, -1.1514e+00, -4.2676e-01, 9.5508e-01,
...])
```
Compute similarity:
```python
model.add_words(
[
"环境污染",
"水污染",
"大气污染",
"北京市",
"上海市",
"兰州市"
])
model.most_similar("郑州市")
```
results:
```bash
[('兰州市', 0.8755860328674316),
('北京市', 0.7335232496261597),
('上海市', 0.7241109013557434),
('大气污染', 0.471857488155365),
('水污染', 0.4557272493839264)]
```
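For a one-off comparison, you can also work directly with the raw vectors instead of registering words first. A minimal numpy sketch (not an envtext API), reusing the `model` above and assuming `get_vector` returns 1-D arrays as shown:
```python
import numpy as np

v1 = np.asarray(model.get_vector("环境保护"))
v2 = np.asarray(model.get_vector("大气污染"))
cos = float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
print(f"cosine similarity: {cos:.4f}")
```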
#### 2.7 Cloze
Mark the blanks to fill with `[MASK]`:
```python
from envtext import BertMLM,Config
model = BertMLM(Config.bert_mlm)
model("在全球气候大会上,[MASK][MASK][MASK][MASK]是各国政府都关心的话题")
```
results:
```bash
text:在全球气候大会上,[MASK][MASK][MASK][MASK]是各国政府都关心的话题
predict: ['气', '体', '减', '少'] ; probability: 0.5166
predict: ['气', '体', '减', '排'] ; probability: 0.5166
predict: ['气', '体', '减', '碳'] ; probability: 0.5166
predict: ['气', '体', '减', '缓'] ; probability: 0.5166
predict: ['气', '体', '减', '量'] ; probability: 0.5166
```
#### 2.8 Sentiment analysis
Predict the intensity of the sentiment:
```python
from envtext import BertSA,Config
model = BertSA(Config.intensity_sa)
model("中国到现在都没有达到3000年的平均气温,现在就把近期时间气温上升跟工业革命联系起来是不是为时尚早?即便没有工业革命1743年中国北方的罕见高温,1743年7月20至25日,华北地区下午的气温均高于40℃。其中7月25日最热,气温高达44.4℃。这样的极端高温纪录,迄今从未被超越。民国三十一年(公元1942年)和公元1999年夏季,华北地区先后出现了两次极端高温纪录,分别为42.6℃、42.2℃,均低于乾隆八年的温度。又要算到什么头上呢?!!!")
```
results:
![](./fig/sa_bert.png)
#### 2.9 Entity extraction
Using a model trained on the CLUENER dataset:
```python
from envtext import BertNER,Config
model = BertNER(Config.bert.clue_ner)
model([
"生生不息CSOL生化狂潮让你填弹狂扫",
"那不勒斯vs锡耶纳以及桑普vs热那亚之上呢?",
"加勒比海盗3:世界尽头》的去年同期成绩死死甩在身后,后者则即将赶超《变形金刚》,",
"布鲁京斯研究所桑顿中国中心研究部主任李成说,东亚的和平与安全,是美国的“核心利益”之一。",
"此数据换算成亚洲盘罗马客场可让平半低水。",
],print_result=True)
```
results:
![](fig/cluener_bert.png)
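The batch predictions can then be exported with the same `save_result` interface used in section 3.2.1:
```python
model.save_result('ner_result.csv')
```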
## 3. Training and inference
With envtext, you can label a handful of examples, train your own model, and then use it to run inference over the rest of your texts.
Currently supported models:
| Taskname | Bert models | Albert models | RNN models | Others |
| ------ | ------ | ------ | ------ | ----- |
| Cloze | BertMLM | ------ | ------ | ----- |
| Classification | BertCLS | AlbertCLS | RNNCLS | CNNCLS, TFIDFCLS |
| Sentiment analysis | BertSA | ---- | RNNSA | ------ |
| Multiple choice | BertMultiChoice | AlbertMultiChoice | RNNMultiChoice | ----- |
| Named entity recognition | BertNER | AlbertNER | RNNNER | ----- |
| Nested named entity recognition | BertGP | ----- | ----- | ----- |
| Relation classification | BertRelation | ---- | ---- | ----- |
| Joint entity-relation extraction | BertTriple | ---- | ---- | ----- |
| Word vectors | Bert2vec | ----- | ----- | Word2Vec |
Most NLP tasks are covered, with the exception of text generation.
The Bert and Albert families support the environmental-domain pre-trained models `envBert` and `envAlbert`, as well as any other BERT model from huggingface transformers.
The RNN family consists of `LSTM`, `GRU`, and plain `RNN` models, which can be initialized either with environmental-domain pre-trained word vectors or with one-hot encodings.
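As an illustration of the word-vector option, initializing an embedding layer from the pre-trained vectors might look like the following sketch (plain PyTorch rather than envtext's internal code; it assumes the object returned by `load_word2vec` behaves like gensim keyed vectors, as in section 2.5):
```python
import torch
import torch.nn as nn
from envtext.models import load_word2vec

w2v = load_word2vec()                # gensim-style keyed vectors assumed
weights = torch.tensor(w2v.vectors)  # (vocab_size, embed_size) matrix of pre-trained vectors
embedding = nn.Embedding.from_pretrained(weights, freeze=False)  # trainable, vector-initialized
```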
### 3.1 Training
##### 3.1.1 Bert/Albert model training
```python
#import a bert model (e.g., a classification model)
from envtext.models import BertCLS
model = BertCLS('celtics1863/env-bert-chinese')
# # to use a custom dataset:
# model.load_dataset(file_path,task = 'cls',format = 'datasets-format')
# # use one of envtext's built-in datasets:
model.load_dataset('isclimate')
#train the model
model.train()
#save the model
model.save_model('classification') #target directory
```
##### 3.1.2 RNN model training
```python
#import an rnn model (e.g., a classification model)
from envtext.models import RNNCLS
model = RNNCLS()
# # to use a custom dataset:
# model.load_dataset(file_path,task = 'cls',format = 'datasets-format')
# # use one of EnvText's built-in datasets:
model.load_dataset('isclimate')
#train the model
model.train()
#save the model
model.save_model('classification') #target directory
```
### 3.2 Inference with a custom model
#### 3.2.1 Inference with a custom bert model
```python
#load the model (from a local directory or the huggingface hub)
from envtext.models import BertMLM
model = BertMLM('celtics1863/env-bert-chinese')
#predict; the input can be a str or a List[str]
model('[MASK][MASK][MASK][MASK]是各国政府都关心的话题')
#export the results
model.save_result('result.csv')
```
#### 3.2.2 Inference with an RNN model
Run inference from a directory containing `pytorch_model.bin`:
```python
from envtext.models import RNNCLS
model = RNNCLS('local directory')
#predict
model('气候变化是各国政府都关心的话题')
#save result
model.save_result('result.csv')
```
## 4. Custom models
#### 4.1 Custom bert model
Define a regressor on top of a bert model:
```python
from envtext.models.bert_base import BertBase
import torch
from transformers import BertPreTrainedModel,BertModel
class MyBert(BertPreTrainedModel):
    def __init__(self, config):
        super(MyBert, self).__init__(config)
        self.bert = BertModel(config) #bert backbone
        self.regressor = torch.nn.Linear(config.hidden_size, 1) #regression head
        self.loss = torch.nn.MSELoss() #loss function

    def forward(self, input_ids, token_type_ids=None, attention_mask=None, labels=None,
                position_ids=None, inputs_embeds=None, head_mask=None):
        outputs = self.bert(input_ids,
                            attention_mask=attention_mask,
                            token_type_ids=token_type_ids,
                            position_ids=position_ids,
                            head_mask=head_mask,
                            inputs_embeds=inputs_embeds)
        #use the [CLS] token
        cls_output = outputs[0][:,0,:]
        #get logits
        logits = self.regressor(cls_output)
        outputs = (logits,)
        #keep the output consistent with the bert interface: (loss, logits) with labels, (logits,) without
        if labels is not None:
            loss = self.loss(logits.squeeze(),labels)
            outputs = (loss,) + outputs
        return outputs
```
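A quick forward-pass sanity check of the regressor might look like this (a sketch: the regression target is a dummy value, and the tokenizer is assumed to ship with the checkpoint):
```python
from transformers import BertTokenizer
import torch

tokenizer = BertTokenizer.from_pretrained('celtics1863/env-bert-chinese')
model = MyBert.from_pretrained('celtics1863/env-bert-chinese')

inputs = tokenizer("气候变化是各国政府都关心的话题", return_tensors="pt")
loss, logits = model(**inputs, labels=torch.tensor(0.5))  # dummy regression target
print(loss.item(), logits.shape)  # scalar loss, logits of shape (1, 1)
```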
Align it with EnvText's interface:
```python
class MyBertModel(BertBase):
    #override the initialization function
    def initialize_bert(self,path = None,config = None,**kwargs):
        super().initialize_bert(path,config,**kwargs)
        self.model = MyBert.from_pretrained(self.model_path)

    #[Optional] override the preprocessing function
    def preprocess(self,text, logits, **kwargs):
        text = text.replace("\n", "")
        return text

    #[Optional] override the postprocessing function
    def postprocess(self,text, logits, **kwargs):
        logits = logits.squeeze()
        return logits.tolist()

    #[Optional] called during training to compute metrics other than the loss
    def compute_metrics(self, eval_pred):
        from envtext.utils.metrics import metrics_for_reg
        return metrics_for_reg(eval_pred)

    #[Optional] align parameters in the config
    def align_config(self):
        super().align_config()
        ##the config can be reset here via the self.update_config() or self.set_attribute() interfaces
        pass
```
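Once aligned, the custom model is driven through the same interface as the built-in ones (mirroring the training flow in section 3.1; a regression dataset of your own would replace the classification datasets used there):
```python
model = MyBertModel('celtics1863/env-bert-chinese')
# model.load_dataset(...)  # load a regression dataset, as in section 3.1
model.train()
model.save_model('my_regressor')
```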
#### 4.2 Custom RNN model
Defining an RNN model works in much the same way.
First, implement an LSTM classifier:
```python
from torch import nn
import torch
class MyRNN(nn.Module):
    def __init__(self,config):
        super().__init__()
        self.rnn = nn.LSTM(config.embed_size, config.hidden_size, config.num_layers, batch_first = True)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)
        self.loss_fn = nn.CrossEntropyLoss() #loss function (needed below)

    def forward(self,X,labels = None):
        X,_ = self.rnn(X)
        logits = self.classifier(X)
        #align the interface: return (loss, logits) when labels are given, (logits,) otherwise
        if labels is not None:
            loss = self.loss_fn(logits,labels)
            return (loss,logits)
        return (logits,)
```
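A quick shape check for the module (a sketch; the config here is a stand-in namespace, not an envtext config object):
```python
from types import SimpleNamespace
import torch

config = SimpleNamespace(embed_size=64, hidden_size=128, num_layers=2, num_labels=10)
rnn = MyRNN(config)
X = torch.randn(4, 20, config.embed_size)  # (batch, seq_len, embed_size)
(logits,) = rnn(X)
print(logits.shape)  # torch.Size([4, 20, 10]): per-token logits
```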
Align it with EnvText's interface:
```python
import numpy as np
class MyRNNModel(BertBase):
    #override the initialization function
    def initialize_bert(self,path = None,config = None,**kwargs):
        super().initialize_bert(path,config,**kwargs) #keep this call unchanged
        self.model = MyRNN(self.config) #a plain nn.Module has no from_pretrained; build it from the config (self.config assumed to be set by BertBase)

    #[Optional] override the function that postprocesses the prediction result
    def postprocess(self,text, logits, print_result = True ,save_result = True):
        pred = np.argmax(logits,axis = -1)
        return pred.tolist()

    #[Optional] override the metrics to add metrics besides the loss during training
    def compute_metrics(self, eval_pred):
        return {} #return a dict

    #[Optional] override align_config
    #The model sometimes needs several inputs at once (e.g., the number of classes, or a list of class names for classification); this interface aligns them in the config.
    def align_config(self):
        super().align_config()
```
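As with the bert variant, the aligned RNN model is then used through the standard interface (mirroring section 3.2.2):
```python
model = MyRNNModel('local directory')  # a directory containing pytorch_model.bin
model('气候变化是各国政府都关心的话题')
model.save_result('result.csv')
```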
More detailed tutorials and examples will be added as [jupyter notebooks](notebooks).
## 5. Usage tips
1. Bert models are large. On a CPU-only machine, start with an RNN model to get a first result and check whether the dataset's size and quality are adequate, then decide whether a Bert model is worthwhile. envbert typically leads RNN models by about 10 points, and its advantage grows as the dataset gets smaller.
2. Neural networks are sensitive to weight initialization, so every training run differs; run training several times and keep the best result.
3. Learning rate, number of epochs, and batch size are the three most critical hyperparameters and need careful tuning per dataset. The defaults reach reasonably good values in most cases, but will certainly not be optimal.
# License
Apache License