# pybert
Installation
```bash
pip install pybert
```
## Pretrained model
Download links:
- bert_Chinese model files: https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese.tar.gz
- Vocabulary: https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese-vocab.txt
- [Backup] Baidu Netdisk: https://pan.baidu.com/s/1HPZBvkMAyu0nDUqHWsb0SA?pwd=abbn

Required files:
- pytorch_model.bin
- bert_config.json
- vocab.txt

Place these files in the `bert_pretrain` folder.
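
Before training, it can help to confirm that the three required files are actually in place. The sketch below is not part of pybert: the helper name `missing_pretrain_files` is hypothetical, and it assumes `bert_pretrain` sits in the current working directory, which the instructions above do not specify.

```python
from pathlib import Path

# File names taken from the list of required files above.
REQUIRED_FILES = ("pytorch_model.bin", "bert_config.json", "vocab.txt")


def missing_pretrain_files(folder="bert_pretrain"):
    """Return the required file names that are not present in `folder`.

    The default folder location (relative to the working directory)
    is an assumption, not something pybert documents.
    """
    root = Path(folder)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_pretrain_files()
    if missing:
        print("Missing from bert_pretrain:", ", ".join(missing))
    else:
        print("All pretrained model files are in place.")
```

If the folder lives elsewhere, pass its path explicitly, for example `missing_pretrain_files("/data/bert_pretrain")`.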
## Training data download
[THUCNews](https://github.com/guofei9987/datasets_for_ml/blob/master/nlp/THUCNews.7z)
- The folder can have any name, but the training data must use the same format as the THUCNews dataset above.
## Training and prediction
Training
```python
from pybert.models import bert
from pybert.train_eval import load_and_train
dataset = 'THUCNews'  # dataset folder
logfile = 'log.txt'  # log file
config = bert.Config(dataset, logfile=logfile)
load_and_train(config)
```
Prediction
```python
# coding: UTF-8
import pybert.models.bert as bert
from pybert.train_eval import Prediction
config = bert.Config(dataset='THUCNews')
prediction = Prediction(config)
sentences = ['野兽用纪录打爆第二中锋 掘金版三巨头已巍然成型', '56所高校预估2009年湖北录取分数线出炉']
predict_label, score = prediction.predict(sentences)
print("predict label:")
print(predict_label)
```
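
The prediction example prints only the labels. To inspect each sentence next to its label and score, the results can be zipped together. This is a minimal sketch with a hypothetical helper `pair_predictions`; it assumes `prediction.predict` returns two sequences aligned one-to-one with the input sentences, which the README implies but does not state.

```python
def pair_predictions(sentences, labels, scores):
    """Pair each input sentence with its predicted label and score.

    Assumes the three sequences are aligned one-to-one; raises
    ValueError if their lengths differ.
    """
    if not (len(sentences) == len(labels) == len(scores)):
        raise ValueError("sentences, labels and scores must have equal length")
    return list(zip(sentences, labels, scores))


if __name__ == "__main__":
    # Placeholder values standing in for real model output.
    demo = pair_predictions(["sentence A", "sentence B"], [3, 7], [0.91, 0.84])
    for sentence, label, score in demo:
        print(sentence, label, score)
```

With the variables from the prediction example, the call would be `pair_predictions(sentences, predict_label, score)`.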
## Reference paper
[1] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
## Package metadata
- Name: pybert
- Version: 0.0.2
- Summary: Train BERT with one line of code
- Author: Guo Fei (guofei9987@foxmail.com)
- License: MIT
- Home page: https://github.com/guofei9987/pybert
- Requires Python: >=3.5
- Platform: linux
- Uploaded: 2022-12-29
- Requirements: boto3, botocore
- Release files:
  - pybert-0.0.2-py3-none-any.whl (wheel, 29936 bytes): https://files.pythonhosted.org/packages/ba/ee/4873531be4c3c307749b15bd894ae38c805a16902c5acbaf0787a6bf1ddf/pybert-0.0.2-py3-none-any.whl, sha256 `c8ffa6ccfd44232ada27082d285690c8cbda9b363c4c20f284365796151116d4`
  - pybert-0.0.2.tar.gz (sdist, 28560 bytes): https://files.pythonhosted.org/packages/49/f1/372bd98b5ae7ebce3906ee284cb5fd507dcf9e897f020be7f1733f73e1e3/pybert-0.0.2.tar.gz, sha256 `88a558d14930ed546ff827e189955c8448dfdb90a06d3350e260fa34a32a1f93`
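
The package metadata lists a sha256 digest for each release file, so a manually downloaded archive can be checked against it. The following is a sketch using only the standard library `hashlib`; the helper names `sha256_of_file` and `matches_digest` are hypothetical and not part of pybert.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the hex sha256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_sha256):
    """Return True if the file's sha256 equals the expected hex digest."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

For the source archive, for example, the check would be `matches_digest("pybert-0.0.2.tar.gz", "88a558d14930ed546ff827e189955c8448dfdb90a06d3350e260fa34a32a1f93")`, assuming the file was saved under that name in the working directory.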