pybert


- Name: pybert
- Version: 0.0.2
- Home page: https://github.com/guofei9987/pybert
- Summary: Train BERT with one line of code
- Upload time: 2022-12-29 12:32:52
- Author: Guo Fei
- Requires Python: >=3.5
- License: MIT
- Requirements: boto3, botocore

# pybert

Installation
```bash
pip install pybert
```


## Pretrained model


Download links:  
- bert_Chinese model files: https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese.tar.gz
- Vocabulary: https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese-vocab.txt
- [Backup] Baidu Netdisk: https://pan.baidu.com/s/1HPZBvkMAyu0nDUqHWsb0SA?pwd=abbn
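
Once the model archive has been downloaded, it needs to be unpacked. The following is a minimal sketch using only the Python standard library, assuming the download is an ordinary gzip-compressed tar file. `extract_archive` is a hypothetical helper written for illustration and is not part of pybert; the file names inside the archive are not documented here, so the function simply reports whatever it finds.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, target_dir):
    """Extract a .tar.gz archive into target_dir and return the member names.

    Hypothetical helper for illustration; not part of pybert.
    Only use it on archives from a source you trust.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()  # names of all files and folders in the archive
        tar.extractall(target)
    return sorted(names)
```

For example, `extract_archive("bert-base-chinese.tar.gz", "bert_pretrain")` would unpack the archive into a `bert_pretrain` folder in the current directory and return the extracted names so they can be compared with the required files.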

Required files:
- pytorch_model.bin  
- bert_config.json  
- vocab.txt  


Place them in the bert_pretrain folder.
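
Before training, it can help to confirm that the three required files are actually in place. The sketch below is a hypothetical check, not part of pybert, and it assumes the `bert_pretrain` folder sits in the current working directory; adjust the path if your setup differs.

```python
from pathlib import Path

# The three file names listed above as required.
REQUIRED_FILES = ("pytorch_model.bin", "bert_config.json", "vocab.txt")


def missing_pretrained_files(model_dir="bert_pretrain"):
    """Return the required file names that are not present in model_dir."""
    folder = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_pretrained_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All pretrained model files are in place.")
```

Note that the vocabulary URL above ends in `bert-base-chinese-vocab.txt`, so the downloaded file presumably needs to be renamed to `vocab.txt` before this check passes.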

## Training data download

[THUCNews](https://github.com/guofei9987/datasets_for_ml/blob/master/nlp/THUCNews.7z)
- You can give the folder any name, but the training data must follow the same format as the dataset above.
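
The expected data format is defined by the THUCNews archive itself, so one way to learn it is to look at the extracted files before preparing your own dataset. The sketch below is a hypothetical helper, not part of pybert. It assumes the archive has already been extracted with a 7z-capable tool and that the data files are UTF-8 text files with a `.txt` extension; both are assumptions, not something the project states.

```python
from itertools import islice
from pathlib import Path


def preview_dataset(folder, lines_per_file=3):
    """Map each .txt file under folder to its first few lines.

    Hypothetical helper for inspecting a dataset layout; not part of pybert.
    """
    previews = {}
    for path in sorted(Path(folder).rglob("*.txt")):
        # errors="replace" keeps the preview going if a file is not valid UTF-8.
        with path.open(encoding="utf-8", errors="replace") as handle:
            head = [line.rstrip("\n") for line in islice(handle, lines_per_file)]
        previews[str(path.relative_to(folder))] = head
    return previews


if __name__ == "__main__":
    for name, head in preview_dataset("THUCNews").items():
        print(name)
        for line in head:
            print("   ", line)
```

Running it on the extracted `THUCNews` folder and then on your own folder makes it easy to compare the two layouts side by side.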


## Training and prediction

Training
```python
from pybert.models import bert
from pybert.train_eval import load_and_train

dataset = 'THUCNews'  # dataset folder
logfile = 'log.txt'  # log file
config = bert.Config(dataset, logfile=logfile)
load_and_train(config)
```

Prediction
```python
# coding: UTF-8
import pybert.models.bert as bert
from pybert.train_eval import Prediction

config = bert.Config(dataset='THUCNews')
prediction = Prediction(config)

# Example Chinese news headlines to classify
sentences = ['野兽用纪录打爆第二中锋 掘金版三巨头已巍然成型', '56所高校预估2009年湖北录取分数线出炉']

predict_label, score = prediction.predict(sentences)
print("predict label:")
print(predict_label)
```
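
To read the results more easily, each input sentence can be printed next to its predicted label. The sketch below is a hypothetical helper, not part of pybert, and it assumes that `predict` returns exactly one label per input sentence, in the same order as the input. The return type of `predict` is not documented here, so treat this as an illustration only.

```python
def pair_predictions(sentences, labels):
    """Pair each sentence with its predicted label.

    Assumes labels holds one entry per sentence, in input order.
    """
    if len(sentences) != len(labels):
        raise ValueError("expected exactly one label per sentence")
    return list(zip(sentences, labels))


if __name__ == "__main__":
    # Placeholder labels stand in for the output of prediction.predict().
    demo = pair_predictions(["first headline", "second headline"], ["label_a", "label_b"])
    for sentence, label in demo:
        print(label, "<-", sentence)
```

With the prediction example above, the call would be `pair_predictions(sentences, predict_label)`.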




## Reference paper
[1] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding  



            
