<a href="https://explosion.ai"><img src="https://explosion.ai/assets/img/logo.svg" width="125" height="125" align="right" /></a>
# spacy-pkuseg: Chinese word segmentation toolkit for spaCy
This package is a fork of
[`pkuseg-python`](https://github.com/lancopku/pkuseg-python) that simplifies
installation and serialization for use with [spaCy](https://spacy.io). The
underlying segmentation tools remain unmodified.
----------
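# pkuseg: a multi-domain Chinese word segmentation toolkit [**(English Version)**](readme/readme_english.md)
pkuseg is a toolkit based on the paper [[Luo et al., 2019](#citation)]. It is simple to use, supports domain-specific segmentation, and effectively improves segmentation accuracy.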
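## Table of contents
* [Highlights](#highlights)
* [Build and installation](#build-and-installation)
* [Performance comparison of word segmentation toolkits](#performance-comparison-of-word-segmentation-toolkits)
* [Usage](#usage)
* [Citation](#citation)
* [Authors](#authors)
* [FAQ](#faq)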
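## Highlights
pkuseg has the following features:
1. Multi-domain segmentation. Unlike earlier general-purpose Chinese word segmentation tools, this toolkit also aims to provide pretrained models tailored to data from different domains. Users are free to choose a model according to the domain of the text to be segmented. Pretrained segmentation models are currently available for the news, web, medicine, and tourism domains, together with a mixed-domain model. If the domain of the text is known, load the corresponding model. If the domain cannot be determined, we recommend the general-purpose model trained on mixed-domain data. Segmentation examples for each domain can be found in [**example.txt**](https://github.com/lancopku/pkuseg-python/blob/master/example.txt).
2. Higher segmentation accuracy. Compared with other segmentation toolkits, pkuseg achieves higher accuracy when trained and tested on the same data.
3. Support for user-trained models. Users can train models on their own newly annotated data.
4. Support for part-of-speech tagging.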
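## Build and installation
- Currently **only Python 3 is supported**.
- **For the best accuracy and speed, we strongly recommend updating to the latest version with pip install.**
1. Install from PyPI (model files included):
```
pip3 install pkuseg
```
Then load the package with `import pkuseg`.
**We recommend updating to the latest version** for a better out-of-the-box experience: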
```
pip3 install -U pkuseg
```
2. If downloads from the official PyPI index are slow, we recommend using a mirror, for example:
First-time installation:
```
pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple pkuseg
```
Update:
```
pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple -U pkuseg
```
3. If you prefer not to install with pip and download the code from GitHub instead, run the following command to install:
```
python setup.py build_ext -i
```
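The GitHub code does not include pretrained models, so users need to download or train a model themselves; pretrained models are available on the [release](https://github.com/lancopku/pkuseg-python/releases) page. When using one, set `model_name` to the model file.
Note: **installation methods 1 and 2 currently support only 64-bit Python 3 on Linux (Ubuntu), macOS, and Windows**. On other systems, use method 3 to compile and install locally.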
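## Performance comparison of word segmentation toolkits
We compare pkuseg with jieba, THULAC, and other representative segmentation toolkits developed in China; see [experimental environment](readme/environment.md) for the detailed setup.
#### Domain-specific training and test results
The following tables compare the toolkits on different datasets: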
| MSRA | Precision | Recall | F-score |
| :----- | --------: | -----: | --------: |
| jieba | 87.01 | 89.88 | 88.42 |
| THULAC | 95.60 | 95.91 | 95.71 |
| pkuseg | 96.94 | 96.81 | **96.88** |
| WEIBO | Precision | Recall | F-score |
| :----- | --------: | -----: | --------: |
| jieba | 87.79 | 87.54 | 87.66 |
| THULAC | 93.40 | 92.40 | 92.87 |
| pkuseg | 93.78 | 94.65 | **94.21** |
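The tables above report precision, recall, and F-score. As a reminder of how these metrics are commonly defined for word segmentation, the sketch below scores a predicted segmentation against a gold one by comparing character spans. It illustrates the usual word-level definition only; this document does not show the authors' evaluation script, and the function names and the toy sentences are hypothetical.

```python
def to_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans = set()
    start = 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def precision_recall_f(gold_words, pred_words):
    """Word-level precision, recall, and F-score of a predicted segmentation."""
    gold = to_spans(gold_words)
    pred = to_spans(pred_words)
    correct = len(gold & pred)  # a word counts only if both boundaries match
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Toy example (hand-written, not output of any toolkit).
gold = ["我", "爱", "北京", "天安门"]
pred = ["我", "爱", "北京", "天安", "门"]
p, r, f = precision_recall_f(gold, pred)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")
```

Here three of the five predicted words match gold words, so precision is 3/5 and recall is 3/4.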
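#### Default models tested on different domains
Many users try out a segmentation toolkit with the model that ships with it. To compare this "out-of-the-box" performance directly, we also evaluated each toolkit's default model on different domains. Note that this comparison only illustrates default behavior and is not necessarily fair.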
| Default | MSRA | CTB8 | PKU | WEIBO | All Average |
| ------- | :---: | :---: | :---: | :---: | :---------: |
| jieba | 81.45 | 79.58 | 81.83 | 83.56 | 81.61 |
| THULAC | 85.55 | 87.84 | 92.29 | 86.65 | 88.08 |
| pkuseg | 87.29 | 91.77 | 92.68 | 93.43 | **91.29** |
Here, `All Average` is the mean F-score over all test sets.
For a more detailed comparison, see [comparison with existing toolkits](readme/comparison.md).
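As a quick sanity check on the `All Average` column, the sketch below recomputes the mean of the four per-dataset F-scores copied from the table. The numbers come from the table; the variable names are only illustrative.

```python
from statistics import mean

# F-scores per toolkit in the order MSRA, CTB8, PKU, WEIBO (from the table).
f_scores = {
    "jieba": [81.45, 79.58, 81.83, 83.56],
    "THULAC": [85.55, 87.84, 92.29, 86.65],
    "pkuseg": [87.29, 91.77, 92.68, 93.43],
}
# Values reported in the "All Average" column.
reported = {"jieba": 81.61, "THULAC": 88.08, "pkuseg": 91.29}

averages = {name: mean(scores) for name, scores in f_scores.items()}
for name, avg in averages.items():
    print(f"{name}: computed {avg:.4f}, reported {reported[name]:.2f}")
```

The computed means agree with the reported column to within rounding.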
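## Usage
#### Code examples
The following code examples are intended for an interactive Python environment.
Example 1: segmentation with the default configuration (**if the domain cannot be determined, we recommend the default model**)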
```python3
import pkuseg
seg = pkuseg.pkuseg()           # load the model with the default configuration
text = seg.cut('我爱北京天安门')  # segment the text
print(text)
```
Example 2: domain-specific segmentation (**if the domain is known, we recommend the matching domain model**)
```python3
import pkuseg
seg = pkuseg.pkuseg(model_name='medicine')  # the corresponding domain model is downloaded automatically
text = seg.cut('我爱北京天安门')              # segment the text
print(text)
```
Example 3: segmentation with part-of-speech tagging; the meaning of each tag is described in [tags.txt](https://github.com/lancopku/pkuseg-python/blob/master/tags.txt)
```python3
import pkuseg
seg = pkuseg.pkuseg(postag=True)  # enable part-of-speech tagging
text = seg.cut('我爱北京天安门')    # segment and tag the text
print(text)
```
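The tagged result can be post-processed like any Python sequence. The sketch below assumes that `cut` returns a list of `(word, tag)` pairs when `postag=True`, which this document does not state explicitly. The `format_tagged` helper is hypothetical, and the sample pairs are hand-written for illustration rather than real model output; see tags.txt for the actual tag set.

```python
def format_tagged(pairs, sep="/"):
    """Render (word, tag) pairs as a single 'word/tag word/tag ...' string."""
    return " ".join(f"{word}{sep}{tag}" for word, tag in pairs)


# Hand-written sample in the assumed (word, tag) shape; tags are illustrative.
sample = [("我", "r"), ("爱", "v"), ("北京", "ns"), ("天安门", "ns")]
print(format_tagged(sample))
```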
Example 4: segmenting a file
```python3
import pkuseg
# Segment input.txt and write the result to output.txt,
# using 20 processes
pkuseg.test('input.txt', 'output.txt', nthread=20)
```
For more usage examples, see [detailed code examples](readme/interface.md).
#### Parameters
Model configuration
```
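pkuseg.pkuseg(model_name = "default", user_dict = "default", postag = False)
    model_name      Model path.
                    "default", the default; uses our pretrained mixed-domain model (pip installations only).
                    "news", uses the news-domain model.
                    "web", uses the web-domain model.
                    "medicine", uses the medicine-domain model.
                    "tourism", uses the tourism-domain model.
                    model_path, loads a model from a user-specified path.
    user_dict       User dictionary setting.
                    "default", the default; uses the dictionary we provide.
                    None, no dictionary is used.
                    dict_path, uses a user-defined dictionary in addition to the default one; pass the path of your own dictionary file, which lists one word per line (if POS tagging is enabled and the word's tag is known, write the word and its tag on that line, separated by a tab character).
    postag          Whether to perform part-of-speech tagging.
                    False, the default; segmentation only, no POS tagging.
                    True, POS tagging is performed along with segmentation.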
```
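The user dictionary format described above (one word per line, optionally followed by a tab and a part-of-speech tag) can be produced with a few lines of standard-library code. This is a minimal sketch: the `write_user_dict` helper, the sample entries, and the choice of UTF-8 encoding are assumptions made for illustration, not part of the toolkit.

```python
import os
import tempfile


def write_user_dict(path, entries):
    """Write a dictionary file: one word per line, or 'word<TAB>tag'.

    `entries` holds plain words (str) or (word, tag) tuples.
    """
    with open(path, "w", encoding="utf-8") as f:  # UTF-8 is an assumption
        for entry in entries:
            if isinstance(entry, tuple):
                word, tag = entry
                f.write(f"{word}\t{tag}\n")
            else:
                f.write(f"{entry}\n")


# Illustrative entries; the tag "ns" is only an example value.
dict_path = os.path.join(tempfile.mkdtemp(), "user_dict.txt")
write_user_dict(dict_path, ["天安门广场", ("北京", "ns")])

with open(dict_path, encoding="utf-8") as f:
    lines = f.read().splitlines()
print(lines)
```

The resulting path would then be passed as the `user_dict` argument, for example `pkuseg.pkuseg(user_dict=dict_path)`.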
Segmenting a file
```
pkuseg.test(readFile, outputFile, model_name = "default", user_dict = "default", postag = False, nthread = 10)
    readFile        Input file path.
    outputFile      Output file path.
    model_name      Model path. Same as in pkuseg.pkuseg.
    user_dict       User dictionary setting. Same as in pkuseg.pkuseg.
    postag          Whether to enable POS tagging. Same as in pkuseg.pkuseg.
    nthread         Number of processes to use during testing.
```
Training a model
```
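pkuseg.train(trainFile, testFile, savedir, train_iter = 20, init_model = None)
    trainFile       Training file path.
    testFile        Test file path.
    savedir         Directory in which the trained model is saved.
    train_iter      Number of training iterations.
    init_model      Model used for initialization. The default, None, uses the default initialization; users can pass the path of the model to initialize from, e.g. init_model='./models/'.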
```
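#### Multiprocess segmentation
When the code examples above are run from a file and involve multiprocessing, be sure to protect the module-level statements with `if __name__ == '__main__'`; see [multiprocess segmentation](readme/multiprocess.md).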
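A script following this advice might look like the sketch below. It wraps the `pkuseg.test` call from Example 4 in a function and only invokes it under the `__main__` guard, so that worker processes which re-import the module do not start the job again. The `run` helper, the file names, and the early return for a missing input file are assumptions added for illustration.

```python
import os


def run(input_path="input.txt", output_path="output.txt", nthread=20):
    """Segment a file with worker processes; return False if the input is missing."""
    if not os.path.exists(input_path):
        print(f"input file not found: {input_path}")
        return False
    # Imported inside the function so that importing this module has no side effects.
    import pkuseg

    pkuseg.test(input_path, output_path, nthread=nthread)
    return True


if __name__ == "__main__":
    # Only the process that runs this file as a script enters this branch;
    # child processes that import the module skip it.
    run()
```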
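## Pretrained models
Users who installed via pip only need to set the `model_name` field to the desired domain to use domain-specific segmentation; the corresponding model is downloaded automatically.
Users who downloaded the code from GitHub need to download the pretrained model themselves and set `model_name` to its path. Pretrained models can be downloaded from the [release](https://github.com/lancopku/pkuseg-python/releases) page. The pretrained models are:
- **news**: model trained on MSRA (news corpus).
- **web**: model trained on Weibo (web text corpus).
- **medicine**: model trained on medical-domain data.
- **tourism**: model trained on tourism-domain data.
- **mixed**: general-purpose model trained on a mixed dataset. This is the model bundled with the pip package.
Users are welcome to share domain-specific models they have trained.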
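The recommendation to use a domain model when the domain is known, and the mixed default otherwise, can be captured in a small helper. This is a hypothetical sketch: `choose_model_name` is not part of the toolkit, and treating any existing filesystem path as a model path is an assumption.

```python
import os

# Domain names accepted by `model_name` according to the parameter list.
DOMAIN_MODELS = {"news", "web", "medicine", "tourism"}


def choose_model_name(domain_or_path=None):
    """Return a value suitable for `model_name`.

    Known domains and existing paths are passed through; anything else
    falls back to "default", the bundled mixed-domain model.
    """
    if domain_or_path in DOMAIN_MODELS:
        return domain_or_path
    if domain_or_path and os.path.exists(domain_or_path):
        return domain_or_path  # assumed to point at a downloaded or trained model
    return "default"


print(choose_model_name("medicine"))
print(choose_model_name())
```

Falling back silently keeps the helper simple; a stricter variant could raise an error for unrecognized names instead.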
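## Version history
See [version history](readme/history.md).
## License
1. This code is released under the MIT license.
2. Comments and suggestions on the toolkit are welcome; please send them to jingjingxu@pku.edu.cn.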
## Citation
This package is mainly based on the following research paper. If you use the toolkit, please cite it:
* Ruixuan Luo, Jingjing Xu, Yi Zhang, Xuancheng Ren, Xu Sun. [PKUSEG: A Toolkit for Multi-Domain Chinese Word Segmentation](https://arxiv.org/abs/1906.11455). arXiv. 2019.
```
@article{pkuseg,
author = {Luo, Ruixuan and Xu, Jingjing and Zhang, Yi and Ren, Xuancheng and Sun, Xu},
journal = {CoRR},
title = {PKUSEG: A Toolkit for Multi-Domain Chinese Word Segmentation.},
url = {https://arxiv.org/abs/1906.11455},
volume = {abs/1906.11455},
year = 2019
}
```
## Other related papers
* Xu Sun, Houfeng Wang, Wenjie Li. Fast Online Training with Frequency-Adaptive Learning Rates for Chinese Word Segmentation and New Word Detection. ACL. 2012.
* Jingjing Xu and Xu Sun. Dependency-based gated recursive neural network for Chinese word segmentation. ACL. 2016.
* Jingjing Xu and Xu Sun. Transfer learning for low-resource Chinese word segmentation with a novel neural network. NLPCC. 2017.
## FAQ
1. [Why release pkuseg?](https://github.com/lancopku/pkuseg-python/wiki/FAQ#1-为什么要发布pkuseg)
2. [What techniques does pkuseg use?](https://github.com/lancopku/pkuseg-python/wiki/FAQ#2-pkuseg使用了哪些技术)
3. [Multiprocess segmentation and training do not work and raise RuntimeError and BrokenPipeError.](https://github.com/lancopku/pkuseg-python/wiki/FAQ#3-无法使用多进程分词和训练功能提示runtimeerror和brokenpipeerror)
4. [How was the comparison with other toolkits on domain-specific data carried out?](https://github.com/lancopku/pkuseg-python/wiki/FAQ#4-是如何跟其它工具包在细领域数据上进行比较的)
5. [How does pkuseg perform when compared on black-box test sets?](https://github.com/lancopku/pkuseg-python/wiki/FAQ#5-在黑盒测试集上进行比较的话效果如何)
6. [What if I do not know the domain of the corpus to be segmented?](https://github.com/lancopku/pkuseg-python/wiki/FAQ#6-如果我不了解待分词语料的所属领域呢)
7. [How should segmentation results on particular examples be interpreted?](https://github.com/lancopku/pkuseg-python/wiki/FAQ#7-如何看待在一些特定样例上的分词结果)
8. [What about running speed?](https://github.com/lancopku/pkuseg-python/wiki/FAQ#8-关于运行速度问题)
9. [What about multiprocessing speed?](https://github.com/lancopku/pkuseg-python/wiki/FAQ#9-关于多进程速度问题)
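## Acknowledgements
Thanks to Professor Shiwen Yu (俞士汶, Institute of Computational Linguistics, Peking University) and Dr. Likun Qiu (邱立坤) for providing the training datasets!
## Authors
Ruixuan Luo (罗睿轩), Jingjing Xu (许晶晶), Xuancheng Ren (任宣丞), Yi Zhang (张艺), Bingzhen Wei (位冰镇), Xu Sun (孙栩)
[Language Computing and Machine Learning Group](http://lanco.pku.edu.cn/), Peking University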
Raw data
{
"_id": null,
"home_page": "https://github.com/explosion/spacy-pkuseg",
"name": "spacy-pkuseg",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": null,
"author": "Explosion",
"author_email": "contact@explosion.ai",
"download_url": "https://files.pythonhosted.org/packages/a7/33/c2370bbe09daf655332a34b263a0e6279e630b30b57364438381a511f964/spacy_pkuseg-1.0.0.tar.gz",
"platform": null,
"description": "Same text as the README above.",
"bugtrack_url": null,
"license": "MIT",
"summary": "Chinese word segmentation toolkit for spaCy (fork of pkuseg-python)",
"version": "1.0.0",
"project_urls": {
"Homepage": "https://github.com/explosion/spacy-pkuseg"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "e9ea6850962f1eef56ef45226f665fca3e94441a955a93beeed61a0ebe342396",
"md5": "1f55b35a3aa3f4a9c9a899ee71d5f075",
"sha256": "9463788ef0c906bcbc587c379a35fd86a93e9634a49059e60e0a9314537a0364"
},
"downloads": -1,
"filename": "spacy_pkuseg-1.0.0-cp310-cp310-macosx_10_9_x86_64.whl",
"has_sig": false,
"md5_digest": "1f55b35a3aa3f4a9c9a899ee71d5f075",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": ">=3.9",
"size": 2459701,
"upload_time": "2024-09-04T18:34:43",
"upload_time_iso_8601": "2024-09-04T18:34:43.075201Z",
"url": "https://files.pythonhosted.org/packages/e9/ea/6850962f1eef56ef45226f665fca3e94441a955a93beeed61a0ebe342396/spacy_pkuseg-1.0.0-cp310-cp310-macosx_10_9_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "6b037742adf7fd9e74e0040b4f606b5f8c7fcf8b8f53a475e25c2391b81bbafa",
"md5": "6b6f7a5fd55accb95a982194ba891abd",
"sha256": "7108075c345faa6cc7f18628cdd89df78850c2fc850b4c2ef23a324ea437dea3"
},
"downloads": -1,
"filename": "spacy_pkuseg-1.0.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "6b6f7a5fd55accb95a982194ba891abd",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": ">=3.9",
"size": 4072314,
"upload_time": "2024-09-04T18:34:45",
"upload_time_iso_8601": "2024-09-04T18:34:45.202836Z",
"url": "https://files.pythonhosted.org/packages/6b/03/7742adf7fd9e74e0040b4f606b5f8c7fcf8b8f53a475e25c2391b81bbafa/spacy_pkuseg-1.0.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "6d4f1ad1c9607f82e5e08df1b873b70f2a235045f9b43845312e87fc6369b5af",
"md5": "b012847430fa52b1440f3c0e3140af00",
"sha256": "d5a088602217a7e68ec3a98d73082315c3161f7e48bf7fd4295b4f5e22bbd7a8"
},
"downloads": -1,
"filename": "spacy_pkuseg-1.0.0-cp310-cp310-musllinux_1_2_x86_64.whl",
"has_sig": false,
"md5_digest": "b012847430fa52b1440f3c0e3140af00",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": ">=3.9",
"size": 5107640,
"upload_time": "2024-09-04T18:34:47",
"upload_time_iso_8601": "2024-09-04T18:34:47.000767Z",
"url": "https://files.pythonhosted.org/packages/6d/4f/1ad1c9607f82e5e08df1b873b70f2a235045f9b43845312e87fc6369b5af/spacy_pkuseg-1.0.0-cp310-cp310-musllinux_1_2_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "8d4e38c257cd9d59a55e75a3857301fb3e4c29b136daaa45bdd3ac9df65faa8b",
"md5": "8f1e92c802c4c9ac984676db5e2410a8",
"sha256": "a32760d9df0412ac11fe01b2b0aea57c6fd32b9ba081a06b4f3841e6c360ae93"
},
"downloads": -1,
"filename": "spacy_pkuseg-1.0.0-cp310-cp310-win_amd64.whl",
"has_sig": false,
"md5_digest": "8f1e92c802c4c9ac984676db5e2410a8",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": ">=3.9",
"size": 2412030,
"upload_time": "2024-09-04T18:34:48",
"upload_time_iso_8601": "2024-09-04T18:34:48.716685Z",
"url": "https://files.pythonhosted.org/packages/8d/4e/38c257cd9d59a55e75a3857301fb3e4c29b136daaa45bdd3ac9df65faa8b/spacy_pkuseg-1.0.0-cp310-cp310-win_amd64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "0474908e883df7cbfa288b9df47e396dee0b66e459dcecd1de087b29c812ba11",
"md5": "c79e930dddc314d18d88a6cc0c64e668",
"sha256": "e8a80894b6faf8cb73bec19918fb8b2cd4e5eb54a5c1e75ddd285a9d629a1953"
},
"downloads": -1,
"filename": "spacy_pkuseg-1.0.0-cp311-cp311-macosx_10_9_x86_64.whl",
"has_sig": false,
"md5_digest": "c79e930dddc314d18d88a6cc0c64e668",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": ">=3.9",
"size": 2459976,
"upload_time": "2024-09-04T18:34:50",
"upload_time_iso_8601": "2024-09-04T18:34:50.037125Z",
"url": "https://files.pythonhosted.org/packages/04/74/908e883df7cbfa288b9df47e396dee0b66e459dcecd1de087b29c812ba11/spacy_pkuseg-1.0.0-cp311-cp311-macosx_10_9_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "1b3979742bf28c3119a66f42c3be0ff8dac84c3178e5f2dfc78a7ab5f784953a",
"md5": "b7aec4b8a58a0650d5d617a395b8c655",
"sha256": "ecfd222c3a2f97724336a0fd6635315853697fd4c32b005facea6b58f305d961"
},
"downloads": -1,
"filename": "spacy_pkuseg-1.0.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "b7aec4b8a58a0650d5d617a395b8c655",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": ">=3.9",
"size": 4225122,
"upload_time": "2024-09-04T18:34:51",
"upload_time_iso_8601": "2024-09-04T18:34:51.821362Z",
"url": "https://files.pythonhosted.org/packages/1b/39/79742bf28c3119a66f42c3be0ff8dac84c3178e5f2dfc78a7ab5f784953a/spacy_pkuseg-1.0.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "7678326ffe802d97cc218aaf0348ddebf4077d31447aa212c20374d162f53fed",
"md5": "4deb2881da15074870b70a796809c0d2",
"sha256": "ec1e87c4d3cca440b354632364e571617916a8d94b1fc06d7c2edb92c1dc12cb"
},
"downloads": -1,
"filename": "spacy_pkuseg-1.0.0-cp311-cp311-musllinux_1_2_x86_64.whl",
"has_sig": false,
"md5_digest": "4deb2881da15074870b70a796809c0d2",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": ">=3.9",
"size": 5284826,
"upload_time": "2024-09-04T18:34:53",
"upload_time_iso_8601": "2024-09-04T18:34:53.807751Z",
"url": "https://files.pythonhosted.org/packages/76/78/326ffe802d97cc218aaf0348ddebf4077d31447aa212c20374d162f53fed/spacy_pkuseg-1.0.0-cp311-cp311-musllinux_1_2_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "a0b707c159266ced53b7e56d823aa3c2e304ab9726f56c214a901600c9ec94d2",
"md5": "3c4922afe9d3b76454c13b5d993dd864",
"sha256": "4317e33656b9d65fe19687ec9a7b978c1a1a7be5813e6bec722eeba1084f2744"
},
"downloads": -1,
"filename": "spacy_pkuseg-1.0.0-cp311-cp311-win_amd64.whl",
"has_sig": false,
"md5_digest": "3c4922afe9d3b76454c13b5d993dd864",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": ">=3.9",
"size": 2413959,
"upload_time": "2024-09-04T18:34:55",
"upload_time_iso_8601": "2024-09-04T18:34:55.590218Z",
"url": "https://files.pythonhosted.org/packages/a0/b7/07c159266ced53b7e56d823aa3c2e304ab9726f56c214a901600c9ec94d2/spacy_pkuseg-1.0.0-cp311-cp311-win_amd64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "7b24b21cd80975b5ca7241e5f6b7fbefbf76ec80822e35aa864a14857331b47e",
"md5": "c960a96f04fa94a883d2f79d30d4a444",
"sha256": "5dfd5a9ed53bacb84e1e4fad16c532c965b49632dcea4dee390dc2bb59bc00d6"
},
"downloads": -1,
"filename": "spacy_pkuseg-1.0.0-cp312-cp312-macosx_10_9_x86_64.whl",
"has_sig": false,
"md5_digest": "c960a96f04fa94a883d2f79d30d4a444",
"packagetype": "bdist_wheel",
"python_version": "cp312",
"requires_python": ">=3.9",
"size": 2456943,
"upload_time": "2024-09-04T18:34:57",
"upload_time_iso_8601": "2024-09-04T18:34:57.312734Z",
"url": "https://files.pythonhosted.org/packages/7b/24/b21cd80975b5ca7241e5f6b7fbefbf76ec80822e35aa864a14857331b47e/spacy_pkuseg-1.0.0-cp312-cp312-macosx_10_9_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "4178cf266d0ea4dee349ca3c6bce86e6323e83fc4da711c69d89d0db9062ed54",
"md5": "20c59a93c6c24bd1123733234b9cdde0",
"sha256": "1c1e12acb4135b22d459bc24a63ffe979cdd8570df42d164323ab8a93dd7125b"
},
"downloads": -1,
"filename": "spacy_pkuseg-1.0.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "20c59a93c6c24bd1123733234b9cdde0",
"packagetype": "bdist_wheel",
"python_version": "cp312",
"requires_python": ">=3.9",
"size": 4188852,
"upload_time": "2024-09-04T18:34:58",
"upload_time_iso_8601": "2024-09-04T18:34:58.981887Z",
"url": "https://files.pythonhosted.org/packages/41/78/cf266d0ea4dee349ca3c6bce86e6323e83fc4da711c69d89d0db9062ed54/spacy_pkuseg-1.0.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "dff68596a741994fb3ef8d7fe93a04aca741d9bbcc0da78169292284daa04d29",
"md5": "f30480001f8817b1cc6abf8eba98badf",
"sha256": "703a429559583e8b9819836aefb24c01c0b86e757d1cc6838c7fcc11ef3b6f28"
},
"downloads": -1,
"filename": "spacy_pkuseg-1.0.0-cp312-cp312-musllinux_1_2_x86_64.whl",
"has_sig": false,
"md5_digest": "f30480001f8817b1cc6abf8eba98badf",
"packagetype": "bdist_wheel",
"python_version": "cp312",
"requires_python": ">=3.9",
"size": 5221304,
"upload_time": "2024-09-04T18:35:00",
"upload_time_iso_8601": "2024-09-04T18:35:00.623613Z",
"url": "https://files.pythonhosted.org/packages/df/f6/8596a741994fb3ef8d7fe93a04aca741d9bbcc0da78169292284daa04d29/spacy_pkuseg-1.0.0-cp312-cp312-musllinux_1_2_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "b1a7a6c5ba96e22d3609d23610514667d9de8838885698e00f26c15859b8a43d",
"md5": "8b66ac35c1261d91ad2ef04ed3906cd0",
"sha256": "2f3932c65b5dbbbdd23f6332e13102bd7a00a1563fad6893be73e32a876cd4cc"
},
"downloads": -1,
"filename": "spacy_pkuseg-1.0.0-cp312-cp312-win_amd64.whl",
"has_sig": false,
"md5_digest": "8b66ac35c1261d91ad2ef04ed3906cd0",
"packagetype": "bdist_wheel",
"python_version": "cp312",
"requires_python": ">=3.9",
"size": 2407747,
"upload_time": "2024-09-04T18:35:02",
"upload_time_iso_8601": "2024-09-04T18:35:02.351648Z",
"url": "https://files.pythonhosted.org/packages/b1/a7/a6c5ba96e22d3609d23610514667d9de8838885698e00f26c15859b8a43d/spacy_pkuseg-1.0.0-cp312-cp312-win_amd64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "d45b8fefee884280cf17b4c9ee1cab1f099cc5eb5bcc0229e4459a5982648d0c",
"md5": "f7246f5f9e4692c6d3e3a2e85714720d",
"sha256": "d9f02792bc91806aeeca855f1e38f621a4b5f0b03dcf110999ae118b571ae111"
},
"downloads": -1,
"filename": "spacy_pkuseg-1.0.0-cp39-cp39-macosx_10_9_x86_64.whl",
"has_sig": false,
"md5_digest": "f7246f5f9e4692c6d3e3a2e85714720d",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.9",
"size": 2460878,
"upload_time": "2024-09-04T18:35:03",
"upload_time_iso_8601": "2024-09-04T18:35:03.925880Z",
"url": "https://files.pythonhosted.org/packages/d4/5b/8fefee884280cf17b4c9ee1cab1f099cc5eb5bcc0229e4459a5982648d0c/spacy_pkuseg-1.0.0-cp39-cp39-macosx_10_9_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "921cecf544a3a7aeee45c35bd18df8a5609c4065883a2fb9276ab6e64561d16c",
"md5": "9a5e0df79906f79658bcbb7d438a4f93",
"sha256": "6a4c69766ef4604d63bbca4088c2e3c94be3f19c5edd3766603b146c03028c99"
},
"downloads": -1,
"filename": "spacy_pkuseg-1.0.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "9a5e0df79906f79658bcbb7d438a4f93",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.9",
"size": 4078234,
"upload_time": "2024-09-04T18:35:05",
"upload_time_iso_8601": "2024-09-04T18:35:05.175758Z",
"url": "https://files.pythonhosted.org/packages/92/1c/ecf544a3a7aeee45c35bd18df8a5609c4065883a2fb9276ab6e64561d16c/spacy_pkuseg-1.0.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "8d6bbffc0094afbf1ca0d1604542e90c3625608aaf273576d1d502d30893552a",
"md5": "86448f15fdb104add1cb72868596da3c",
"sha256": "ca18420c370f768ed71c39f4d76db6c33fb224ddb7dcfc0378076e1bdb6e8d17"
},
"downloads": -1,
"filename": "spacy_pkuseg-1.0.0-cp39-cp39-musllinux_1_2_x86_64.whl",
"has_sig": false,
"md5_digest": "86448f15fdb104add1cb72868596da3c",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.9",
"size": 5111454,
"upload_time": "2024-09-04T18:35:06",
"upload_time_iso_8601": "2024-09-04T18:35:06.602379Z",
"url": "https://files.pythonhosted.org/packages/8d/6b/bffc0094afbf1ca0d1604542e90c3625608aaf273576d1d502d30893552a/spacy_pkuseg-1.0.0-cp39-cp39-musllinux_1_2_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "02d0ad01cc6cca5aaf5637d517fa09de278ace874f9f4abf7ac394f3d18dc137",
"md5": "33b8cecfcedbc881d21e59d6c8fbf3fb",
"sha256": "c31741ad2627b6fa9938765ab6b260e7b4aa97f075e6e000797070f838f508f9"
},
"downloads": -1,
"filename": "spacy_pkuseg-1.0.0-cp39-cp39-win_amd64.whl",
"has_sig": false,
"md5_digest": "33b8cecfcedbc881d21e59d6c8fbf3fb",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.9",
"size": 2412977,
"upload_time": "2024-09-04T18:35:08",
"upload_time_iso_8601": "2024-09-04T18:35:08.362124Z",
"url": "https://files.pythonhosted.org/packages/02/d0/ad01cc6cca5aaf5637d517fa09de278ace874f9f4abf7ac394f3d18dc137/spacy_pkuseg-1.0.0-cp39-cp39-win_amd64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "a733c2370bbe09daf655332a34b263a0e6279e630b30b57364438381a511f964",
"md5": "80d20ba10209017bcf70471c2e3eb9a7",
"sha256": "33531ea8e13fc09ebe3b40bd97e84d07ccd5a1fe67fa8e84173769a25ac03158"
},
"downloads": -1,
"filename": "spacy_pkuseg-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "80d20ba10209017bcf70471c2e3eb9a7",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 2136735,
"upload_time": "2024-09-04T18:35:09",
"upload_time_iso_8601": "2024-09-04T18:35:09.928154Z",
"url": "https://files.pythonhosted.org/packages/a7/33/c2370bbe09daf655332a34b263a0e6279e630b30b57364438381a511f964/spacy_pkuseg-1.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-09-04 18:35:09",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "explosion",
"github_project": "spacy-pkuseg",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [],
"lcname": "spacy-pkuseg"
}