# jieba3
“Jieba 3” Chinese text segmentation: built to be the best Modern Python 3 Chinese word segmentation component
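# Differences from jieba
jieba3 is a Modern Python 3 rewrite of the [jieba](https://github.com/fxsjy/jieba) word segmentation module.
- Removes the Python 2 compatibility code and supports Modern Python 3 features such as type hints
- Refactors the segmentation module: while remaining a pure-Python implementation, it is about **20%** faster, and its segmentation results are aligned with jieba's
- Does not yet support jieba features other than segmentation, such as keyword extraction and part-of-speech tagging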
# Installation
jieba3 requires Python 3.10 or later.
```bash
pip install jieba3
```
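# Algorithm
- Scans the word graph efficiently using a prefix dictionary, building a directed acyclic graph (DAG) of every possible word that the Chinese characters in a sentence can form
- Uses dynamic programming to find the maximum-probability path, i.e. the most probable segmentation based on word frequencies
- For out-of-vocabulary words, uses an HMM based on the word-forming capability of Chinese characters, decoded with the Viterbi algorithm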
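
The first two steps can be illustrated with a minimal sketch. This is not jieba3's actual implementation: the function names (`build_prefix_dict`, `build_dag`, `best_route`, `segment`) and the toy frequency table `TOY_FREQ` are assumptions made purely for illustration.

```python
import math

# Hypothetical toy word frequencies; a real dictionary is far larger.
TOY_FREQ = {"中国": 500, "科学": 400, "学院": 300, "科学院": 200,
            "中国科学院": 100, "计算": 300, "计算所": 50, "所": 100}


def build_prefix_dict(word_freq):
    """Add every proper prefix of every word with frequency 0."""
    freq = dict(word_freq)
    for word in word_freq:
        for i in range(1, len(word)):
            freq.setdefault(word[:i], 0)
    return freq, sum(word_freq.values())


def build_dag(sentence, freq):
    """Map each start index to the end indices of dictionary words."""
    dag, n = {}, len(sentence)
    for start in range(n):
        ends, end, frag = [], start, sentence[start]
        while end < n and frag in freq:  # stop once frag is no longer a prefix
            if freq[frag] > 0:
                ends.append(end)
            end += 1
            frag = sentence[start:end + 1]
        dag[start] = ends or [start]  # fall back to a single character
    return dag


def best_route(sentence, dag, freq, total):
    """Right-to-left DP over the DAG, maximising the sum of log frequencies."""
    n = len(sentence)
    route = {n: (0.0, 0)}
    log_total = math.log(total)
    for start in range(n - 1, -1, -1):
        route[start] = max(
            (math.log(freq.get(sentence[start:end + 1]) or 1) - log_total
             + route[end + 1][0], end)
            for end in dag[start]
        )
    return route


def segment(sentence, word_freq):
    freq, total = build_prefix_dict(word_freq)
    route = best_route(sentence, build_dag(sentence, freq), freq, total)
    words, start = [], 0
    while start < len(sentence):
        end = route[start][1] + 1
        words.append(sentence[start:end])
        start = end
    return words


print(segment("中国科学院计算所", TOY_FREQ))  # ['中国科学院', '计算所']
```

With these toy frequencies the single word `中国科学院` scores higher than `中国` followed by `科学院`, so the longer word is kept.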
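# Main features
Create a `jieba3.jieba3` tokenizer instance. The constructor accepts the following parameters:
- `model: Literal["base", "small", "large"] = "base"`
  - Segmentation model to use: `small`, `base`, or `large`; defaults to `base`
  - `base` is the default model provided by jieba
  - `small` is the lower-memory model provided by jieba
  - `large` is the jieba model with better support for Traditional Chinese segmentation
- `use_hmm: bool = True`
  - Whether to enable HMM-based new word discovery: `True` or `False`; defaults to `True`

Examples: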
```python
import jieba3
tokenizer = jieba3.jieba3() # defaults: base model, HMM new word discovery enabled
tokenizer = jieba3.jieba3(model="small") # use the small model
tokenizer = jieba3.jieba3(model="base") # use the base model
tokenizer = jieba3.jieba3(model="large") # use the large model
tokenizer = jieba3.jieba3(use_hmm=False) # disable HMM new word discovery
tokenizer = jieba3.jieba3(use_hmm=True) # enable HMM new word discovery
```
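## Document mode
Attempts to segment the sentence as precisely as possible; suited to document analysis.
> With the default `base` model, jieba3's document mode produces exactly the same segmentation as jieba's accurate mode.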
```python
import jieba3
import jieba
# HMM new word discovery enabled
tokenizer = jieba3.jieba3()
tokenizer.cut_text("小明硕士毕业于中国科学院计算所")
# ["小明", "硕士", "毕业", "于", "中国科学院", "计算所"]
jieba.lcut("小明硕士毕业于中国科学院计算所")
# ["小明", "硕士", "毕业", "于", "中国科学院", "计算所"]
# HMM new word discovery disabled
tokenizer = jieba3.jieba3(use_hmm=False)
tokenizer.cut_text("小明硕士毕业于中国科学院计算所")
# ["小", "明", "硕士", "毕业", "于", "中国科学院", "计算所"]
jieba.lcut("小明硕士毕业于中国科学院计算所", HMM=False)
# ["小", "明", "硕士", "毕业", "于", "中国科学院", "计算所"]
```
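
The name `小明` is kept whole only when HMM new word discovery is enabled. The sketch below illustrates the idea under the assumption of the common B/M/E/S character tagging scheme (begin, middle, end, single) decoded with Viterbi. The probabilities in `START`, `TRANS`, and `EMIT` are made-up toy values, and the helper names are hypothetical; none of this is taken from jieba3's model or code.

```python
import math

STATES = "BMES"  # Begin, Middle, End of a word, or Single-character word
MIN_LOG = -1e9   # stand-in log probability for unseen events
# A tag may only follow certain tags, e.g. E (end) must follow B or M.
ALLOWED_PREV = {"B": "ES", "M": "BM", "E": "BM", "S": "ES"}


def viterbi(chars, start_p, trans_p, emit_p):
    """Return the most likely B/M/E/S tag sequence for chars (log space)."""
    if not chars:
        return []
    scores = [{s: start_p.get(s, MIN_LOG) + emit_p[s].get(chars[0], MIN_LOG)
               for s in STATES}]
    back = [{}]
    for t in range(1, len(chars)):
        scores.append({})
        back.append({})
        for s in STATES:
            prev = max(ALLOWED_PREV[s],
                       key=lambda p: scores[t - 1][p] + trans_p[p].get(s, MIN_LOG))
            scores[t][s] = (scores[t - 1][prev] + trans_p[prev].get(s, MIN_LOG)
                            + emit_p[s].get(chars[t], MIN_LOG))
            back[t][s] = prev
    tag = max("ES", key=lambda s: scores[-1][s])  # a sequence must end in E or S
    tags = [tag]
    for t in range(len(chars) - 1, 0, -1):  # follow back-pointers
        tag = back[t][tag]
        tags.append(tag)
    return tags[::-1]


def tags_to_words(chars, tags):
    """Group characters into words according to their B/M/E/S tags."""
    words, begin = [], 0
    for i, tag in enumerate(tags):
        if tag == "B":
            begin = i
        elif tag == "E":
            words.append(chars[begin:i + 1])
        elif tag == "S":
            words.append(chars[i])
    return words


def logs(probs):
    return {key: math.log(value) for key, value in probs.items()}


# Toy model parameters (assumptions, not trained values).
START = logs({"B": 0.6, "S": 0.4})
TRANS = {"B": logs({"M": 0.3, "E": 0.7}), "M": logs({"M": 0.4, "E": 0.6}),
         "E": logs({"B": 0.5, "S": 0.5}), "S": logs({"B": 0.5, "S": 0.5})}
EMIT = {"B": logs({"小": 0.3}), "M": {}, "E": logs({"明": 0.3}),
        "S": logs({"小": 0.05, "明": 0.05})}

tags = viterbi("小明", START, TRANS, EMIT)
print(tags, tags_to_words("小明", tags))  # ['B', 'E'] ['小明']
```

Under this toy model the tag sequence B, E is far more probable than S, S, so the two characters are merged into one word.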
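## Query mode
Builds on document mode by further splitting long words to improve recall; suited to query analysis.
> With the default `base` model, jieba3's query mode produces exactly the same segmentation as jieba's search engine mode.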
```python
import jieba3
import jieba
# HMM new word discovery enabled
tokenizer = jieba3.jieba3()
tokenizer.cut_query("小明硕士毕业于中国科学院计算所")
# ["小明", "硕士", "毕业", "于", "中国", "科学", "学院", "科学院", "中国科学院", "计算", "计算所"]
jieba.lcut_for_search("小明硕士毕业于中国科学院计算所")
# ["小明", "硕士", "毕业", "于", "中国", "科学", "学院", "科学院", "中国科学院", "计算", "计算所"]
# HMM new word discovery disabled
tokenizer = jieba3.jieba3(use_hmm=False)
tokenizer.cut_query("小明硕士毕业于中国科学院计算所")
# ["小", "明", "硕士", "毕业", "于", "中国", "科学", "学院", "科学院", "中国科学院", "计算", "计算所"]
jieba.lcut_for_search("小明硕士毕业于中国科学院计算所", HMM=False)
# ["小", "明", "硕士", "毕业", "于", "中国", "科学", "学院", "科学院", "中国科学院", "计算", "计算所"]
```
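
The re-splitting of long words can be sketched as follows. The sketch is modelled on how jieba's search engine mode behaves (dictionary 2-grams, then dictionary 3-grams, then the word itself); whether jieba3 implements it exactly this way internally is an assumption, and `resplit_for_query` and `TOY_DICTIONARY` are hypothetical names.

```python
def resplit_for_query(words, dictionary):
    """Yield in-dictionary 2-grams and 3-grams of long words, then the word."""
    for word in words:
        if len(word) > 2:
            for i in range(len(word) - 1):
                if word[i:i + 2] in dictionary:
                    yield word[i:i + 2]
        if len(word) > 3:
            for i in range(len(word) - 2):
                if word[i:i + 3] in dictionary:
                    yield word[i:i + 3]
        yield word


# Toy dictionary covering only the words needed for this example.
TOY_DICTIONARY = {"中国", "科学", "学院", "科学院", "中国科学院", "计算", "计算所"}

# Tokens as produced by document mode in the example above.
document_tokens = ["小明", "硕士", "毕业", "于", "中国科学院", "计算所"]
print(list(resplit_for_query(document_tokens, TOY_DICTIONARY)))
# ['小明', '硕士', '毕业', '于', '中国', '科学', '学院', '科学院', '中国科学院', '计算', '计算所']
```

Short tokens pass through unchanged, while each long token is preceded by the shorter dictionary words it contains, which is what raises recall for search queries.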
# Benchmarks
All jieba3 runs use the default `base` model and are compared against jieba's default model.

Test environment: MacBookPro18,3, macOS 14.5, Apple M1 Pro @ 3.20 GHz, 16 GB
## SIGHAN Bakeoff 2005 test sets (line-by-line segmentation)
### `as_test.utf8` (Traditional Chinese)
| Mode | jieba time | jieba speed | jieba3 time | jieba3 speed | Improvement |
| -------------------- | ---------- | ---------- | ----------- | ----------- | -------- |
| Document mode (HMM off) | 0.26 s | 2.28 MB/s | 0.20 s | 2.94 MB/s | 22% |
| Document mode (HMM on) | 0.60 s | 0.98 MB/s | 0.48 s | 1.23 MB/s | 20% |
| Query mode (HMM off) | 0.27 s | 2.17 MB/s | 0.21 s | 2.79 MB/s | 22% |
| Query mode (HMM on) | 0.63 s | 0.93 MB/s | 0.51 s | 1.15 MB/s | 20% |
### `cityu_test.utf8` (Traditional Chinese)
| Mode | jieba time | jieba speed | jieba3 time | jieba3 speed | Improvement |
| -------------------- | ---------- | ---------- | ----------- | ----------- | -------- |
| Document mode (HMM off) | 0.09 s | 2.22 MB/s | 0.07 s | 2.87 MB/s | 23% |
| Document mode (HMM on) | 0.21 s | 0.93 MB/s | 0.17 s | 1.16 MB/s | 20% |
| Query mode (HMM off) | 0.09 s | 2.11 MB/s | 0.07 s | 2.71 MB/s | 22% |
| Query mode (HMM on) | 0.21 s | 0.90 MB/s | 0.17 s | 1.12 MB/s | 20% |
### `msr_test.utf8` (Simplified Chinese)
| Mode | jieba time | jieba speed | jieba3 time | jieba3 speed | Improvement |
| -------------------- | ---------- | ---------- | ----------- | ----------- | -------- |
| Document mode (HMM off) | 0.26 s | 2.06 MB/s | 0.20 s | 2.69 MB/s | 24% |
| Document mode (HMM on) | 0.30 s | 1.79 MB/s | 0.24 s | 2.25 MB/s | 20% |
| Query mode (HMM off) | 0.28 s | 1.91 MB/s | 0.22 s | 2.47 MB/s | 23% |
| Query mode (HMM on) | 0.32 s | 1.67 MB/s | 0.26 s | 2.08 MB/s | 20% |
### `pku_test.utf8` (Simplified Chinese)
| Mode | jieba time | jieba speed | jieba3 time | jieba3 speed | Improvement |
| -------------------- | ---------- | ---------- | ----------- | ----------- | -------- |
| Document mode (HMM off) | 0.25 s | 1.91 MB/s | 0.20 s | 2.48 MB/s | 23% |
| Document mode (HMM on) | 0.30 s | 1.64 MB/s | 0.24 s | 2.04 MB/s | 20% |
| Query mode (HMM off) | 0.26 s | 1.85 MB/s | 0.20 s | 2.41 MB/s | 23% |
| Query mode (HMM on) | 0.33 s | 1.48 MB/s | 0.27 s | 1.82 MB/s | 19% |
## *Fortress Besieged* (《围城》, full-text segmentation)
| Mode | jieba time | jieba speed | jieba3 time | jieba3 speed | Improvement |
| -------------------- | ---------- | ---------- | ----------- | ----------- | -------- |
| Document mode (HMM off) | 0.35 s | 1.85 MB/s | 0.28 s | 2.32 MB/s | 20% |
| Document mode (HMM on) | 0.51 s | 1.25 MB/s | 0.42 s | 1.52 MB/s | 18% |
| Query mode (HMM off) | 0.33 s | 1.93 MB/s | 0.26 s | 2.45 MB/s | 21% |
| Query mode (HMM on) | 0.55 s | 1.17 MB/s | 0.45 s | 1.42 MB/s | 18% |
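
For readers who want to run a similar comparison, the sketch below shows one way to measure throughput and derive a percentage like the one in the last column. It is not the benchmark script used for these tables. The `measure` and `improvement` helpers are hypothetical, the megabyte is assumed to be 1024 × 1024 bytes, and the formula (relative reduction in time per megabyte) is an assumption that appears consistent with most rows; because the published figures are rounded, not every row reproduces exactly.

```python
import time


def measure(segment, text):
    """Return (elapsed seconds, throughput in MB/s) for one call of segment(text)."""
    size_mb = len(text.encode("utf-8")) / (1024 * 1024)
    start = time.perf_counter()
    segment(text)
    elapsed = time.perf_counter() - start
    return elapsed, size_mb / elapsed


def improvement(baseline_speed, new_speed):
    """Relative reduction in time per megabyte, as a rounded percentage."""
    return round((1 - baseline_speed / new_speed) * 100)


# Speeds taken from the first row of the as_test.utf8 table.
print(improvement(2.28, 2.94))  # 22
```

Any callable that takes a string can be passed as `segment`, for example a tokenizer's `cut_text` method.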