- Name: pyhanlp
- Version: 0.1.85
- Home page: https://github.com/hankcs/pyhanlp
- Summary: Python wrapper for HanLP: Han Language Processing
- Upload time: 2023-12-23 04:16:14
- Author: hankcs (hankcshe@gmail.com)
- License: Apache License 2.0
- Keywords: corpus, machine-learning, nlu, nlp
- Requirements: none recorded

# pyhanlp: Python interfaces for HanLP1.x

![pypi](https://img.shields.io/pypi/v/pyhanlp) [![Downloads](https://pepy.tech/badge/pyhanlp)](https://pepy.tech/project/pyhanlp) [![GitHub license](https://img.shields.io/github/license/hankcs/pyhanlp)](https://github.com/hankcs/pyhanlp/blob/master/LICENSE) [![Run Jupyter](https://img.shields.io/badge/Run-Jupyter-orange?style=flat&logo=Jupyter)](https://mybinder.org/v2/gh/hankcs/pyhanlp.git/master?filepath=tests%2Fbook%2Findex.ipynb) [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/hankcs/pyhanlp.git/master?filepath=tests%2Fbook%2Findex.ipynb)

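The Python interface to [HanLP1.x](https://github.com/hankcs/HanLP/tree/1.x). It downloads and upgrades [HanLP1.x](https://github.com/hankcs/HanLP/tree/1.x) automatically and is compatible with Python<=3.8. Its internal algorithms have been proven in both industry and academia. The companion book [《自然语言处理入门》 (Introduction to Natural Language Processing)](http://nlp.hankcs.com/book.php) has been published; see the [book's code](https://github.com/hankcs/pyhanlp/tree/master/tests/book) or click [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/hankcs/pyhanlp.git/master?filepath=tests%2Fbook%2Findex.ipynb) to run it online. The deep-learning-based [HanLP2.x](https://github.com/hankcs/HanLP/tree/doc-zh) was released in early 2020 as a next-generation, state-of-the-art multilingual NLP toolkit; it complements 1.x, and the two versions are developed in parallel.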

## Installation

**Lazy users** can click [![Run Jupyter](https://img.shields.io/badge/Run-Jupyter-orange?style=flat&logo=Jupyter)](https://mybinder.org/v2/gh/hankcs/pyhanlp.git/master?filepath=tests%2Fbook%2Findex.ipynb); **beginners** can use the [one-click installer](https://nlp.hankcs.com/download.php?file=exe) directly; **engineers** should first install [conda](https://docs.conda.io/en/latest/miniconda.html) and then run:

```bash
conda install -c conda-forge openjdk python=3.8 jpype1=0.7.0 -y
pip install pyhanlp
```

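Run the command `hanlp` to verify the installation. If the automatic installation fails (for example, because of network problems), see [Manual configuration](https://github.com/hankcs/pyhanlp/wiki/%E6%89%8B%E5%8A%A8%E9%85%8D%E7%BD%AE) or the [Windows guide](https://github.com/hankcs/pyhanlp/wiki/Windows).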
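If `hanlp` does not start, the two prerequisites implied by the commands above are worth checking first: a Java runtime (the conda `openjdk` package) and a Python version within the stated `<=3.8` range. The following is a minimal pre-flight sketch under those assumptions; `preflight_problems` is a hypothetical helper, not part of pyhanlp, and it checks nothing beyond these two points.

```python
import shutil
import sys

_DETECT = object()  # sentinel: look the value up on the running system


def preflight_problems(version_info=_DETECT, java_path=_DETECT):
    """Hypothetical helper: list likely obstacles to running pyhanlp.

    Only checks Python <= 3.8 and a `java` executable on PATH.
    """
    if version_info is _DETECT:
        version_info = sys.version_info
    if java_path is _DETECT:
        java_path = shutil.which("java")  # None when java is not on PATH

    problems = []
    major_minor = tuple(version_info[:2])
    if major_minor > (3, 8):
        problems.append(
            "Python %d.%d is newer than the documented <=3.8 range" % major_minor
        )
    if not java_path:
        problems.append("no 'java' executable found on PATH")
    return problems


if __name__ == "__main__":
    for problem in preflight_problems():
        print("warning:", problem)
```

An empty list does not guarantee a working install (JPype version mismatches, for instance, are not checked); it only rules out the two most common gaps.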

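## Command line

### Chinese word segmentation

Run `hanlp segment` to enter interactive segmentation mode. Type a sentence and press Enter, and [HanLP1.x](https://github.com/hankcs/HanLP/tree/1.x) prints the segmentation result: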

```bash
$ hanlp segment
商品和服务
商品/n 和/cc 服务/vn
当下雨天地面积水分外严重
当/p 下雨天/n 地面/n 积水/n 分外/d 严重/a
龚学平等领导说,邓颖超生前杜绝超生
龚学平/nr 等/udeng 领导/n 说/v ,/w 邓颖超/nr 生前/t 杜绝/v 超生/vi
```

Input and output can also be redirected, for example to and from files:

```bash
$ hanlp segment <<< '欢迎新老师生前来就餐'
欢迎/v 新/a 老/a 师生/n 前来/vi 就餐/vi
```
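Because `hanlp segment` reads sentences from standard input and prints space-separated `word/tag` tokens, it can also be driven from a Python script. The sketch below assumes exactly the output format shown above (one result line per input sentence, each token ending in `/tag`, nothing else on stdout); `parse_tagged_line` and `segment_with_cli` are illustrative names, not pyhanlp functions.

```python
import subprocess


def parse_tagged_line(line):
    """Split 'word/tag word/tag ...' into (word, tag) pairs.

    rsplit on the last '/' keeps a word that itself contains '/' intact.
    """
    return [tuple(token.rsplit("/", 1)) for token in line.split()]


def segment_with_cli(sentences, command=("hanlp", "segment")):
    """Feed sentences to the CLI on stdin (like `<<<` above) and parse stdout.

    Assumes one 'word/tag ...' line per input sentence; blank lines are skipped.
    """
    completed = subprocess.run(
        list(command),
        input="\n".join(sentences) + "\n",
        stdout=subprocess.PIPE,
        encoding="utf-8",  # also switches the pipes to text mode
        check=True,  # raise CalledProcessError on a non-zero exit
    )
    return [
        parse_tagged_line(line)
        for line in completed.stdout.splitlines()
        if line.strip()
    ]
```

If the CLI behaves as in the session above, `segment_with_cli(['商品和服务'])` would be expected to yield `[[('商品', 'n'), ('和', 'cc'), ('服务', 'vn')]]`. If your installation prints extra log lines on stdout, filter them before parsing.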

### Dependency parsing

The command is `hanlp parse`; like `hanlp segment`, it supports both interactive mode and redirection:

```bash
$ hanlp parse <<< '徐先生还具体帮助他确定了把画雄鹰、松鼠和麻雀作为主攻目标。'
1	徐先生	徐先生	nh	nr	_	4	主谓关系	_	_
2	还	还	d	d	_	4	状中结构	_	_
3	具体	具体	a	a	_	4	状中结构	_	_
4	帮助	帮助	v	v	_	0	核心关系	_	_
5	他	他	r	rr	_	4	兼语	_	_
6	确定	确定	v	v	_	4	动宾关系	_	_
7	了	了	u	ule	_	6	右附加关系	_	_
8	把	把	p	pba	_	15	状中结构	_	_
9	画	画	v	v	_	8	介宾关系	_	_
10	雄鹰	雄鹰	n	n	_	9	动宾关系	_	_
11	、	、	wp	w	_	12	标点符号	_	_
12	松鼠	松鼠	n	n	_	10	并列关系	_	_
13	和	和	c	cc	_	14	左附加关系	_	_
14	麻雀	麻雀	n	n	_	10	并列关系	_	_
15	作为	作为	p	p	_	6	动宾关系	_	_
16	主攻	主攻	v	vn	_	17	定中关系	_	_
17	目标	目标	n	n	_	15	动宾关系	_	_
18	。	。	wp	w	_	4	标点符号	_	_
```
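The ten tab-separated columns above appear to follow the CoNLL-X layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), where a HEAD of `0` marks the root (here `帮助`). Assuming that layout, a minimal sketch for loading the output into Python and walking the tree might look like this; the field names are our own labels, not names defined by HanLP.

```python
from collections import namedtuple

# Column names assumed from the CoNLL-X convention.
Token = namedtuple(
    "Token", "id form lemma cpostag postag feats head deprel phead pdeprel"
)


def parse_conll(text):
    """Parse tab-separated CoNLL-style lines into Token tuples (id/head as int)."""
    tokens = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between sentences
        token = Token(*line.split("\t")[:10])
        tokens.append(token._replace(id=int(token.id), head=int(token.head)))
    return tokens


def children(tokens, head_id):
    """Return the forms of all tokens attached to head_id (0 = root)."""
    return [t.form for t in tokens if t.head == head_id]
```

For the full output above, `children(tokens, 0)` would return the root `['帮助']`, and `children(tokens, 4)` its dependents, such as `徐先生` (主谓关系, subject-verb).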

### Server

Run `hanlp serve` to start the built-in HTTP server; by default it is reachable locally at http://localhost:8765 . You can also try the official online demo at http://hanlp.hankcs.com/ .

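### Upgrading

Run `hanlp update` to upgrade [HanLP1.x](https://github.com/hankcs/HanLP/tree/1.x) to the latest version. The command fetches the [latest release of the HanLP main project](https://github.com/hankcs/HanLP/releases) and downloads and installs it automatically.

Run `hanlp --help` to see the latest help manual.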

## API

Common functionality is available through the utility class [`HanLP`](https://github.com/hankcs/HanLP/blob/1.x/src/main/java/com/hankcs/hanlp/HanLP.java#L55):

```python
from pyhanlp import *

print(HanLP.segment('你好,欢迎在Python中调用HanLP的API'))
for term in HanLP.segment('下雨天地面积水'):
    print('{}\t{}'.format(term.word, term.nature))  # get the word and its part-of-speech tag
testCases = [
    "商品和服务",
    "结婚的和尚未结婚的确实在干扰分词啊",
    "买水果然后来世博园最后去世博会",
    "中国的首都是北京",
    "欢迎新老师生前来就餐",
    "工信处女干事每月经过下属科室都要亲口交代24口交换机等技术性器件的安装工作",
    "随着页游兴起到现在的页游繁盛,依赖于存档进行逻辑判断的设计减少了,但这块也不能完全忽略掉。"]
for sentence in testCases:
    print(HanLP.segment(sentence))
# keyword extraction
document = "水利部水资源司司长陈明忠9月29日在国务院新闻办举行的新闻发布会上透露," \
           "根据刚刚完成了水资源管理制度的考核,有部分省接近了红线的指标," \
           "有部分省超过红线的指标。对一些超过红线的地方,陈明忠表示,对一些取用水项目进行区域的限批," \
           "严格地进行水资源论证和取水许可的批准。"
print(HanLP.extractKeyword(document, 2))
# automatic summarization
print(HanLP.extractSummary(document, 3))
# dependency parsing
print(HanLP.parseDependency("徐先生还具体帮助他确定了把画雄鹰、松鼠和麻雀作为主攻目标。"))
```
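`HanLP.segment` returns Java `Term` objects, whose `word` and `nature` fields are read in the loop above. To serialize results (for example to JSON) or compare them in tests, it can be convenient to convert them to plain Python values first. The helpers below are a hypothetical sketch: they only assume that each term exposes `word` and `nature` as shown, and they call `str()` so that Java strings and tag objects become Python `str`.

```python
import json
from collections import Counter


def terms_to_pairs(terms):
    """Convert objects with .word/.nature attributes into (word, tag) str pairs."""
    return [(str(term.word), str(term.nature)) for term in terms]


def tag_counts(pairs):
    """Count how often each part-of-speech tag occurs."""
    return Counter(tag for _, tag in pairs)


def pairs_to_json(pairs):
    """Serialize pairs as JSON, keeping Chinese characters readable."""
    return json.dumps(pairs, ensure_ascii=False)
```

With a working installation, usage would be along the lines of `pairs_to_json(terms_to_pairs(HanLP.segment('下雨天地面积水')))`.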

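### More features

More features include, but are not limited to:

- Custom dictionaries
- High-speed dictionary-based segmentation
- Index segmentation
- CRF segmentation
- Perceptron lexical analysis
- Taiwan and Hong Kong Traditional Chinese (臺灣正體、香港繁體)
- Keyword extraction and automatic summarization
- Text classification and sentiment analysis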

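For more, please read the [HanLP main project documentation](https://github.com/hankcs/HanLP/blob/1.x/README.md) and the [demos directory](https://github.com/hankcs/pyhanlp/tree/master/tests/demos). To call lower-level APIs, follow the Java class hierarchy and load deeper classes by their full class path with `JClass`. Take the perceptron lexical analyzer as an example: its fully qualified class name is [`com.hankcs.hanlp.model.perceptron.PerceptronLexicalAnalyzer`](https://github.com/hankcs/HanLP/blob/1.x/src/main/java/com/hankcs/hanlp/model/perceptron/PerceptronLexicalAnalyzer.java), so first obtain the class with `JClass`, and then you can call it: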

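```python
from pyhanlp import *  # provides HanLP and JClass, among others

# Load the Java class by its fully qualified name, then instantiate it
PerceptronLexicalAnalyzer = JClass('com.hankcs.hanlp.model.perceptron.PerceptronLexicalAnalyzer')
analyzer = PerceptronLexicalAnalyzer()
print(analyzer.analyze("上海华安工业(集团)公司董事长谭旭光和秘书胡花蕊来到美国纽约现代艺术博物馆参观"))
```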

Output:

```
[上海/ns 华安/nz 工业/n (/w 集团/n )/w 公司/n]/nt 董事长/n 谭旭光/nr 和/c 秘书/n 胡花蕊/nr 来到/v [美国/ns 纽约/ns 现代/t 艺术/n 博物馆/n]/ns 参观/v
```
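In this output, square brackets group the parts of a compound named entity, and the tag after `]` (for example `/nt` or `/ns`) labels the whole span. If you only have the printed string, a minimal sketch for recovering the compounds could look like the following. It assumes exactly the textual format shown above and is not a pyhanlp API; a bracket character used as a word *inside* a span is not handled.

```python
def parse_compound_output(line):
    """Turn analyzer output into (word, tag) pairs, merging [...]/tag spans.

    A span like '[美国/ns 纽约/ns ... 博物馆/n]/ns' becomes one entry whose word
    is the concatenation of its parts and whose tag is the one after ']/'.
    """
    result = []
    group = None  # (word, tag) parts of the currently open [...] span
    for token in line.split():
        # A leading '[' opens a span, unless the token is the word '[' itself.
        if group is None and token.startswith("[") and not token.startswith("[/"):
            group = []
            token = token[1:]
        if group is None:
            result.append(tuple(token.rsplit("/", 1)))
        elif "]/" in token:
            inner, outer_tag = token.rsplit("]/", 1)
            group.append(tuple(inner.rsplit("/", 1)))
            result.append(("".join(word for word, _ in group), outer_tag))
            group = None
        else:
            group.append(tuple(token.rsplit("/", 1)))
    return result
```

Applied to the output line above, the first entry would be `('上海华安工业(集团)公司', 'nt')` and the museum span would become a single `ns` entry.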

If you need thread safety, use `SafeJClass`; if you need lazy loading, use `LazyLoadingJClass`. If you use a particular class often, you are welcome to add it to `pyhanlp/__init__.py` and submit a pull request. Thanks!
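To make the idea of lazy loading concrete: a lazily loaded class is not resolved until it is first used, so declaring many classes up front stays cheap. The pure-Python `LazyFactory` below is a hypothetical illustration of that pattern only. It is not how pyhanlp implements `LazyLoadingJClass`, and it adds no thread safety.

```python
class LazyFactory:
    """Defer an expensive loader call until the object is first used."""

    def __init__(self, loader):
        self._loader = loader  # zero-argument callable, e.g. lambda: JClass('...')
        self._target = None
        self.loaded = False

    def _resolve(self):
        if not self.loaded:
            self._target = self._loader()  # runs at most once
            self.loaded = True
        return self._target

    def __call__(self, *args, **kwargs):
        # Instantiating triggers the load on first use.
        return self._resolve()(*args, **kwargs)

    def __getattr__(self, name):
        # Only reached for attributes not found normally (e.g. static members).
        return getattr(self._resolve(), name)
```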

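## Sharing data with other projects

[HanLP1.x](https://github.com/hankcs/HanLP/tree/1.x) is highly customizable: every model and dictionary can be swapped out. To share one set of data with another project, simply copy that project's configuration file `hanlp.properties` into pyhanlp's installation directory. You can find the local installation directory with `hanlp --version`.

You can also load a different configuration file temporarily with `--config`: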

```
hanlp segment --config path/to/another/hanlp.properties
```
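Which data a configuration file points to is decided by its entries; in HanLP1.x, `hanlp.properties` sets the data location through its `root` key. Before copying a file between projects, a quick way to see which `root` it uses is a simple reader like the one below. This is a minimal sketch that assumes a UTF-8 file and only understands `key=value` lines and `#`/`!` comments, not the full Java `.properties` syntax (no `:` separators, escapes, or line continuations); the function names are hypothetical.

```python
def read_simple_properties(text):
    """Parse 'key=value' lines; blank lines and '#'/'!' comments are ignored."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:  # skip lines without '=' instead of guessing
            props[key.strip()] = value.strip()
    return props


def data_root(path):
    """Return the `root` entry of a hanlp.properties file, or None if absent."""
    with open(path, encoding="utf-8") as handle:
        return read_simple_properties(handle.read()).get("root")
```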

## Testing

```
git clone https://github.com/hankcs/pyhanlp.git
cd pyhanlp
pip install -e .
python tests/test_hanlp.py
```

## Feedback

Please report bugs in the [HanLP issue tracker](https://github.com/hankcs/HanLP/issues). For questions, please post on the [forum](https://bbs.hankcs.com/). Thank you.

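## [《自然语言处理入门》 (Introduction to Natural Language Processing)](http://nlp.hankcs.com/book.php)

Natural language processing is a broad and deep discipline, and only by mastering the theory can you get the full power out of the tools. Newcomers may want to consider this introductory book:

![Cover of 《自然语言处理入门》](http://file.hankcs.com/img/nlp-book-squre.jpg)

An introductory NLP book that accompanies HanLP, giving equal weight to fundamental theory and production code, with implementations in both Python and Java. Starting from basic concepts, it introduces the algorithmic principles and engineering implementations behind several popular problems: Chinese word segmentation, part-of-speech tagging, named entity recognition, information extraction, text clustering, text classification, and syntactic parsing. By explaining multiple algorithms, the book compares their strengths, weaknesses, and suitable use cases, and it walks through mature, production-grade code in detail to help you actually apply natural language processing in production.

[《自然语言处理入门》](http://nlp.hankcs.com/book.php) carries forewords and recommendations by Zhihong Xia, founding chair of the Department of Mathematics at Southern University of Science and Technology; Ming Zhou, deputy managing director of Microsoft Research Asia; Hang Li, director of ByteDance AI Lab; Qun Liu, chief scientist for speech and language at Huawei Noah's Ark Lab; Bin Wang, head of Xiaomi AI Lab and chief NLP scientist; Chengqing Zong, research professor at the Institute of Automation, Chinese Academy of Sciences; Zhiyuan Liu, associate professor at Tsinghua University; Huaping Zhang, associate professor at Beijing Institute of Technology; and 52nlp. My thanks to all of these senior colleagues and teachers. I hope this project and this book can become a "butterfly effect" in your engineering work and studies, helping you transform into a butterfly on your path in NLP.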

## License

Apache License 2.0




            
