# textanalyze4sc
A Chinese text analysis library for word-frequency statistics, sentiment analysis, topic analysis, and more.
- [GitHub](https://github.com/martin6336/textanalyze4sc)
- [PyPI](https://pypi.org/project/textanalyze4sc/)
Functional modules include:
- **word_cloud** text statistics, readability metrics, etc.
- **get_keyword** extract keywords from text
- **get_entity** extract entities from text
- **get_emotion** detect the emotion of a text
- **get_cosemantic** build a word co-occurrence semantic graph
- **get_topic** extract topics
- **visualization** visualizations such as word clouds
## Installation
```
pip install textanalyze4sc
```
## 1. Load Data
```python
from texttool import analyze

df_data = analyze.load_data("path/to/your/data")
```
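If you don't have a data file handy, the remaining steps can likely be followed with a small pandas DataFrame. This is a sketch under an assumption: that the library operates on a DataFrame with one document per row, and the `text` column name is a guess, not a documented part of the API.

```python
import pandas as pd

# Hypothetical stand-in for analyze.load_data(...): one document per row.
# The "text" column name is an assumption, not a documented contract.
df_data = pd.DataFrame({
    "text": [
        "我很开心,你是这么认为的吗",
        "他叫汤姆去拿外衣。",
    ]
})
```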
## 2. Extract Keywords
```python
df_data_key = analyze.get_keyword(df_data)
```
## 3. Extract Entities
```python
df_data_entity = analyze.get_entity(df_data)
```
## 4. Sentiment Analysis
Sentiment analysis is provided at two granularities.
1. Coarse-grained: three classes, "positive", "negative", and "neutral".
```python
analyze.get_emotion('我很开心,你是这么认为的吗')
```
Result:
```
'pos'
```
2. Fine-grained: seven emotion categories, 好 (goodness), 乐 (joy), 哀 (sorrow), 怒 (anger), 惧 (fear), 恶 (disgust), and 惊 (surprise).
```python
analyze.get_emotion_sp('我很开心,你是这么认为的吗')
```
Result:
```
{'words': 10,
'sentences': 1,
'好': 0,
'乐': 1,
'哀': 0,
'怒': 0,
'惧': 0,
'恶': 0,
'惊': 0}
```
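To score a whole dataset instead of a single string, here is a minimal sketch that applies both classifiers over a DataFrame column. It assumes `df_data` holds one document per row in a `text` column, which is an assumption about the data layout rather than a documented contract.

```python
# Apply both classifiers row by row. The "text" column name is an
# assumption about the shape of df_data, not part of the documented API.
df_data["emotion"] = df_data["text"].apply(analyze.get_emotion)
df_data["emotion_fine"] = df_data["text"].apply(analyze.get_emotion_sp)
```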
## 5. Word Co-occurrence Graph
Select the 50 most frequent entities and draw their co-occurrence graph.
```python
analyze.get_cosemantic(df_data, top50_all)
```
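`top50_all` is not defined in this walkthrough. A plausible way to build it, assuming `get_entity` returns a DataFrame with one entity per row in an `entity` column (both the column name and the return shape are assumptions):

```python
from collections import Counter

# Count entity frequencies and keep the 50 most common entities.
# The "entity" column name is an assumption about get_entity's output.
entity_counts = Counter(df_data_entity["entity"])
top50_all = [entity for entity, _ in entity_counts.most_common(50)]
```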
## 6. Extract Triples
```python
text = "他叫汤姆去拿外衣。"
# get_graph is assumed to be exposed via analyze, matching the calls above
analyze.get_graph(text)
```
Result:
```
[['他', '叫', '汤姆'], ['汤姆', '拿', '外衣']]
```
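Each triple is a `[subject, predicate, object]` list, so the output maps directly onto a directed graph. A minimal sketch using the independent `networkx` package (not part of textanalyze4sc):

```python
import networkx as nx

triples = [['他', '叫', '汤姆'], ['汤姆', '拿', '外衣']]

# One directed edge per triple; the predicate becomes an edge label.
G = nx.DiGraph()
for subj, pred, obj in triples:
    G.add_edge(subj, obj, label=pred)

print(G.edges(data=True))
# [('他', '汤姆', {'label': '叫'}), ('汤姆', '外衣', {'label': '拿'})]
```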
## 7. Generate a Summary
Extractive summarization is used; the sent_num parameter controls the number of sentences in the output summary.
```python
text = '2013年,信号与信息处理专业硕士毕业的张超凡进入国铁南宁局南宁电务段工作。那一年,广西同时开通多条高铁线路,高铁营业里程从0公里跃升至1000多公里。10年间,伴随着中国铁路高速发展,张超凡收获颇多。'
analyze.get_summary(text, sent_num=1)
```
Result:
```
'2013年,信号与信息处理专业硕士毕业的张超凡进入国铁南宁局南宁电务段工作。'
```
## 8. Visualization
The library provides various visualization tools: bar charts, trend charts, word clouds, and more.
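The visualization signatures are not documented in this README, so as a stand-in, here is a word-cloud sketch using the independent `jieba` and `wordcloud` packages (not part of textanalyze4sc). The font path is a placeholder and must point to a font with CJK coverage.

```python
import jieba
from wordcloud import WordCloud

text = "2013年,信号与信息处理专业硕士毕业的张超凡进入国铁南宁局南宁电务段工作。"

# Chinese text has no spaces, so segment with jieba before counting words.
tokens = " ".join(jieba.cut(text))

# font_path is a placeholder; supply any TTF/OTF font that covers Chinese.
wc = WordCloud(font_path="SimHei.ttf", width=800, height=400)
wc.generate(tokens)
wc.to_file("wordcloud.png")
```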