zenlp


| Field | Value |
|---|---|
| Name | zenlp |
| Version | 0.1.0 |
| home_page | None |
| Summary | A minimalist, elegant natural language processing (NLP) toolkit designed for Python. |
| upload_time | 2025-07-19 16:42:54 |
| maintainer | None |
| docs_url | None |
| author | Hellohistory |
| requires_python | >=3.8 |
| license | None |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | pytest, tqdm |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

# ZenLP - NLP Toolkit

![PyPI](https://img.shields.io/pypi/v/zenlp?label=PyPI&logo=pypi&color=blue)
![Python Versions](https://img.shields.io/pypi/pyversions/zenlp?logo=python&label=Python)
![Build Status](https://img.shields.io/github/actions/workflow/status/Hellohistory/zenlp/publish.yml)
![License](https://img.shields.io/github/license/Hellohistory/zenlp)

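**The greatest truths are the simplest; return to the essentials.**

`ZenLP` is a minimalist, elegant natural language processing (NLP) toolkit designed for Python.

Today, with large language models (LLMs) sweeping the industry, we are often held back by their structural complexity and heavy resource requirements.
`ZenLP` goes back to basics and focuses on classic NLP algorithms that have stood the test of time, deliver robust results, and are highly interpretable.
We believe that elegant engineering can give these classic algorithms a new lease of life in modern NLP workflows.

For each algorithm, this project aims to provide:
* Clear, complete documentation and comments.
* An intuitive, easy-to-use API.
* Production-ready performance and stability.

Welcome to the world of ZenLP, where we solve the most essential problems with the least code.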

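## ✨ Features

* **Minimalist design**: No heavy dependencies or complicated configuration; every module strives to be small, polished, and focused on doing one thing well.
* **Unsupervised and adaptive**: Most core algorithms are unsupervised, learn automatically from raw text, and adapt easily to different domains.
* **Production-ready**: All code is rigorously tested and ships with type hints, for stability and maintainability in production environments.
* **Education-friendly**: More than a tool, it is also a reference for learning classic NLP algorithms. Every line of code and every document aims to convey the essence of the algorithm.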

## 🚀 Quick Start

### 1. Installation

`ZenLP` can be installed with pip:
```bash
pip install zenlp
```

### 2. Usage example: discovering new words
A few lines of code are enough to discover new words in your corpus.

```python
from zenlp import discover
from pprint import pprint

corpus = [
    "大语言模型正在引领新一轮的技术革命。",
    "生成式AI的快速发展对内容创作产生了深远影响。",
    "遥遥领先的技术优势使得这家公司备受瞩目。",
    "赛博朋克风格的艺术作品充满了对未来的想象。",
    "这家公司的遥遥领先,得益于其强大的自研芯片。",
    "许多人对生成式AI的未来既期待又担忧。",
    "赛博朋克不仅仅是一种美学,更是一种文化现象。",
]

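# Call the discover function
# min_freq: minimum number of occurrences of a candidate word
# min_pmi: minimum cohesion (internal association)
# min_entropy: minimum freedom (diversity of neighboring contexts)
new_words = discover(
    corpus_source=corpus,
    min_freq=2,
    min_pmi=1.0,
    min_entropy=0.5
)

pprint(new_words)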
```
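
The three thresholds in the comments above map onto quantities that are commonly used in PMI plus left/right entropy word discovery. The definitions below are the textbook forms, written for a candidate $w = c_1 \dots c_n$ of $n$ characters. They are given as a reading aid only: the exact normalization, logarithm base, and the choice of taking the weakest split are assumptions, and `ZenLP` may compute them differently.

```latex
% Cohesion: PMI of the candidate against its weakest two-part split
\mathrm{PMI}(w) = \min_{1 \le k < n} \log \frac{P(c_1 \dots c_n)}{P(c_1 \dots c_k)\, P(c_{k+1} \dots c_n)}

% Freedom: entropy of the characters a seen immediately to the left (L) or right (R) of w
H_L(w) = -\sum_{a} P_L(a \mid w) \log P_L(a \mid w), \qquad
H_R(w) = -\sum_{a} P_R(a \mid w) \log P_R(a \mid w)

% A candidate is kept when all three thresholds are met
\mathrm{freq}(w) \ge \texttt{min\_freq}, \quad
\mathrm{PMI}(w) \ge \texttt{min\_pmi}, \quad
\min\bigl(H_L(w), H_R(w)\bigr) \ge \texttt{min\_entropy}
```

Read this way, a high PMI means the characters of $w$ occur together far more often than chance would predict, and a high entropy on both sides means $w$ appears in many different contexts rather than as a fragment of a longer word.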

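## 📖 Modules

`ZenLP` will gradually cover several core areas of NLP, with every module following the "Zen" design philosophy.

* ✅ **`zenlp.discovery` - New word discovery**
    * **Function**: Unsupervised new word discovery based on `PMI + left/right entropy`.
    * **Status**: Complete.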
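
To make the `PMI + left/right entropy` idea concrete, here is a minimal from-scratch sketch in plain Python. It is not `ZenLP`'s implementation: the function names `score_candidates` and `filter_candidates`, the `max_len` parameter, the boundary sentinels, and the way probabilities are normalized are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def entropy(neighbors):
    """Shannon entropy (natural log) of a Counter of neighboring characters."""
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in neighbors.values())


def score_candidates(sentences, max_len=4):
    """Return {ngram: (freq, pmi, freedom)} for character n-grams of length 2..max_len."""
    freq = Counter()
    left = defaultdict(Counter)   # characters seen immediately before each n-gram
    right = defaultdict(Counter)  # characters seen immediately after each n-gram
    for s in sentences:
        for n in range(1, max_len + 1):
            for i in range(len(s) - n + 1):
                gram = s[i:i + n]
                freq[gram] += 1
                if n > 1:
                    # Sentence boundaries are recorded with sentinel symbols (an assumption).
                    left[gram][s[i - 1] if i > 0 else "^"] += 1
                    right[gram][s[i + n] if i + n < len(s) else "$"] += 1

    # Simplification: every n-gram probability is normalized by the character count.
    total = sum(f for g, f in freq.items() if len(g) == 1)
    scores = {}
    for gram, f in freq.items():
        if len(gram) < 2:
            continue
        # Cohesion: PMI against the weakest two-part split of the candidate.
        pmi = min(
            math.log((f / total) / ((freq[gram[:k]] / total) * (freq[gram[k:]] / total)))
            for k in range(1, len(gram))
        )
        # Freedom: the smaller of the left and right neighbor entropies.
        freedom = min(entropy(left[gram]), entropy(right[gram]))
        scores[gram] = (f, pmi, freedom)
    return scores


def filter_candidates(scores, min_freq, min_pmi, min_entropy):
    """Keep candidates that meet all three thresholds."""
    return {
        g: s for g, s in scores.items()
        if s[0] >= min_freq and s[1] >= min_pmi and s[2] >= min_entropy
    }


# Toy corpus: "ab" recurs with a different neighbor on each side every time.
toy = ["xabp", "yabq", "zabr"]
print(filter_candidates(score_candidates(toy), min_freq=2, min_pmi=1.0, min_entropy=0.5))
```

In the toy corpus only `ab` survives the filter: it is the only multi-character string that occurs more than once, its two characters always appear together, and its neighbors differ on both sides. A real implementation would typically also handle punctuation, smoothing, and large corpora, which this sketch leaves out.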


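## 🤝 Contributing

Contributions of any kind are warmly welcome! Whether you want to fix a bug, add a feature, or improve the documentation, please don't hesitate.

Before submitting, please make sure your code passes the tests and follows the project's coding style.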

## 📜 License

This project is licensed under the [MIT License](https://github.com/Hellohistory/zenlp/License).

---
Made with ❤️ and Zenlp.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "zenlp",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": null,
    "author": "Hellohistory",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/7e/5e/1a9e58557990859f2463fa190b9f6bed97f4e1c753895484596325d0ae26/zenlp-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "\u4e00\u4e2a\u4e3aPython\u8bbe\u8ba1\u7684\u6781\u7b80\u3001\u4f18\u96c5\u7684\u81ea\u7136\u8bed\u8a00\u5904\u7406\uff08NLP\uff09\u5de5\u5177\u5305\u3002",
    "version": "0.1.0",
    "project_urls": {
        "Bug Tracker": "https://github.com/Hellohistory/zenlp/issues",
        "Homepage": "https://github.com/Hellohistory/zenlp"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "33a19cefe8e562446a3c8d4e16d358d073fe81f9d08dc8590705e029e1148557",
                "md5": "be0f3ea72bd29d4e18848658a132209a",
                "sha256": "b0ae106afb64cafc40296095dd44ecd5590fab1ca2fce8198d54c84a2e465f66"
            },
            "downloads": -1,
            "filename": "zenlp-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "be0f3ea72bd29d4e18848658a132209a",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 7398,
            "upload_time": "2025-07-19T16:42:53",
            "upload_time_iso_8601": "2025-07-19T16:42:53.190263Z",
            "url": "https://files.pythonhosted.org/packages/33/a1/9cefe8e562446a3c8d4e16d358d073fe81f9d08dc8590705e029e1148557/zenlp-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "7e5e1a9e58557990859f2463fa190b9f6bed97f4e1c753895484596325d0ae26",
                "md5": "d6e3eaa837f9dc1ec50e236ffcdcfd26",
                "sha256": "e0a1a3f1331403b219d41b94c69fb9e7f7ae603ec4cf41d8f33eeb1ee4e52f0a"
            },
            "downloads": -1,
            "filename": "zenlp-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "d6e3eaa837f9dc1ec50e236ffcdcfd26",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 7216,
            "upload_time": "2025-07-19T16:42:54",
            "upload_time_iso_8601": "2025-07-19T16:42:54.341601Z",
            "url": "https://files.pythonhosted.org/packages/7e/5e/1a9e58557990859f2463fa190b9f6bed97f4e1c753895484596325d0ae26/zenlp-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-19 16:42:54",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Hellohistory",
    "github_project": "zenlp",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "pytest",
            "specs": [
                [
                    "~=",
                    "8.4.1"
                ]
            ]
        },
        {
            "name": "tqdm",
            "specs": [
                [
                    "~=",
                    "4.67.1"
                ]
            ]
        }
    ],
    "lcname": "zenlp"
}
        