====
Zhon
====

.. image:: https://badge.fury.io/py/zhon.svg
    :target: https://pypi.org/project/zhon

.. image:: https://github.com/tsroten/zhon/actions/workflows/ci.yml/badge.svg
    :target: https://github.com/tsroten/zhon/actions/workflows/ci.yml

Zhon is a Python library that provides constants commonly used in Chinese text
processing.

* Documentation: https://tsroten.github.io/zhon/
* GitHub: https://github.com/tsroten/zhon
* Support: https://github.com/tsroten/zhon/issues
* Free software: `MIT license <http://opensource.org/licenses/MIT>`_

About
-----
Zhon's constants can be used in Chinese text processing, for example:

* Find CJK characters in a string:

  .. code:: python

    >>> import re
    >>> import zhon.hanzi
    >>> re.findall('[{}]'.format(zhon.hanzi.characters), 'I broke a plate: 我打破了一个盘子.')
    ['我', '打', '破', '了', '一', '个', '盘', '子']

* Validate Pinyin syllables, words, or sentences:

  .. code:: python

    >>> import re
    >>> import zhon.pinyin
    >>> re.findall(zhon.pinyin.syllable, 'Yuànzi lǐ tíngzhe yí liàng chē.', re.I)
    ['Yuàn', 'zi', 'lǐ', 'tíng', 'zhe', 'yí', 'liàng', 'chē']

    >>> re.findall(zhon.pinyin.word, 'Yuànzi lǐ tíngzhe yí liàng chē.', re.I)
    ['Yuànzi', 'lǐ', 'tíngzhe', 'yí', 'liàng', 'chē']

    >>> re.findall(zhon.pinyin.sentence, 'Yuànzi lǐ tíngzhe yí liàng chē.', re.I)
    ['Yuànzi lǐ tíngzhe yí liàng chē.']

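
The first example works because ``zhon.hanzi.characters`` is meant to be placed
inside a regular expression character class. The sketch below illustrates the
same pattern with a hypothetical stand-in constant, ``HAN_BASIC``, which covers
only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF). It is an
assumption made so the snippet runs without Zhon installed; Zhon's own constant
covers more ranges. The helper names ``find_han`` and ``is_all_han`` are
likewise illustrative and not part of Zhon.

.. code:: python

    import re

    # Hypothetical stand-in for zhon.hanzi.characters: only the basic
    # CJK Unified Ideographs block. Zhon's constant covers more ranges.
    HAN_BASIC = '\u4e00-\u9fff'

    # Same construction as in the example above: wrap the ranges in [...].
    HAN_CHAR = re.compile('[{}]'.format(HAN_BASIC))
    HAN_ONLY = re.compile('[{}]+'.format(HAN_BASIC))


    def find_han(text):
        """Return every character in text that falls inside HAN_BASIC."""
        return HAN_CHAR.findall(text)


    def is_all_han(text):
        """Return True if text is non-empty and made up only of HAN_BASIC characters."""
        return HAN_ONLY.fullmatch(text) is not None

``find_han`` extracts characters with ``re.findall``, while ``is_all_han`` uses
``re.fullmatch``, which is usually the better fit when the goal is to validate
a whole string rather than to search inside it.
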
Features
--------
Zhon includes the following commonly used constants:

* CJK characters and radicals
* Chinese punctuation marks
* Chinese sentence regular expression pattern
* Pinyin vowels, consonants, lowercase, uppercase, and punctuation
* Pinyin syllable, word, and sentence regular expression patterns
* Zhuyin characters and marks
* Zhuyin syllable regular expression pattern
* CC-CEDICT characters
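
As a rough illustration of how a punctuation constant like the one listed above
is typically used, the sketch below strips punctuation from a string.
``CN_PUNCT`` is a hypothetical stand-in holding a handful of common Chinese
punctuation marks. It is not Zhon's constant; the assumption is that
``zhon.hanzi.punctuation`` is a string of characters that could be substituted
for it. ``strip_punctuation`` is an illustrative name, not a Zhon function.

.. code:: python

    import re

    # Hypothetical stand-in for zhon.hanzi.punctuation: a small sample of
    # common Chinese punctuation marks, not the full set.
    CN_PUNCT = '，。！？；：、（）《》'

    # re.escape guards against any character that is special inside [...].
    PUNCT_RE = re.compile('[{}]'.format(re.escape(CN_PUNCT)))


    def strip_punctuation(text):
        """Return text with every character in CN_PUNCT removed."""
        return PUNCT_RE.sub('', text)
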
Getting Started
---------------
* `Install Zhon <https://tsroten.github.io/zhon/installation.html>`_
* `Learn how to use Zhon <https://tsroten.github.io/zhon/api.html>`_
* `Contribute <https://github.com/tsroten/zhon/blob/develop/CONTRIBUTING.rst>`_ documentation, code, or feedback