================
Hanzi Identifier
================

.. image:: https://badge.fury.io/py/hanzidentifier.svg
    :target: https://pypi.org/project/hanzidentifier

.. image:: https://github.com/tsroten/hanzidentifier/actions/workflows/ci.yml/badge.svg
    :target: https://github.com/tsroten/hanzidentifier/actions/workflows/ci.yml


Hanzi Identifier is a simple Python module that identifies whether a string of
text contains Simplified or Traditional Chinese characters.

* GitHub: https://github.com/tsroten/hanzidentifier
* Free software: MIT license

About
-----


Easy-to-use helper functions for identifying strings:

.. code:: python

    >>> import hanzidentifier
    >>> hanzidentifier.has_chinese('Hello my name is John.')
    False
    >>> hanzidentifier.is_simplified('John说：你好！')
    True
    >>> hanzidentifier.is_traditional('John說：你好！')
    True
    >>> hanzidentifier.has_chinese('Country in Simplified: 国家. Country in Traditional: 國家.')
    True


Here it is without the helper functions:

.. code:: python

    >>> hanzidentifier.identify('Hello my name is Thomas.') is hanzidentifier.UNKNOWN
    True
    >>> hanzidentifier.identify('Thomas 说：你好！') is hanzidentifier.SIMPLIFIED
    True
    >>> hanzidentifier.identify('Thomas 說：你好！') is hanzidentifier.TRADITIONAL
    True
    >>> hanzidentifier.identify('你好！') is hanzidentifier.BOTH
    True
    >>> hanzidentifier.identify('Country in Simplified: 国家. Country in Traditional: 國家.') is hanzidentifier.MIXED
    True


``hanzidentifier.identify`` has five possible return values:

* ``hanzidentifier.UNKNOWN``: the string contains no recognized Chinese characters.
* ``hanzidentifier.BOTH``: the string is compatible with both the Simplified and Traditional character systems.
* ``hanzidentifier.TRADITIONAL``: the string consists of Traditional characters.
* ``hanzidentifier.SIMPLIFIED``: the string consists of Simplified characters.
* ``hanzidentifier.MIXED``: the string contains both characters recognized solely as Traditional and characters recognized solely as Simplified.


Hanzi Identifier uses the CC-CEDICT data provided by `Zhon <https://github.com/tsroten/zhon>`_ to identify Chinese characters.
Characters that aren't found in CC-CEDICT are ignored when determining a string's identity.

Because the Traditional and Simplified Chinese character systems overlap, a
string containing Simplified characters could identify as
``hanzidentifier.SIMPLIFIED`` or ``hanzidentifier.BOTH``, depending on whether
those characters are also Traditional characters.

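
The overlap between the two character systems can be made concrete with a
small sketch. It is not the module's implementation: the three character sets
below are hypothetical, hand-picked stand-ins for the CC-CEDICT data, and
``classify`` and its string labels are illustrative names rather than part of
the ``hanzidentifier`` API. It only shows how set membership could lead to the
five outcomes, with unrecognized characters ignored:

.. code:: python

    # Hypothetical toy data for illustration only; the real module derives
    # its character sets from CC-CEDICT via Zhon.
    SIMPLIFIED_ONLY = set("说国")
    TRADITIONAL_ONLY = set("說國")
    SHARED = set("你好家")

    # Illustrative labels, not the library's constants.
    UNKNOWN, TRADITIONAL, SIMPLIFIED, BOTH, MIXED = (
        "UNKNOWN", "TRADITIONAL", "SIMPLIFIED", "BOTH", "MIXED")


    def classify(text):
        """Classify text against the toy character sets above."""
        recognized = SIMPLIFIED_ONLY | TRADITIONAL_ONLY | SHARED
        # Anything outside the known sets (Latin letters, punctuation,
        # whitespace) is ignored.
        known = {ch for ch in text if ch in recognized}
        if not known:
            return UNKNOWN
        has_simplified = bool(known & SIMPLIFIED_ONLY)
        has_traditional = bool(known & TRADITIONAL_ONLY)
        if has_simplified and has_traditional:
            return MIXED
        if has_simplified:
            return SIMPLIFIED
        if has_traditional:
            return TRADITIONAL
        # Every recognized character belongs to both systems.
        return BOTH


    print(classify("Hello"))             # UNKNOWN
    print(classify("你好！"))             # BOTH
    print(classify("Thomas 说：你好！"))  # SIMPLIFIED
    print(classify("Thomas 說：你好！"))  # TRADITIONAL
    print(classify("国家 and 國家"))      # MIXED

In this sketch, a single character that exists in only one system is enough to
move a string from ``BOTH`` to ``SIMPLIFIED`` or ``TRADITIONAL``, which is why
a short string such as ``你好！`` cannot be attributed to either system.
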

Getting Started
---------------

* Install Hanzi Identifier: ``$ pip install hanzidentifier``
* Report bugs and ask questions via `GitHub Issues <https://github.com/tsroten/hanzidentifier/issues>`_
* `Contribute features or bug fixes <https://github.com/tsroten/hanzidentifier/pulls>`_


Raw data
--------

{
"_id": null,
"home_page": null,
"name": "hanzidentifier",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "characters, chinese, cjk, hanzi, mandarin, simplified, traditional",
"author": "Thomas Roten",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/cb/32/c1f5f7cf4d4372c7778e34f116c52539db717b73e1ec0b149ce3717b1c9c/hanzidentifier-1.3.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Python module that identifies Chinese text as Simplified or Traditional.",
"version": "1.3.0",
"project_urls": {
"Changes": "https://github.com/tsroten/hanzidentifier/blob/main/CHANGES.rst",
"Issue Tracker": "https://github.com/tsroten/hanzidentifier/issues",
"Source Code": "https://github.com/tsroten/hanzidentifier"
},
"split_keywords": [
"characters",
" chinese",
" cjk",
" hanzi",
" mandarin",
" simplified",
" traditional"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "14f694c3975b5e06ea4084056413da975395924ff0ddd89e975427bd46684359",
"md5": "badcb3fe552f89917b32098d425aa567",
"sha256": "8e36204a96072c328c8bc6ae9342fc6f9faea10bc519ff51da3ef6e6e0cee5a6"
},
"downloads": -1,
"filename": "hanzidentifier-1.3.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "badcb3fe552f89917b32098d425aa567",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 4777,
"upload_time": "2024-11-20T00:56:40",
"upload_time_iso_8601": "2024-11-20T00:56:40.668873Z",
"url": "https://files.pythonhosted.org/packages/14/f6/94c3975b5e06ea4084056413da975395924ff0ddd89e975427bd46684359/hanzidentifier-1.3.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "cb32c1f5f7cf4d4372c7778e34f116c52539db717b73e1ec0b149ce3717b1c9c",
"md5": "b8ec32e35389bc674c7173176d813d7f",
"sha256": "7acd99ef00c69ab57f3441e2b3568348b58ef115500e5a01f66cedea603b3101"
},
"downloads": -1,
"filename": "hanzidentifier-1.3.0.tar.gz",
"has_sig": false,
"md5_digest": "b8ec32e35389bc674c7173176d813d7f",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 3890,
"upload_time": "2024-11-20T00:56:39",
"upload_time_iso_8601": "2024-11-20T00:56:39.286411Z",
"url": "https://files.pythonhosted.org/packages/cb/32/c1f5f7cf4d4372c7778e34f116c52539db717b73e1ec0b149ce3717b1c9c/hanzidentifier-1.3.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-11-20 00:56:39",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "tsroten",
"github_project": "hanzidentifier",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "hanzidentifier"
}