:Name: hanzidentifier
:Version: 1.2.0
:Summary: Python module that identifies Chinese text as Simplified or Traditional.
:Upload time: 2023-06-30 12:01:43
:Requires Python: >=3.7
:Keywords: characters, chinese, cjk, hanzi, mandarin, simplified, traditional

================
Hanzi Identifier
================

.. image:: https://badge.fury.io/py/hanzidentifier.svg
    :target: https://pypi.org/project/hanzidentifier

.. image:: https://github.com/tsroten/hanzidentifier/actions/workflows/ci.yml/badge.svg
    :target: https://github.com/tsroten/hanzidentifier/actions/workflows/ci.yml

Hanzi Identifier is a simple Python module that identifies whether a string of
text contains Simplified or Traditional Chinese characters.

* GitHub: https://github.com/tsroten/hanzidentifier
* Free software: MIT license

About
-----

Easy-to-use helper functions for identifying strings:

.. code:: python

    >>> import hanzidentifier
    >>> hanzidentifier.has_chinese('Hello my name is John.')
    False
    >>> hanzidentifier.is_simplified('John说:你好!')
    True
    >>> hanzidentifier.is_traditional('John說:你好!')
    True
    >>> hanzidentifier.has_chinese('Country in Simplified: 国家. Country in Traditional: 國家.')
    True

Here it is without the helper functions:

.. code:: python

    >>> hanzidentifier.identify('Hello my name is Thomas.') is hanzidentifier.UNKNOWN
    True
    >>> hanzidentifier.identify('Thomas 说:你好!') is hanzidentifier.SIMPLIFIED
    True
    >>> hanzidentifier.identify('Thomas 說:你好!') is hanzidentifier.TRADITIONAL
    True
    >>> hanzidentifier.identify('你好!') is hanzidentifier.BOTH
    True
    >>> hanzidentifier.identify('Country in Simplified: 国家. Country in Traditional: 國家.') is hanzidentifier.MIXED
    True

``hanzidentifier.identify`` has five possible return values:

* ``hanzidentifier.UNKNOWN``: there are no recognized Chinese characters in the string.
* ``hanzidentifier.BOTH``: the string is compatible with both Simplified and Traditional character systems.
* ``hanzidentifier.TRADITIONAL``: the string consists of Traditional characters, at least one of which is not also a Simplified character.
* ``hanzidentifier.SIMPLIFIED``: the string consists of Simplified characters, at least one of which is not also a Traditional character.
* ``hanzidentifier.MIXED``: the string contains both characters recognized solely as Traditional and characters recognized solely as Simplified.

Hanzi Identifier uses the CC-CEDICT data provided by `Zhon <https://github.com/tsroten/zhon>`_ to identify Chinese characters.
Characters that aren't found in CC-CEDICT are ignored when determining a string's identity.

Because the Traditional and Simplified Chinese character systems overlap, a
string containing Simplified characters could identify as
``hanzidentifier.SIMPLIFIED`` or ``hanzidentifier.BOTH``, depending on whether
its characters are also Traditional characters.
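
The overlap described above can be illustrated with a minimal sketch. It is not
the module's actual implementation: the ``Script`` enum, the ``classify``
helper, and the tiny hand-picked character sets are hypothetical stand-ins for
the CC-CEDICT data that Hanzi Identifier gets from Zhon.

.. code:: python

    from enum import Enum


    class Script(Enum):
        """Stand-ins for the five return values; not the library's constants."""
        UNKNOWN = "unknown"
        BOTH = "both"
        TRADITIONAL = "traditional"
        SIMPLIFIED = "simplified"
        MIXED = "mixed"


    # Toy character sets chosen by hand for this sketch (an assumption).
    # The real module derives its sets from CC-CEDICT data.
    TRADITIONAL_CHARS = set("說國家你好")
    SIMPLIFIED_CHARS = set("说国家你好")
    SHARED_CHARS = TRADITIONAL_CHARS & SIMPLIFIED_CHARS


    def classify(text):
        """Classify text against the toy sets above (hypothetical helper)."""
        # Characters outside both sets (Latin letters, punctuation) are ignored.
        known = set(text) & (TRADITIONAL_CHARS | SIMPLIFIED_CHARS)
        if not known:
            return Script.UNKNOWN
        if known <= SHARED_CHARS:
            return Script.BOTH
        if known <= TRADITIONAL_CHARS:
            return Script.TRADITIONAL
        if known <= SIMPLIFIED_CHARS:
            return Script.SIMPLIFIED
        return Script.MIXED

With these toy sets, ``classify('你好!')`` returns ``Script.BOTH`` because every
recognized character belongs to both sets, while ``classify('Thomas 说:你好!')``
returns ``Script.SIMPLIFIED`` because ``说`` appears only in the Simplified set.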

Getting Started
---------------

* Install Hanzi Identifier: ``$ pip install hanzidentifier``
* Report bugs and ask questions via `GitHub Issues <https://github.com/tsroten/hanzidentifier/issues>`_
* `Contribute features or bug fixes <https://github.com/tsroten/hanzidentifier/pulls>`_
            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "hanzidentifier",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": null,
    "keywords": "characters,chinese,cjk,hanzi,mandarin,simplified,traditional",
    "author": null,
    "author_email": "Thomas Roten <thomas@roten.us>",
    "download_url": "https://files.pythonhosted.org/packages/b1/c6/1f42864ea272c5497ba78858f6b4082cca37cf4f59b60effdc60273e2c1f/hanzidentifier-1.2.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Python module that identifies Chinese text as Simplified or Traditional.",
    "version": "1.2.0",
    "project_urls": {
        "Changes": "https://github.com/tsroten/hanzidentifier/blob/main/CHANGES.rst",
        "Issue Tracker": "https://github.com/tsroten/hanzidentifier/issues",
        "Source Code": "https://github.com/tsroten/hanzidentifier"
    },
    "split_keywords": [
        "characters",
        "chinese",
        "cjk",
        "hanzi",
        "mandarin",
        "simplified",
        "traditional"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "c67c1884b2e27fb81fbe429c390e5bde41ea3742c2ce0875fbf0be766d7586a0",
                "md5": "ba1d1ef8d3ae0b4e9650de5c19da2dee",
                "sha256": "022cbb3aa01ff87b41caa7dbb6e917463a09f399d6d2b9d5499f34d6f6cc1218"
            },
            "downloads": -1,
            "filename": "hanzidentifier-1.2.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "ba1d1ef8d3ae0b4e9650de5c19da2dee",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 4817,
            "upload_time": "2023-06-30T12:01:42",
            "upload_time_iso_8601": "2023-06-30T12:01:42.166925Z",
            "url": "https://files.pythonhosted.org/packages/c6/7c/1884b2e27fb81fbe429c390e5bde41ea3742c2ce0875fbf0be766d7586a0/hanzidentifier-1.2.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "b1c61f42864ea272c5497ba78858f6b4082cca37cf4f59b60effdc60273e2c1f",
                "md5": "fc0c73e34e87d5f8b8e2630b7ad471a7",
                "sha256": "8e4198ae87c1da80d77cde46d7e90cb50d1a7561ee8b33e725058a2b0d70e83d"
            },
            "downloads": -1,
            "filename": "hanzidentifier-1.2.0.tar.gz",
            "has_sig": false,
            "md5_digest": "fc0c73e34e87d5f8b8e2630b7ad471a7",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 3934,
            "upload_time": "2023-06-30T12:01:43",
            "upload_time_iso_8601": "2023-06-30T12:01:43.696773Z",
            "url": "https://files.pythonhosted.org/packages/b1/c6/1f42864ea272c5497ba78858f6b4082cca37cf4f59b60effdc60273e2c1f/hanzidentifier-1.2.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-06-30 12:01:43",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "tsroten",
    "github_project": "hanzidentifier",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "hanzidentifier"
}
        