yitizi


- Name: yitizi
- Version: 0.1.2
- Home page: https://github.com/nk2028/yitizi
- Summary: Input a Chinese character. Output all the variant characters of it.
- Upload time: 2024-08-29 15:21:53
- Maintainer: None
- Docs URL: None
- Author: Ngiox Khyen 2028 Project
- Requires Python: <4,>=3.5
- License: None
- Keywords: chinese, chinese-character, natural-language-processing
- Requirements: No requirements were recorded.
- Travis-CI: No Travis.
- Coveralls test coverage: No coveralls.

# Yitizi

[![](https://badge.fury.io/py/yitizi.svg)](https://pypi.org/project/yitizi/) [![](https://badge.fury.io/js/yitizi.svg)](https://www.npmjs.com/package/yitizi) [![](https://data.jsdelivr.com/v1/package/npm/yitizi/badge)](https://www.jsdelivr.com/package/npm/yitizi) [![](https://github.com/nk2028/yitizi/workflows/Package/badge.svg)](https://github.com/nk2028/yitizi/actions?query=workflow%3APackage)

Given a Chinese character, output all of its variant characters.<br>
輸入一個漢字,輸出它的全部異體字。<br>
输入一个汉字,输出它的全部异体字。

## Usage

### Python

```sh
pip install yitizi
```

```python
>>> import yitizi
>>> yitizi.get('和')
['咊', '龢']
```
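
The lookup works one character at a time, so longer text has to be handled character by character. The following is a minimal sketch of that idea, not part of the package: `variants_by_char` and `stub_get` are hypothetical names, and the stub table only mirrors the output shown above so that the example runs without the package installed.

```python
from typing import Callable, Dict, List


def variants_by_char(text: str, get: Callable[[str], List[str]]) -> Dict[str, List[str]]:
    """Map each distinct character of `text` to the list returned by `get`."""
    # dict.fromkeys keeps first-seen order and drops repeated characters.
    return {ch: get(ch) for ch in dict.fromkeys(text)}


# Stand-in for yitizi.get (assumption: it returns a list of variants).
_STUB = {'和': ['咊', '龢']}


def stub_get(ch: str) -> List[str]:
    return list(_STUB.get(ch, []))


print(variants_by_char('和和', stub_get))  # {'和': ['咊', '龢']}
```

With the package installed, passing `yitizi.get` instead of `stub_get` should give the same shape of result, assuming `yitizi.get` accepts any single character.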

### JavaScript (Node.js)

```sh
npm install yitizi
```

```javascript
> const Yitizi = require('yitizi');
> Yitizi.get('和');
[ '咊', '龢' ]
```

### JavaScript (browser)

```html
<script src="https://cdn.jsdelivr.net/npm/yitizi@0.1.2"></script>
```

```javascript
> Yitizi.get('和');
[ '咊', '龢' ]
```

## Design

Connections between variant characters can be modeled as a _graph_ with characters as vertices, where two characters are variants of each other if they are _directly_ connected by an edge.

To reduce data redundancy, only a few types of basic connections are stored in the data tables under `data/`. The full graph, `yitizi.json`, is computed from these tables by running `build/main.py`.
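
To illustrate what "directly connected" means, here is a minimal sketch that stores the graph as an adjacency table. The table and the name `direct_variants` are hypothetical; the actual layout of `yitizi.json` is not described here and may differ.

```python
# Hypothetical adjacency table for three characters; not the real yitizi.json.
GRAPH = {
    '閒': ['閑', '間'],
    '閑': ['閒'],
    '間': ['閒'],
}


def direct_variants(char, graph=GRAPH):
    """Return the characters joined to `char` by a single edge."""
    return list(graph.get(char, []))


print(direct_variants('閒'))  # ['閑', '間']
print(direct_variants('閑'))  # ['閒']
```

Note that 間 is reachable from 閑 through 閒, yet it is not reported as a variant of 閑, because the lookup never follows more than one edge.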

### Basic connections

A basic connection between two variant characters is classified as one of three types: equivalent, intersecting, or simplification.

- Equivalent "全等": Two characters are equivalent only if they are interchangeable in most texts without a change in meaning. When the full graph is computed, this connection is treated as both symmetric and transitive, i.e.:

  - If A is an equivalent variant of B, then B is an equivalent variant of A;
  - If A is an equivalent variant of B, and B is an equivalent variant of C, then A is an equivalent variant of C.

- Intersecting "語義交疊": Two characters are intersecting variants if they are interchangeable in certain cases. This connection is also symmetric, but not necessarily transitive. Characters with intersecting variants are arranged in groups (rows in the data files), and each group represents a specific meaning shared by the characters it lists. A character can belong to multiple groups.

  Example: "閒" has two intersecting variants: "閑" and "間", listed in two groups:

  ```conf
  閒閑  # meaning "vacant"
  閒間  # meaning "in the middle"
  閑>闲  # simplified form (same below)
  間>间
  ```

  Then in the computed `yitizi.json`:

  - 閒 and 閑 (闲) are variants of each other;
  - 閒 and 間 (间) are variants of each other;
  - 閑 (闲) and 間 (间) are unrelated.

  ![Example I-1](demo/example-i-1.png)

  A more complex (though abstract) example:

  ```conf
  =AB  # "=" means equivalent variants
  ACD
  AEFG
  ```

  - A, B, C and D are variants of one another;
  - A, B, E, F and G are variants of one another;
  - No connections between C (or D) and E (or F/G).

  ![Example I-2](demo/example-i-2.png)

- Simplification "簡體": A non-transitive and asymmetric connection. A simplified character is associated only with its traditional form.

  Example 1: "么" is 1) a simplified form of "麼" and 2) an equivalent variant of "幺", and "麼" has an equivalent variant "麽". Then:

  - 麼, 麽 and 么 are variants of one another;
  - 幺 and 么 are variants of each other;
  - Neither 麼 nor 麽 is related to 幺.

  ![Example S-1](demo/example-s-1.png)

  Example 2: "苧" is 1) a simplified form of "薴" and 2) a traditional form of "苎". Then:

  - 苧 is a variant of 薴 and 苎;
  - 薴 and 苎 are unrelated.

  ![Example S-2](demo/example-s-2.png)

  Example 3: "芸" is a simplified form of both "藝" (Japanese _Shinjitai_) and "蕓" (Chinese), and "艺" is also a simplified form of "藝" (Chinese). Then:

  - 藝, 芸 and 艺 are variants of one another;
  - 蕓 and 芸 are variants of each other;
  - Neither 藝 nor 艺 is related to 蕓.

  ![Example S-3](demo/example-s-3.png)
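
The three connection types above can be combined in a short sketch. This is one possible reading of the rules, not the code in `build/main.py`: it assumes that every equivalence class and every intersecting group (widened by the equivalents of its members) forms a set of mutual variants, and that each such set additionally takes in the simplified forms of its own members, one step only. The function name `build_graph` and its input shapes are hypothetical, and the real data files may be structured differently.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(equivalent, intersecting, simplified):
    """Return {character: sorted variants} under the assumptions stated above.

    equivalent   -- list of strings; the characters in each string are equivalent
    intersecting -- list of strings; each string is one intersecting group
    simplified   -- list of (traditional, simplified) pairs
    """
    parent = {}

    def find(c):
        # Union-find lookup with path halving.
        parent.setdefault(c, c)
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    # Merging rows makes "equivalent" symmetric and transitive.
    for row in equivalent:
        for c in row[1:]:
            parent[find(c)] = find(row[0])

    chars = set("".join(equivalent) + "".join(intersecting))
    for trad, simp in simplified:
        chars.update((trad, simp))

    classes = defaultdict(set)
    for c in chars:
        classes[find(c)].add(c)

    # Base cliques: each equivalence class, and each intersecting group
    # widened by the equivalence classes of its members.
    cliques = [set(members) for members in classes.values()]
    for row in intersecting:
        cliques.append(set().union(*(classes[find(c)] for c in row)))

    simplified_of = defaultdict(set)
    for trad, simp in simplified:
        simplified_of[trad].add(simp)

    graph = defaultdict(set)
    for clique in cliques:
        # Add simplified forms one step only, so simplification never chains.
        extended = clique.union(*(simplified_of[c] for c in clique))
        for a, b in combinations(extended, 2):
            graph[a].add(b)
            graph[b].add(a)
    return {c: sorted(vs) for c, vs in graph.items()}


# Example S-1: 么 is a simplified form of 麼 and equivalent to 幺; 麼 is equivalent to 麽.
g = build_graph(['麼麽', '么幺'], [], [('麼', '么')])
print(g['幺'])  # ['么']
print(g['么'])
```

Under these assumptions the sketch reproduces the relationships listed in examples I-1, I-2 and S-1 to S-3; cases that the examples do not cover may be handled differently by the actual build script.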

### Data source

- `data/ytenx`: From [BYVoid/ytenx](https://github.com/BYVoid/ytenx/tree/d95d2477f031377e9a1ef022fa574287184bcce8/ytenx/sync/jihthex)
- `data/opencc`: From [BYVoid/OpenCC](https://github.com/BYVoid/OpenCC/tree/556ed22496d650bd0b13b6c163be9814637970ae/data/dictionary)

## Note for developers

Before publishing a new release, update every occurrence of the version string.
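
One way to avoid missing an occurrence is to search the working tree for the old version string before editing. The helper below is a hypothetical sketch rather than a script shipped with the project; the function name and the list of file suffixes are assumptions and should be adjusted to the files that actually carry the version.

```python
from pathlib import Path


def find_version_occurrences(root, version, suffixes=(".py", ".json", ".md")):
    """List (path, line number, line) for every line under `root` containing `version`."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and file types outside the assumed suffix list.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if version in line:
                hits.append((str(path), lineno, line.strip()))
    return hits
```

For example, `find_version_occurrences(".", "0.1.2")` run from the repository root would list each line that still mentions the current version, such as the CDN URL in the browser usage section.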

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/nk2028/yitizi",
    "name": "yitizi",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4,>=3.5",
    "maintainer_email": null,
    "keywords": "chinese chinese-character natural-language-processing",
    "author": "Ngiox Khyen 2028 Project",
    "author_email": "support@nk2028.shn.hk",
    "download_url": "https://files.pythonhosted.org/packages/2d/76/d7e2090c1e381f75c3b0b73d53ffbf237c9bf80ed5000d269ecd37d1cfea/yitizi-0.1.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Input a Chinese character. Output all the variant characters of it.",
    "version": "0.1.2",
    "project_urls": {
        "Bug Reports": "https://github.com/nk2028/yitizi/issues",
        "Homepage": "https://github.com/nk2028/yitizi",
        "Source": "https://github.com/nk2028/yitizi"
    },
    "split_keywords": [
        "chinese",
        "chinese-character",
        "natural-language-processing"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3f3356f7eb8096aee8358786078b4044e0b3ee9d7530694e4b8b63814d90e23a",
                "md5": "8f5ff516a6c5a40d753fdbabb92afdd2",
                "sha256": "78478facf94daadeef0fdc4fde852b9d8fc46fb8daad55fb89a2662fed8302c9"
            },
            "downloads": -1,
            "filename": "yitizi-0.1.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "8f5ff516a6c5a40d753fdbabb92afdd2",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4,>=3.5",
            "size": 80753,
            "upload_time": "2024-08-29T15:21:51",
            "upload_time_iso_8601": "2024-08-29T15:21:51.402982Z",
            "url": "https://files.pythonhosted.org/packages/3f/33/56f7eb8096aee8358786078b4044e0b3ee9d7530694e4b8b63814d90e23a/yitizi-0.1.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "2d76d7e2090c1e381f75c3b0b73d53ffbf237c9bf80ed5000d269ecd37d1cfea",
                "md5": "34c360afd68760967c2d20298f60e55e",
                "sha256": "463efa8240736e5548dbda83cff6c11e1fed97778af18f8847355133bbaee922"
            },
            "downloads": -1,
            "filename": "yitizi-0.1.2.tar.gz",
            "has_sig": false,
            "md5_digest": "34c360afd68760967c2d20298f60e55e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4,>=3.5",
            "size": 84465,
            "upload_time": "2024-08-29T15:21:53",
            "upload_time_iso_8601": "2024-08-29T15:21:53.331643Z",
            "url": "https://files.pythonhosted.org/packages/2d/76/d7e2090c1e381f75c3b0b73d53ffbf237c9bf80ed5000d269ecd37d1cfea/yitizi-0.1.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-29 15:21:53",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "nk2028",
    "github_project": "yitizi",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "yitizi"
}
        