[![Current PyPI packages](https://badge.fury.io/py/udkundoku.svg)](https://pypi.org/project/udkundoku/)
# UD-Kundoku
A Classical Chinese to Modern Japanese translator, built on [Universal Dependencies](https://universaldependencies.org/format.html).
## Basic usage
```py
>>> import udkundoku
>>> lzh=udkundoku.load()
>>> s=lzh("不入虎穴不得虎子")
>>> t=udkundoku.translate(s)
>>> print(t)
# text = 虎の穴に入らずして虎の子を得ず
1 虎 虎 NOUN n,名詞,主体,動物 _ 3 nmod _ Gloss=tiger|SpaceAfter=No
2 の _ ADP _ _ 1 case _ SpaceAfter=No
3 穴 穴 NOUN n,名詞,固定物,地形 Case=Loc 5 obj _ Gloss=cave|SpaceAfter=No
4 に _ ADP _ _ 3 case _ SpaceAfter=No
5 入ら 入 VERB v,動詞,行為,移動 _ 0 root _ Gloss=enter|SpaceAfter=No
6 ずして 不 AUX v,副詞,否定,無界 Polarity=Neg 5 advmod _ Gloss=not|SpaceAfter=No
7 虎 虎 NOUN n,名詞,主体,動物 _ 9 nmod _ Gloss=tiger|SpaceAfter=No
8 の _ ADP _ _ 7 case _ SpaceAfter=No
9 子 子 NOUN n,名詞,人,関係 _ 11 obj _ Gloss=child|SpaceAfter=No
10 を _ ADP _ _ 9 case _ SpaceAfter=No
11 得 得 VERB v,動詞,行為,得失 _ 5 parataxis _ Gloss=get|SpaceAfter=No
12 ず 不 AUX v,副詞,否定,無界 Polarity=Neg 11 advmod _ Gloss=not|SpaceAfter=No
>>> print(t.sentence())
虎の穴に入らずして虎の子を得ず
>>> print(s.to_tree())
不 <════╗ advmod
入 ═══╗═╝═╗ root
虎 <╗ ║ ║ nmod
穴 ═╝<╝ ║ obj
不 <════╗ ║ advmod
得 ═══╗═╝<╝ parataxis
虎 <╗ ║ nmod
子 ═╝<╝ obj
>>> print(t.to_tree())
虎 ═╗<╗ nmod(体言による連体修飾語)
の <╝ ║ case(格表示)
穴 ═╗═╝<╗ obj(目的語)
に <╝ ║ case(格表示)
入 ═╗═══╝═╗ root(親)
ら ║ ║
ず <╝ ║ advmod(連用修飾語)
し ║
て ║
虎 ═╗<╗ ║ nmod(体言による連体修飾語)
の <╝ ║ ║ case(格表示)
子 ═╗═╝<╗ ║ obj(目的語)
を <╝ ║ ║ case(格表示)
得 ═╗═══╝<╝ parataxis(隣接表現)
ず <╝ advmod(連用修飾語)
```
`udkundoku.load()` is an alias for `udkanbun.load()` from [UD-Kanbun](https://github.com/KoichiYasuoka/UD-Kanbun/). `udkundoku.translate()` is a transcriptive converter from Classical Chinese (annotated with the Universal Dependencies of UD-Kanbun) into Modern Japanese (annotated with the Universal Dependencies of [UniDic2UD](https://github.com/KoichiYasuoka/UniDic2UD/)). Internally, `udkundoku.translate()` calls `udkundoku.reorder()` to rearrange the Classical Chinese words into Japanese word order. `to_tree()` and `to_svg()` are borrowed from UD-Kanbun.
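The CoNLL-U text printed by `print(t)` in the example can also be processed with plain Python. The following is a minimal sketch, assuming the output is tab-separated with the standard ten CoNLL-U columns and contains no multiword-token or empty-node lines. `parse_conllu` and `rebuild_sentence` are hypothetical helper names for illustration, not part of `udkundoku`; the sample rows are copied from the example output.

```python
# Rows copied from the example output, written with spaces for readability
# and converted below to the tab-separated form that CoNLL-U uses.
ROWS = [
    "1 虎 虎 NOUN n,名詞,主体,動物 _ 3 nmod _ Gloss=tiger|SpaceAfter=No",
    "2 の _ ADP _ _ 1 case _ SpaceAfter=No",
    "3 穴 穴 NOUN n,名詞,固定物,地形 Case=Loc 5 obj _ Gloss=cave|SpaceAfter=No",
    "4 に _ ADP _ _ 3 case _ SpaceAfter=No",
    "5 入ら 入 VERB v,動詞,行為,移動 _ 0 root _ Gloss=enter|SpaceAfter=No",
    "6 ずして 不 AUX v,副詞,否定,無界 Polarity=Neg 5 advmod _ Gloss=not|SpaceAfter=No",
    "7 虎 虎 NOUN n,名詞,主体,動物 _ 9 nmod _ Gloss=tiger|SpaceAfter=No",
    "8 の _ ADP _ _ 7 case _ SpaceAfter=No",
    "9 子 子 NOUN n,名詞,人,関係 _ 11 obj _ Gloss=child|SpaceAfter=No",
    "10 を _ ADP _ _ 9 case _ SpaceAfter=No",
    "11 得 得 VERB v,動詞,行為,得失 _ 5 parataxis _ Gloss=get|SpaceAfter=No",
    "12 ず 不 AUX v,副詞,否定,無界 Polarity=Neg 11 advmod _ Gloss=not|SpaceAfter=No",
]
CONLLU = (
    "# text = 虎の穴に入らずして虎の子を得ず\n"
    + "\n".join(row.replace(" ", "\t") for row in ROWS)
    + "\n"
)


def parse_conllu(text):
    """Parse CoNLL-U text into a list of token dictionaries.

    Assumption: plain integer IDs only (no "1-2" ranges or "1.1" empty nodes).
    """
    tokens = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines such as "# text = ..."
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        misc = set() if cols[9] == "_" else set(cols[9].split("|"))
        tokens.append({
            "id": int(cols[0]),
            "form": cols[1],
            "lemma": cols[2],
            "upos": cols[3],
            "head": int(cols[6]),
            "deprel": cols[7],
            "misc": misc,
        })
    return tokens


def rebuild_sentence(tokens):
    """Join the FORM column, adding a space unless MISC has SpaceAfter=No."""
    parts = []
    for token in tokens:
        parts.append(token["form"])
        if "SpaceAfter=No" not in token["misc"]:
            parts.append(" ")
    return "".join(parts).rstrip()


tokens = parse_conllu(CONLLU)
print(rebuild_sentence(tokens))
print([t["form"] for t in tokens if t["head"] == 0])
```

For the sample rows, the rebuilt string matches the `# text` line, and the only token with head `0` is the root `入ら`.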
You can simply use `udkundoku` on the command line:
```sh
echo 不入虎穴不得虎子 | udkundoku -j
```
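The same command line can be driven from Python. This is a sketch only, assuming the `udkundoku` executable is on `PATH` after installation and reads one sentence from standard input as in the shell example; `run_udkundoku` is a hypothetical wrapper name, and the `-j` option is simply passed through as shown above.

```python
import shutil
import subprocess


def run_udkundoku(sentence, args=("-j",), executable="udkundoku"):
    """Pipe a sentence to the udkundoku command and return its standard output.

    Returns None when the executable cannot be found on PATH.
    """
    exe = shutil.which(executable)
    if exe is None:
        return None
    result = subprocess.run(
        [exe, *args],
        input=sentence + "\n",   # same input that `echo` would provide
        capture_output=True,
        encoding="utf-8",
        check=True,              # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

Calling `run_udkundoku("不入虎穴不得虎子")` corresponds to the `echo ... | udkundoku -j` line; a return value of `None` means the command was not found.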
## HTTP-server usage
```sh
python -m udkundoku.server 5000
```
Open `http://127.0.0.1:5000` in your local browser, enter a Classical Chinese sentence, and press the 解析 (analyze) button at least three times.
![不入虎穴不得虎子](https://raw.githubusercontent.com/KoichiYasuoka/UD-Kundoku/master/example.png)
## Installation for Linux
A tarball is available for Linux and is installed by default when you use `pip`:
```sh
pip install udkundoku
```
[旧仮名口語UniDic](https://clrd.ninjal.ac.jp/unidic/download_all.html#unidic_qkana) is automatically downloaded for UniDic2UD.
## Installation for Cygwin
Make sure the `gcc-g++`, `python37-pip`, and `python37-devel` packages are installed, then run:
```sh
pip3.7 install udkundoku
```
Use the `python3.7` command in [Cygwin](https://www.cygwin.com/install.html) instead of `python`.
## Installation for Jupyter Notebook (Google Colaboratory)
```py
!pip install udkundoku
```
Try the [notebook](https://colab.research.google.com/github/KoichiYasuoka/UD-Kundoku/blob/master/udkundoku.ipynb) for Google Colaboratory.
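After any of the installations above, one way to confirm that `pip` registered the package is to query its installed version. This is a small sketch using only the standard library; `installed_version` is a hypothetical helper name, and `importlib.metadata` is assumed to be available (Python 3.8 or later).

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("udkundoku")
print(version if version is not None else "udkundoku is not installed")
```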
## Author
Koichi Yasuoka (安岡孝一)
## References
* 安岡孝一: [漢文の依存文法解析にもとづく自動訓読システム](http://hdl.handle.net/2433/259315), 日本漢字学会第3回研究大会予稿集(2020年11月), pp.60-73.
* 安岡孝一: [漢文の依存文法解析と返り点の関係について](http://hdl.handle.net/2433/235609), 日本漢字学会第1回研究大会予稿集(2018年12月), pp.33-48.
Raw data
{
"_id": null,
"home_page": "https://github.com/KoichiYasuoka/UD-Kundoku",
"name": "udkundoku",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.6",
"maintainer_email": "",
"keywords": "udkanbun nlp",
"author": "Koichi Yasuoka",
"author_email": "yasuoka@kanji.zinbun.kyoto-u.ac.jp",
"download_url": "https://files.pythonhosted.org/packages/ca/17/0f291a85738cef37a6ec867520a350a636c1e73c0da3ff824fdd970aac6b/udkundoku-2.2.9.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Classical Chinese to Modern Japanese Translator",
"version": "2.2.9",
"project_urls": {
"Homepage": "https://github.com/KoichiYasuoka/UD-Kundoku",
"Source": "https://github.com/KoichiYasuoka/UD-Kundoku",
"Tracker": "https://github.com/KoichiYasuoka/UD-Kundoku/issues",
"ud-ja-kanbun": "https://corpus.kanji.zinbun.kyoto-u.ac.jp/gitlab/Kanbun/ud-ja-kanbun"
},
"split_keywords": [
"udkanbun",
"nlp"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "ca170f291a85738cef37a6ec867520a350a636c1e73c0da3ff824fdd970aac6b",
"md5": "663040a9f6ae05ca66a178ac2c41b28e",
"sha256": "821e23fdc9d62ccafe2763caa17168be8bf54a58402360f3984379033a15b24e"
},
"downloads": -1,
"filename": "udkundoku-2.2.9.tar.gz",
"has_sig": false,
"md5_digest": "663040a9f6ae05ca66a178ac2c41b28e",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6",
"size": 21235,
"upload_time": "2024-01-10T15:22:18",
"upload_time_iso_8601": "2024-01-10T15:22:18.467733Z",
"url": "https://files.pythonhosted.org/packages/ca/17/0f291a85738cef37a6ec867520a350a636c1e73c0da3ff824fdd970aac6b/udkundoku-2.2.9.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-01-10 15:22:18",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "KoichiYasuoka",
"github_project": "UD-Kundoku",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "udkundoku"
}