Simple Toolkit for Hangul
=========================
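Base code forked from https://github.com/rhobot/Hangulpy

Supports Hangul jamo decomposition, composition (automata), josa attachment, decomposition and composition into initial/medial/final parts, and checks for whether text is Hangul, Hanja, or Latin.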
## INSTALL
```
pip install hgtk
```
## Samples
### Letter
#### Decompose character
```python
>>> import hgtk
>>> hgtk.letter.decompose('감')
('ㄱ', 'ㅏ', 'ㅁ')
```
#### Compose character
```python
>>> hgtk.letter.compose('ㄱ', 'ㅏ', 'ㅁ')
'감'
```
### Text
#### Decompose text
```python
>>> hgtk.text.decompose('학교종이 땡땡땡! hello world 1234567890 ㅋㅋ!')
'ㅎㅏㄱᴥㄱㅛᴥㅈㅗㅇᴥㅇㅣᴥ ㄸㅐㅇᴥㄸㅐㅇᴥㄸㅐㅇᴥ! hello world 1234567890 ㅋᴥㅋᴥ!'
```
The default composition-complete marker is ᴥ. It can be changed with the `compose_code` option, as shown below.
```python
>>> hgtk.text.decompose('학교종이 땡땡땡! hello world 1234567890 ㅋㅋ!', compose_code='/')
'ㅎㅏㄱ/ㄱㅛ/ㅈㅗㅇ/ㅇㅣ/ ㄸㅐㅇ/ㄸㅐㅇ/ㄸㅐㅇ/! hello world 1234567890 ㅋ/ㅋ/!'
```
The default composition marker is meant to be a teddy bear. 👇
<img src='https://user-images.githubusercontent.com/3307964/136328328-a5dea3b0-4731-48a5-881a-fae9b2c83dba.png' width=300/>
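
One way to see what the marker buys you is a small standalone sketch of marker-based decomposition. This is not hgtk's implementation: the `decompose_text` helper and the jamo tables below are assumptions written for illustration, using only the standard Unicode layout of precomposed Hangul syllables. It shows that a syllable followed by a standalone jamo would be indistinguishable from a single syllable if no marker were emitted.

```python
# Illustrative sketch only; hgtk's own code may differ.
CHO = 'ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ'
JOONG = 'ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ'
JONG = ['', 'ㄱ', 'ㄲ', 'ㄳ', 'ㄴ', 'ㄵ', 'ㄶ', 'ㄷ', 'ㄹ', 'ㄺ', 'ㄻ', 'ㄼ', 'ㄽ', 'ㄾ',
        'ㄿ', 'ㅀ', 'ㅁ', 'ㅂ', 'ㅄ', 'ㅅ', 'ㅆ', 'ㅇ', 'ㅈ', 'ㅊ', 'ㅋ', 'ㅌ', 'ㅍ', 'ㅎ']

FIRST, LAST = 0xAC00, 0xD7A3  # '가' .. '힣'


def decompose_text(text, marker='ᴥ'):
    """Split each syllable into jamo and close it with `marker` (hypothetical helper)."""
    out = []
    for ch in text:
        offset = ord(ch) - FIRST
        if 0 <= offset <= LAST - FIRST:
            # Precomposed syllable: recover the three jamo indices.
            cho, rest = divmod(offset, len(JOONG) * len(JONG))
            joong, jong = divmod(rest, len(JONG))
            out.append(CHO[cho] + JOONG[joong] + JONG[jong] + marker)
        elif '\u3131' <= ch <= '\u3163':
            # Standalone compatibility jamo such as 'ㅋ' also gets a marker.
            out.append(ch + marker)
        else:
            # Everything else (spaces, Latin, digits) passes through unchanged.
            out.append(ch)
    return ''.join(out)


print(decompose_text('각'))    # ㄱㅏㄱᴥ
print(decompose_text('가ㄱ'))  # ㄱㅏᴥㄱᴥ
```

Without the marker both inputs would flatten to the same jamo run, so the original text could not be recovered reliably.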
#### Compose text (Automata)
```python
>>> hgtk.text.compose('ㅎㅏㄱᴥㄱㅛᴥㅈㅗㅇᴥㅇㅣᴥ ㄸㅐㅇᴥㄸㅐㅇᴥㄸㅐㅇᴥ! hello world 1234567890 ㅋᴥㅋᴥ!')
'학교종이 땡땡땡! hello world 1234567890 ㅋㅋ!'
```
### Checker
#### is hangul text
```python
>>> hgtk.checker.is_hangul('한글입니다')
True
>>> hgtk.checker.is_hangul('no한글입니다')
False
>>> hgtk.checker.is_hangul('it is english')
False
```
#### is hanja text
```python
>>> hgtk.checker.is_hanja('大韓民國')
True
>>> hgtk.checker.is_hanja('大한민국')
False
>>> hgtk.checker.is_hanja('대한민국')
False
```
#### is latin1 text
```python
>>> hgtk.checker.is_latin1('abcdefghijklmnopqrstuvwxyz')
True
>>> hgtk.checker.is_latin1('한글latin1한')
False
```
#### has batchim
```python
>>> hgtk.checker.has_batchim('한') # '한' has batchim 'ㄴ'
True
>>> hgtk.checker.has_batchim('하')
False
```
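
For readers curious how a batchim check can work, here is a minimal sketch that relies only on the Unicode layout of precomposed syllables. It is an assumption about one possible approach, not hgtk's actual code, and `has_final_consonant` is a hypothetical name.

```python
FIRST, LAST = 0xAC00, 0xD7A3  # '가' .. '힣'
NUM_FINALS = 28               # 27 final consonants plus "no final consonant"


def has_final_consonant(syllable):
    """Return True if a precomposed Hangul syllable ends in a batchim."""
    offset = ord(syllable) - FIRST
    if not 0 <= offset <= LAST - FIRST:
        raise ValueError('not a precomposed Hangul syllable: %r' % syllable)
    # Final-consonant index 0 means the syllable ends in a vowel.
    return offset % NUM_FINALS != 0


print(has_final_consonant('한'))  # True
print(has_final_consonant('하'))  # False
```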
### Josa
#### EUN_NEUN - 은/는
```python
>>> hgtk.josa.attach('하늘', hgtk.josa.EUN_NEUN)
'하늘은'
>>> hgtk.josa.attach('바다', hgtk.josa.EUN_NEUN)
'바다는'
```
#### I_GA - 이/가
```python
>>> hgtk.josa.attach('하늘', hgtk.josa.I_GA)
'하늘이'
>>> hgtk.josa.attach('바다', hgtk.josa.I_GA)
'바다가'
```
#### EUL_REUL - 을/를
```python
>>> hgtk.josa.attach('하늘', hgtk.josa.EUL_REUL)
'하늘을'
>>> hgtk.josa.attach('바다', hgtk.josa.EUL_REUL)
'바다를'
```
#### GWA_WA - 과/와
```python
>>> hgtk.josa.attach('하늘', hgtk.josa.GWA_WA)
'하늘과'
>>> hgtk.josa.attach('바다', hgtk.josa.GWA_WA)
'바다와'
```
#### IDA_DA - 이다/다
```python
>>> hgtk.josa.attach('하늘', hgtk.josa.IDA_DA)
'하늘이다'
>>> hgtk.josa.attach('바다', hgtk.josa.IDA_DA)
'바다다'
```
#### EURO_RO - 으로/로
```python
>>> hgtk.josa.attach('하늘', hgtk.josa.EURO_RO)
'하늘로'
>>> hgtk.josa.attach('바다', hgtk.josa.EURO_RO)
'바다로'
>>> hgtk.josa.attach('태양', hgtk.josa.EURO_RO)
'태양으로'
```
#### RYUL_YUL - 률/율
```python
>>> hgtk.josa.attach('방어', hgtk.josa.RYUL_YUL)
'방어율'
>>> hgtk.josa.attach('공격', hgtk.josa.RYUL_YUL)
'공격률'
>>> hgtk.josa.attach('반환', hgtk.josa.RYUL_YUL)
'반환율'
```
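
The selection rules implied by the examples above can be sketched in plain Python. This is an illustration under stated assumptions, not hgtk's implementation: `final_index` and `attach_josa` are hypothetical helpers, and the ㄹ and ㄴ exceptions are inferred from the sample outputs ('하늘로', '반환율').

```python
FIRST, LAST = 0xAC00, 0xD7A3  # '가' .. '힣'
NIEUN, RIEUL = 4, 8           # final-consonant indices of ㄴ and ㄹ


def final_index(word):
    """Final-consonant index (0 = none) of the last syllable of `word`."""
    offset = ord(word[-1]) - FIRST
    if not 0 <= offset <= LAST - FIRST:
        raise ValueError('last character is not a Hangul syllable: %r' % word)
    return offset % 28


def attach_josa(word, with_final, without_final, treat_as_vowel=()):
    """Append `with_final` after a batchim, otherwise `without_final`.

    `treat_as_vowel` lists batchim indices that behave like a vowel ending.
    """
    idx = final_index(word)
    if idx == 0 or idx in treat_as_vowel:
        return word + without_final
    return word + with_final


print(attach_josa('하늘', '은', '는'))                              # 하늘은
print(attach_josa('바다', '은', '는'))                              # 바다는
print(attach_josa('하늘', '으로', '로', treat_as_vowel=(RIEUL,)))   # 하늘로
print(attach_josa('태양', '으로', '로', treat_as_vowel=(RIEUL,)))   # 태양으로
print(attach_josa('반환', '률', '율', treat_as_vowel=(NIEUN,)))     # 반환율
```

Passing the exceptional batchim as data keeps a single code path for every particle pair.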
### Const
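* CHO: list of initial consonants (choseong)
* JOONG: list of medial vowels (jungseong)
* JONG: list of final consonants (jongseong); a blank entry is included for syllables with no final consonant
* JAMO: all jamo (non-composed characters), excluding the blank
* NUM_CHO: number of initial consonants
* NUM_JOONG: number of medial vowels
* NUM_JONG: number of final consonants
* FIRST_HANGUL_UNICODE: Unicode code point where the Hangul (composed syllable) range starts
* LAST_HANGUL_UNICODE: Unicode code point where the Hangul (composed syllable) range ends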
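
These constants correspond to the arithmetic behind precomposed Hangul syllables. The sketch below uses the standard Unicode values (0xAC00, 0xD7A3, and 19/21/28 jamo counts); it is assumed, not confirmed by this README, that hgtk's constants hold the same values. The names `split_indices` and `join_indices` are hypothetical.

```python
FIRST_HANGUL = 0xAC00  # '가'
LAST_HANGUL = 0xD7A3   # '힣'
NUM_CHO, NUM_JOONG, NUM_JONG = 19, 21, 28  # 28 includes "no final consonant"


def split_indices(syllable):
    """Return (initial, medial, final) indices of one precomposed syllable."""
    offset = ord(syllable) - FIRST_HANGUL
    if not 0 <= offset <= LAST_HANGUL - FIRST_HANGUL:
        raise ValueError('not a precomposed Hangul syllable: %r' % syllable)
    cho, rest = divmod(offset, NUM_JOONG * NUM_JONG)
    joong, jong = divmod(rest, NUM_JONG)
    return cho, joong, jong


def join_indices(cho, joong, jong=0):
    """Inverse of split_indices: build the syllable from its three indices."""
    return chr(FIRST_HANGUL + (cho * NUM_JOONG + joong) * NUM_JONG + jong)


print(split_indices('감'))     # (0, 0, 16)
print(join_indices(0, 0, 16))  # 감
```

The ranges multiply out exactly: 19 × 21 × 28 = 11172 syllables, which is the size of the block from 0xAC00 to 0xD7A3.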
### Exception
Exceptions for error handling; their meanings are what the names suggest.
* NotHangulException
* NotLetterException
* NotWordException
## Tested in
- Python 2.6
- Python 2.7
- Python 3.3
- Python 3.4
- Python 3.5
- Python 3.6
- Python nightly build
- PyPy 2.2.5
- PyPy 3 2.4
- PyPy 5.3.1
----
Apache 2.0 License