hgtk

- Name: hgtk
- Version: 0.2.1
- Home page: https://github.com/bluedisk/hangul-toolkit
- Summary: Toolkit for Hangul composing, decomposing, etc.
- Upload time: 2023-09-17 10:36:01
- Author: Wonwoo, lee
- License: Apache 2.0
- Keywords: hangul charactorjamo automada composing decomposing josa
- Requirements: none recorded


Simple Toolkit for Hangul
=========================
Base code forked from https://github.com/rhobot/Hangulpy.

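Supports decomposing Hangul into jamo, composing jamo back into text (automata), attaching josa (postpositional particles), decomposing and composing initial/medial/final jamo (cho/joong/jong), checking whether text is Hangul, Hanja, or English, and more.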

## INSTALL
```
pip install hgtk
```

## Samples
### Letter
#### Decompose character
```python
>>> import hgtk
>>> hgtk.letter.decompose('감')
('ㄱ', 'ㅏ', 'ㅁ')
```
#### Compose character
```python
>>> hgtk.letter.compose('ㄱ', 'ㅏ', 'ㅁ')
'감'
```
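Both letter functions can be understood through the arithmetic layout of the Unicode Hangul Syllables block (U+AC00 to U+D7A3), where a syllable's offset from U+AC00 equals `(cho * 21 + joong) * 28 + jong`. The following is a minimal, self-contained sketch of that layout, not hgtk's actual code; `decompose_syllable` and `compose_syllable` are hypothetical names, and only modern jamo are assumed.

```python
# Sketch only: syllable <-> jamo conversion via Unicode arithmetic (not hgtk's code).
# Compatibility jamo, listed in the order used by the Unicode Hangul Syllables block.
CHO = tuple("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")                          # 19 initials
JOONG = tuple("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")                  # 21 medials
JONG = ("",) + tuple("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28, "" = no final

FIRST, LAST = 0xAC00, 0xD7A3  # '가' .. '힣'


def decompose_syllable(ch):
    """Split one precomposed syllable into (initial, medial, final)."""
    code = ord(ch)
    if not FIRST <= code <= LAST:
        raise ValueError("not a precomposed Hangul syllable: %r" % ch)
    cho, rest = divmod(code - FIRST, len(JOONG) * len(JONG))
    joong, jong = divmod(rest, len(JONG))
    return CHO[cho], JOONG[joong], JONG[jong]


def compose_syllable(cho, joong, jong=""):
    """Inverse of decompose_syllable; tuple.index raises ValueError for invalid jamo."""
    index = (CHO.index(cho) * len(JOONG) + JOONG.index(joong)) * len(JONG) + JONG.index(jong)
    return chr(FIRST + index)


print(decompose_syllable("감"))            # ('ㄱ', 'ㅏ', 'ㅁ')
print(compose_syllable("ㄱ", "ㅏ", "ㅁ"))  # 감
```

Because the mapping is pure arithmetic, no lookup table of 11,172 syllables is needed; decomposing and recomposing any syllable in the block returns the original character.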

### Text
#### Decompose text
```python
>>> hgtk.text.decompose('학교종이 땡땡땡! hello world 1234567890 ㅋㅋ!')
'ㅎㅏㄱᴥㄱㅛᴥㅈㅗㅇᴥㅇㅣᴥ ㄸㅐㅇᴥㄸㅐㅇᴥㄸㅐㅇᴥ! hello world 1234567890 ㅋᴥㅋᴥ!'
```

The default syllable-completion marker is ᴥ; it can be changed with the compose_code option, as in the following example.
```python
>>> hgtk.text.decompose('학교종이 땡땡땡! hello world 1234567890 ㅋㅋ!', compose_code='/')
'ㅎㅏㄱ/ㄱㅛ/ㅈㅗㅇ/ㅇㅣ/ ㄸㅐㅇ/ㄸㅐㅇ/ㄸㅐㅇ/! hello world 1234567890 ㅋ/ㅋ/!'
```
The default marker represents a teddy bear. 👇  
<img src='https://user-images.githubusercontent.com/3307964/136328328-a5dea3b0-4731-48a5-881a-fae9b2c83dba.png' width=300/>
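The text-level decomposition above can be approximated by a loop that decomposes each syllable, passes bare jamo through, and appends the marker after every Hangul unit. This is a self-contained sketch, not hgtk's implementation: `decompose_text` is a hypothetical name, and copying every non-Hangul character unchanged is an assumption (hgtk may treat some characters differently).

```python
# Sketch of marker-delimited text decomposition (assumed behaviour, not hgtk's code).
CHO = tuple("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")
JOONG = tuple("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")
JONG = ("",) + tuple("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")
DEFAULT_COMPOSE_CODE = "\u1d25"  # 'ᴥ', the marker used in the examples above


def decompose_text(text, compose_code=DEFAULT_COMPOSE_CODE):
    out = []
    for ch in text:
        code = ord(ch)
        if 0xAC00 <= code <= 0xD7A3:
            # Precomposed syllable: emit its jamo, then the marker.
            cho, rest = divmod(code - 0xAC00, len(JOONG) * len(JONG))
            joong, jong = divmod(rest, len(JONG))
            out.append(CHO[cho] + JOONG[joong] + JONG[jong] + compose_code)
        elif 0x3131 <= code <= 0x3163:
            # Bare modern compatibility jamo such as 'ㅋ': emit it, then the marker.
            out.append(ch + compose_code)
        else:
            # Everything else (spaces, Latin letters, digits, punctuation) is copied as-is.
            out.append(ch)
    return "".join(out)


print(decompose_text("학교종이 땡땡땡!", compose_code="/"))  # ㅎㅏㄱ/ㄱㅛ/ㅈㅗㅇ/ㅇㅣ/ ㄸㅐㅇ/ㄸㅐㅇ/ㄸㅐㅇ/!
```

Emitting a marker after every unit, even a lone jamo, is what makes the output unambiguous to recompose: `ㅋᴥㅋᴥ` stays two separate jamo instead of being mistaken for part of a syllable.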

#### Compose text (Automata)
```python
>>> hgtk.text.compose('ㅎㅏㄱᴥㄱㅛᴥㅈㅗㅇᴥㅇㅣᴥ ㄸㅐㅇᴥㄸㅐㅇᴥㄸㅐㅇᴥ! hello world 1234567890 ㅋᴥㅋᴥ!')
'학교종이 땡땡땡! hello world 1234567890 ㅋㅋ!'
```
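hgtk describes its composer as an automaton. The sketch below is deliberately simpler and is not hgtk's algorithm: it only handles the marker-delimited format produced by decomposition, buffering jamo until a marker (or a non-jamo character) arrives and then composing the buffer if it forms initial + medial (+ final). `compose_text` and `_flush` are hypothetical names.

```python
# Simplified, marker-driven composer (a sketch; not hgtk's actual automaton).
CHO = tuple("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")
JOONG = tuple("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")
JONG = ("",) + tuple("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")
JAMO = set(CHO) | set(JOONG) | set(JONG[1:])
DEFAULT_COMPOSE_CODE = "\u1d25"  # 'ᴥ'


def _flush(buf):
    """Compose buffered jamo into one syllable if they form (cho, joong[, jong])."""
    if (len(buf) in (2, 3) and buf[0] in CHO and buf[1] in JOONG
            and (len(buf) == 2 or buf[2] in JONG)):
        jong = buf[2] if len(buf) == 3 else ""
        index = (CHO.index(buf[0]) * len(JOONG) + JOONG.index(buf[1])) * len(JONG) + JONG.index(jong)
        return chr(0xAC00 + index)
    return "".join(buf)  # lone jamo such as 'ㅋ', or an unrecognised run: keep as-is


def compose_text(text, compose_code=DEFAULT_COMPOSE_CODE):
    out, buf = [], []
    for ch in text:
        if ch == compose_code:
            out.append(_flush(buf))  # the marker closes the current syllable
            buf = []
        elif ch in JAMO:
            buf.append(ch)
        else:
            out.append(_flush(buf))  # a non-jamo character also ends any pending syllable
            buf = []
            out.append(ch)
    out.append(_flush(buf))
    return "".join(out)


print(compose_text("ㅎㅏㄴ/ㄱㅡㄹ/", compose_code="/"))  # 한글
```

A true automaton would also split unmarked jamo streams (for example deciding whether a consonant is a final or the next initial); relying on the marker sidesteps that decision entirely.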

### Checker

#### is hangul text
```python
>>> hgtk.checker.is_hangul('한글입니다')
True
>>> hgtk.checker.is_hangul('no한글입니다')
False
>>> hgtk.checker.is_hangul('it is english')
False
```
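The examples above suggest that a string counts as Hangul only if every character is Hangul. A minimal sketch of such a check, assuming "Hangul" means a precomposed syllable (U+AC00 to U+D7A3) or a modern compatibility jamo (U+3131 to U+3163); `is_hangul_text` is a hypothetical name, and returning `False` for an empty string is a design choice of this sketch, not documented hgtk behaviour.

```python
def is_hangul_text(text):
    """True if every character is a precomposed syllable or a modern compatibility jamo."""
    return len(text) > 0 and all(
        0xAC00 <= ord(ch) <= 0xD7A3 or 0x3131 <= ord(ch) <= 0x3163
        for ch in text
    )


print(is_hangul_text("한글입니다"))    # True
print(is_hangul_text("no한글입니다"))  # False
```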

#### is hanja text
```python
>>> hgtk.checker.is_hanja('大韓民國')
True
>>> hgtk.checker.is_hanja('大한민국')
False
>>> hgtk.checker.is_hanja('대한민국')
False
```
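A comparable Hanja check can be sketched by requiring every character to fall in the CJK Unified Ideographs block (U+4E00 to U+9FFF). That range is an assumption made for this sketch; hgtk may accept a different set of ranges (for example the extension or compatibility ideograph blocks). `is_hanja_text` is a hypothetical name.

```python
def is_hanja_text(text):
    """True if every character lies in CJK Unified Ideographs (U+4E00 to U+9FFF)."""
    return len(text) > 0 and all(0x4E00 <= ord(ch) <= 0x9FFF for ch in text)


print(is_hanja_text("大韓民國"))  # True
print(is_hanja_text("大한민국"))  # False
```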

#### is latin1 text
```python
>>> hgtk.checker.is_latin1('abcdefghijklmnopqrstuvwxyz')
True
>>> hgtk.checker.is_latin1('한글latin1한')
False
```
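Latin-1 covers exactly the code points U+0000 to U+00FF, so one way to sketch this check is to attempt a Latin-1 encoding and treat failure as "not Latin-1". `is_latin1_text` is a hypothetical name; whether hgtk implements its check this way is not stated here.

```python
def is_latin1_text(text):
    """True if every character is encodable as Latin-1 (U+0000 to U+00FF)."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


print(is_latin1_text("abcdefghijklmnopqrstuvwxyz"))  # True
print(is_latin1_text("한글latin1한"))                # False
```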

#### has batchim
```python
>>> hgtk.checker.has_batchim('한')   # '한' has batchim 'ㄴ'
True
>>> hgtk.checker.has_batchim('하')
False
```
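In the Unicode syllable layout every block of 28 consecutive syllables shares the same initial and medial, and slot 0 of each block is the form without a final consonant, so a batchim test reduces to a remainder check. `has_final_consonant` is a hypothetical name for this sketch; treating non-syllables as having no batchim is an assumption.

```python
def has_final_consonant(ch):
    """True if ch is a precomposed syllable whose final-consonant slot is non-empty."""
    code = ord(ch)
    return 0xAC00 <= code <= 0xD7A3 and (code - 0xAC00) % 28 != 0


print(has_final_consonant("한"))  # True  ('ㄴ')
print(has_final_consonant("하"))  # False
```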


### Josa
#### EUN_NEUN - 은/는
```python
>>> hgtk.josa.attach('하늘', hgtk.josa.EUN_NEUN)
'하늘은'
>>> hgtk.josa.attach('바다', hgtk.josa.EUN_NEUN)
'바다는'
```
#### I_GA - 이/가
```python
>>> hgtk.josa.attach('하늘', hgtk.josa.I_GA)
'하늘이'
>>> hgtk.josa.attach('바다', hgtk.josa.I_GA)
'바다가'
```
#### EUL_REUL - 을/를 
```python
>>> hgtk.josa.attach('하늘', hgtk.josa.EUL_REUL)
'하늘을'
>>> hgtk.josa.attach('바다', hgtk.josa.EUL_REUL)
'바다를'
```
#### GWA_WA - 과/와 
```python
>>> hgtk.josa.attach('하늘', hgtk.josa.GWA_WA)
'하늘과'
>>> hgtk.josa.attach('바다', hgtk.josa.GWA_WA)
'바다와'
```
#### IDA_DA - 이다/다 
```python
>>> hgtk.josa.attach('하늘', hgtk.josa.IDA_DA)
'하늘이다'
>>> hgtk.josa.attach('바다', hgtk.josa.IDA_DA)
'바다다'
```
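The five particles above all follow the same rule: one form after a final consonant, the other after a vowel. A minimal sketch of that rule, independent of hgtk; `PARTICLES` and `attach_particle` are hypothetical names, and the sketch assumes a non-empty word whose last character is a Hangul syllable (anything else is treated as ending in a vowel).

```python
# Hypothetical sketch of batchim-driven particle selection (not hgtk's API).
PARTICLES = {  # (form after a final consonant, form after a vowel)
    "EUN_NEUN": ("은", "는"),
    "I_GA": ("이", "가"),
    "EUL_REUL": ("을", "를"),
    "GWA_WA": ("과", "와"),
    "IDA_DA": ("이다", "다"),
}


def has_final_consonant(ch):
    code = ord(ch)
    return 0xAC00 <= code <= 0xD7A3 and (code - 0xAC00) % 28 != 0


def attach_particle(word, pair):
    after_consonant, after_vowel = pair
    return word + (after_consonant if has_final_consonant(word[-1]) else after_vowel)


print(attach_particle("하늘", PARTICLES["EUN_NEUN"]))  # 하늘은
print(attach_particle("바다", PARTICLES["IDA_DA"]))    # 바다다
```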
#### EURO_RO - 로/으로
```python
>>> hgtk.josa.attach('하늘', hgtk.josa.EURO_RO)
'하늘로'
>>> hgtk.josa.attach('바다', hgtk.josa.EURO_RO)
'바다로'
>>> hgtk.josa.attach('태양', hgtk.josa.EURO_RO)
'태양으로'
```
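EURO_RO differs from the simple pairs: a final ㄹ behaves like a vowel, which is why '하늘' takes '로' while '태양' takes '으로'. A sketch of that exception, assuming the final-consonant slot numbering of the Unicode syllable block (ㄹ is slot 8); `final_index` and `attach_euro_ro` are hypothetical names.

```python
# Hypothetical sketch of the 로/으로 rule (not hgtk's implementation).
RIEUL = 8  # slot of 'ㄹ' among the 28 final-consonant slots (0 means no final)


def final_index(ch):
    """Final-consonant slot of a precomposed syllable; 0 for none or non-syllables."""
    code = ord(ch)
    return (code - 0xAC00) % 28 if 0xAC00 <= code <= 0xD7A3 else 0


def attach_euro_ro(word):
    # '으로' only after a final consonant other than ㄹ; '로' after a vowel or ㄹ.
    return word + ("으로" if final_index(word[-1]) not in (0, RIEUL) else "로")


print([attach_euro_ro(w) for w in ("하늘", "바다", "태양")])  # ['하늘로', '바다로', '태양으로']
```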
#### RYUL_YUL - 율/률
```python
>>> hgtk.josa.attach('방어', hgtk.josa.RYUL_YUL)
'방어율'
>>> hgtk.josa.attach('공격', hgtk.josa.RYUL_YUL)
'공격률'
>>> hgtk.josa.attach('반환', hgtk.josa.RYUL_YUL)
'반환율'
```
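RYUL_YUL has its own exception: '율' follows a vowel or a final ㄴ, and '률' follows any other final consonant, matching '반환율' versus '공격률' above. A sketch under the same slot-numbering assumption (ㄴ is slot 4); `final_index` and `attach_ryul_yul` are hypothetical names.

```python
# Hypothetical sketch of the 율/률 rule (not hgtk's implementation).
NIEUN = 4  # slot of 'ㄴ' among the 28 final-consonant slots (0 means no final)


def final_index(ch):
    """Final-consonant slot of a precomposed syllable; 0 for none or non-syllables."""
    code = ord(ch)
    return (code - 0xAC00) % 28 if 0xAC00 <= code <= 0xD7A3 else 0


def attach_ryul_yul(word):
    # '율' after a vowel or ㄴ; '률' after any other final consonant.
    return word + ("율" if final_index(word[-1]) in (0, NIEUN) else "률")


print([attach_ryul_yul(w) for w in ("방어", "공격", "반환")])  # ['방어율', '공격률', '반환율']
```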

### Const
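* CHO: list of initial consonants (choseong)
* JOONG: list of medial vowels (jungseong)
* JONG: list of final consonants (jongseong); a blank entry is included to cover syllables with no final consonant

* JAMO: all jamo (non-composed characters), excluding the blank entry

* NUM_CHO: number of initial consonants
* NUM_JOONG: number of medial vowels
* NUM_JONG: number of final consonants

* FIRST_HANGUL_UNICODE: first code point of the precomposed Hangul syllable range in Unicode
* LAST_HANGUL_UNICODE: last code point of the precomposed Hangul syllable range in Unicode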
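These constants are tied together by the Unicode layout: the syllable block holds one code point per (initial, medial, final) combination, so its last code point follows from the first one and the three counts. The values below are the standard Unicode ones (19 initials, 21 medials, 28 final slots including "no final"); that hgtk's constants hold exactly these values is an assumption of this sketch.

```python
# Assumed values from the Unicode Hangul Syllables layout (not read from hgtk).
NUM_CHO, NUM_JOONG, NUM_JONG = 19, 21, 28   # NUM_JONG counts the "no final" slot
FIRST_HANGUL_UNICODE = 0xAC00               # '가'
LAST_HANGUL_UNICODE = FIRST_HANGUL_UNICODE + NUM_CHO * NUM_JOONG * NUM_JONG - 1

print(LAST_HANGUL_UNICODE - FIRST_HANGUL_UNICODE + 1)    # 11172 syllables
print(hex(LAST_HANGUL_UNICODE), chr(LAST_HANGUL_UNICODE))  # 0xd7a3 힣
```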

### Exception
Exceptions used for error handling; their meanings should be clear from their names:
* NotHangulException
* NotLetterException
* NotWordException


## Tested in
- python 2.6
- python 2.7
- python 3.3
- python 3.4
- python 3.5
- python 3.6
- python nightly build

- PyPy 2.2.5
- PyPy3 2.4
- PyPy 5.3.1


----

Apache 2.0 License

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/bluedisk/hangul-toolkit",
    "name": "hgtk",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "hangul charactorjamo automada composing decomposing josa",
    "author": "Wonwoo, lee",
    "author_email": "bluedisk@gmail.com",
    "download_url": "",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache 2.0",
    "summary": "Toolkit for Hangul composing, decomposing and etc...",
    "version": "0.2.1",
    "project_urls": {
        "Homepage": "https://github.com/bluedisk/hangul-toolkit"
    },
    "split_keywords": [
        "hangul",
        "charactorjamo",
        "automada",
        "composing",
        "decomposing",
        "josa"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "4cd0d62a73954ab95a8d3967a063c371c16feddfb8bd2957fd11691adf7834e8",
                "md5": "37043d0570f1577f110bcf903e51b723",
                "sha256": "f3e33dacf6ab2564f6257418b718e2c7a4ae9fffa32e18d6c4f6278b72ba73ee"
            },
            "downloads": -1,
            "filename": "hgtk-0.2.1-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "37043d0570f1577f110bcf903e51b723",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": null,
            "size": 12011,
            "upload_time": "2023-09-17T10:36:01",
            "upload_time_iso_8601": "2023-09-17T10:36:01.773562Z",
            "url": "https://files.pythonhosted.org/packages/4c/d0/d62a73954ab95a8d3967a063c371c16feddfb8bd2957fd11691adf7834e8/hgtk-0.2.1-py2.py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-09-17 10:36:01",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "bluedisk",
    "github_project": "hangul-toolkit",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "circle": true,
    "requirements": [],
    "lcname": "hgtk"
}
        