OldHangeul


- Name: OldHangeul
- Version: 1.2.1
- Home page: https://github.com/go00ood/OldHangeul
- Summary: Program with functions for manipulation of old Korean script, including Unicode normalization and jamo separation.
- Upload time: 2024-08-09 11:36:46
- Author: go00od
- Requires Python: >=3
- License: none specified
- Requirements: none recorded
# OldHangeul

`OldHangeul` is a package developed to make Old Hangul easy to work with in Python.

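Python does not support precomposed Old Hangul encoded in the Hanyang PUA (Private Use Area). As a result, Unicode normalization does not work on text that contains Old Hangul, and string indexing and `len()` cause problems. `OldHangeul` solves these problems by converting the text to the decomposed (conjoining jamo) form. It also provides functions for processing Old Hangul as individual consonants and vowels.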
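The problem can be reproduced with the standard library alone. The sketch below uses only `unicodedata`; the code point U+E95B is one of the Hanyang PUA characters in this README's example text, and the variable names are illustrative.

```python
import unicodedata

pua_syllable = "\ue95b"     # Hanyang PUA code point from this README's example text
modern_syllable = "\ub2d8"  # 님, a modern precomposed syllable

# Private-use characters have category "Co" and no decomposition mapping,
# so standard NFD returns them unchanged.
print(unicodedata.category(pua_syllable))                          # Co
print(unicodedata.normalize("NFD", pua_syllable) == pua_syllable)  # True

# A modern syllable decomposes into conjoining jamo, and len() then counts
# jamo code points rather than syllables.
decomposed = unicodedata.normalize("NFD", modern_syllable)
print(len(modern_syllable), len(decomposed))                       # 1 3
```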




## Installation


```shell
pip install OldHangeul
```



## Usage

`OldHangeul` runs in Python.

---
### OldTexts

A class for handling text that contains Old Hangul. It converts text containing precomposed syllables to the decomposed form and supports indexing, slicing, `len()`, and `get_jamo()`.

```python
from OldHangeul import OldTexts
text = OldTexts('스\ue95b님이 免帖 \uf537나\uf53c 주시고')  # text containing precomposed Hanyang PUA syllables (written as escapes)
print(text)
```

The example above prints the text in decomposed form:
```python
스스ᇰ님이 免帖 ᄒᆞ나ᄒᆞᆯ 주시고
```

`OldTexts` makes it easy to get the length of a string, counting each syllable as one character, and also supports indexing and slicing.

```python
len(text)
#15

text[1]
#스ᇰ
```
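One way to get syllable-level `len()` and indexing over conjoining jamo is to group code points into syllable blocks first. The sketch below is an illustration of that idea, not the package's implementation; `SyllableView` and the regular expression are hypothetical and use only the standard library.

```python
import re

# Conjoining-jamo ranges: Hangul Jamo plus Extended-A (leading) and Extended-B (vowel, trailing).
LEAD = "\u1100-\u115f\ua960-\ua97c"
VOWEL = "\u1160-\u11a7\ud7b0-\ud7c6"
TAIL = "\u11a8-\u11ff\ud7cb-\ud7fb"
# One syllable block (leading consonants, vowels, optional trailing consonants),
# or else any single character.
UNIT = re.compile(f"[{LEAD}]+[{VOWEL}]+[{TAIL}]*|.", re.DOTALL)


class SyllableView:
    """Hypothetical helper, not part of OldHangeul: indexes text by syllable block."""

    def __init__(self, text):
        self.units = UNIT.findall(text)

    def __len__(self):
        return len(self.units)

    def __getitem__(self, key):
        picked = self.units[key]
        return "".join(picked) if isinstance(key, slice) else picked


# The decomposed example text from this README, written as escapes.
sample = ("\u1109\u1173\u1109\u1173\u11f0\u1102\u1175\u11b7\u110b\u1175 \u514d\u5e16 "
          "\u1112\u119e\u1102\u1161\u1112\u119e\u11af \u110c\u116e\u1109\u1175\u1100\u1169")
view = SyllableView(sample)
print(len(view))                        # 15
print(view[1] == "\u1109\u1173\u11f0")  # True
```

Plain `len(sample)` would instead count every jamo code point separately, which is the behavior the class is meant to hide.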

---
### text_to_jamo

Splits text into consonants and vowels. Individual jamo are separated by spaces, and spaces in the original text are shown as `_`.

`compatibility`: represent initial and final consonants with the same Unicode code points.

`spacing`: how spaces in the text are represented.

   


```python
from OldHangeul import text_to_jamo
text=text_to_jamo('스스ᇰ님이 免帖 ᄒᆞ나ᄒᆞᆯ 주시고', compatibility=False, spacing=True)
print(text)
```

`text_to_jamo` returns its result as a `str`.

```python
ᄉ ᅳ ᄉ ᅳ ᇰ ᄂ ᅵ ᆷ ᄋ ᅵ _ 免 帖 _ ᄒ ᆞ ᄂ ᅡ ᄒ ᆞ ᆯ _ ᄌ ᅮ ᄉ ᅵ ᄀ ᅩ
```



```python
from OldHangeul import text_to_jamo
text=text_to_jamo('스스ᇰ님이 免帖 ᄒᆞ나ᄒᆞᆯ 주시고', compatibility=True, spacing=True)
print(text)
```

The `compatibility` and `spacing` options let you choose among different jamo-separation and space-handling behaviors.

```python
ㅅ ㅡ ㅅ ㅡ ㆁ ㄴ ㅣ ㅁ ㅇ ㅣ _ 免 帖 _ ㅎ ㆍ ㄴ ㅏ ㅎ ㆍ ㄹ _ ㅈ ㅜ ㅅ ㅣ ㄱ ㅗ
```
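The two output styles above can be approximated with the standard library. The sketch below is a stand-in for illustration only: `split_jamo` and `to_compat` are hypothetical names, the sketch always marks spaces with `_`, and it assumes a compatibility letter can be found by swapping the `CHOSEONG`/`JUNGSEONG`/`JONGSEONG` part of the Unicode character name for `LETTER`, which does not hold for every jamo.

```python
import unicodedata

PREFIXES = ("HANGUL CHOSEONG ", "HANGUL JUNGSEONG ", "HANGUL JONGSEONG ")


def to_compat(ch):
    """Map a conjoining jamo to a compatibility letter by Unicode name, if one exists."""
    name = unicodedata.name(ch, "")
    for prefix in PREFIXES:
        if name.startswith(prefix):
            try:
                return unicodedata.lookup("HANGUL LETTER " + name[len(prefix):])
            except KeyError:
                return ch  # no compatibility letter with a matching name
    return ch


def split_jamo(text, compatibility=False):
    """Space-separate every jamo; spaces in the text become '_'."""
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if ch == " ":
            out.append("_")
        else:
            out.append(to_compat(ch) if compatibility else ch)
    return " ".join(out)


word = "\u1112\u119e\u11af \u110c\u116e"  # two syllables taken from the README example
print(split_jamo(word))
print(split_jamo(word, compatibility=True))
```

With `compatibility=True`, an initial and a final consonant that share a name, such as `RIEUL`, end up as the same code point, which is the unification the option describes.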


---
### hNFD

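Performs the NFD form of [Unicode normalization](https://ko.wikipedia.org/wiki/%EC%9C%A0%EB%8B%88%EC%BD%94%EB%93%9C_%EB%93%B1%EA%B0%80%EC%84%B1). It also works on text that contains Old Hangul, converting precomposed syllables (Hanyang PUA) to conjoining jamo (initial-medial-final) code points.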


```python
from OldHangeul import hNFD
text = hNFD('스\ue95b님이 免帖 \uf537나\uf53c 주시고')  # Hanyang PUA syllables written as escapes
print(text)
```
```python
스스ᇰ님이 免帖 ᄒᆞ나ᄒᆞᆯ 주시고
```
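Conceptually, this kind of decomposition can be sketched as a table lookup for the PUA syllables followed by standard NFD for the modern ones. The example below is a minimal sketch under that assumption, not the package's code; `PUA_TO_JAMO` and `decompose_sketch` are hypothetical, and the three table entries are inferred from the example input and output above, whereas a real table would need to cover the whole Hanyang PUA repertoire.

```python
import unicodedata

# Three-entry excerpt inferred from the README example (hypothetical table name).
PUA_TO_JAMO = str.maketrans({
    "\ue95b": "\u1109\u1173\u11f0",
    "\uf537": "\u1112\u119e",
    "\uf53c": "\u1112\u119e\u11af",
})


def decompose_sketch(text):
    # Expand PUA syllables first, then let standard NFD split the modern syllables.
    return unicodedata.normalize("NFD", text.translate(PUA_TO_JAMO))


source = "\uc2a4\ue95b\ub2d8\uc774 \u514d\u5e16 \uf537\ub098\uf53c \uc8fc\uc2dc\uace0"
print(decompose_sketch(source))
```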



---
### hNFC

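Performs the NFC form of [Unicode normalization](https://ko.wikipedia.org/wiki/%EC%9C%A0%EB%8B%88%EC%BD%94%EB%93%9C_%EB%93%B1%EA%B0%80%EC%84%B1). It also works on text that contains Old Hangul, converting conjoining jamo (initial-medial-final) code points to precomposed syllables (Hanyang PUA). Text that could not be converted is reported as `activation failed`.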


```python
from OldHangeul import hNFC
text=hNFC('스스ᇰ님이 免帖 ᄒᆞ나ᄒᆞᆯ 주시고')
print(text)
```
```python
스님이 免帖 나 주시고
```
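The reverse direction can be sketched the same way: split the text into syllable blocks, look each block up in a jamo-to-PUA table, and fall back to standard NFC for blocks that are not in the table. This is an illustrative sketch, not the package's implementation; `JAMO_TO_PUA` and `compose_sketch` are hypothetical, the table holds only the three pairs inferred from the README example, and the `activation failed` reporting is not reproduced.

```python
import re
import unicodedata

# Three-entry excerpt inferred from the README example (hypothetical table name).
JAMO_TO_PUA = {
    "\u1109\u1173\u11f0": "\ue95b",
    "\u1112\u119e": "\uf537",
    "\u1112\u119e\u11af": "\uf53c",
}

# One conjoining-jamo syllable block: leading consonants, vowels, optional trailing consonants.
BLOCK = re.compile(
    "[\u1100-\u115f\ua960-\ua97c]+[\u1160-\u11a7\ud7b0-\ud7c6]+[\u11a8-\u11ff\ud7cb-\ud7fb]*"
)


def compose_sketch(text):
    def repl(match):
        block = match.group(0)
        # Old Hangul blocks come from the table; modern blocks are composed by NFC.
        return JAMO_TO_PUA.get(block, unicodedata.normalize("NFC", block))
    return BLOCK.sub(repl, text)


decomposed = ("\u1109\u1173\u1109\u1173\u11f0\u1102\u1175\u11b7\u110b\u1175 \u514d\u5e16 "
              "\u1112\u119e\u1102\u1161\u1112\u119e\u11af \u110c\u116e\u1109\u1175\u1100\u1169")
print(compose_sketch(decomposed))
```

Matching whole blocks rather than doing plain substring replacement keeps a short entry such as the two-jamo block from being replaced inside a longer syllable.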


---
### old_hNFD

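Performs the NFD form of [Unicode normalization](https://ko.wikipedia.org/wiki/%EC%9C%A0%EB%8B%88%EC%BD%94%EB%93%9C_%EB%93%B1%EA%B0%80%EC%84%B1). It also works on text that contains Old Hangul, but converts only the syllables that contain Old Hangul letters to conjoining jamo (decomposed form).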


```python
from OldHangeul import old_hNFD
text=old_hNFD('스스ᇰ님이 免帖 ᄒᆞ나ᄒᆞᆯ 주시고') # decomposed input → only the Old Hangul syllables stay decomposed
print(text)
```

```python
스스ᇰ님이 免帖 ᄒᆞ나ᄒᆞᆯ 주시고
```
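The output above looks the same as the input on screen, but the modern syllables are now single precomposed code points while the Old Hangul syllables remain conjoining jamo. A minimal sketch of that selective behavior, using only the standard library and a hypothetical function name, could look like this. It is not the package's implementation and does not handle PUA input.

```python
import re
import unicodedata

# One conjoining-jamo syllable block: leading consonants, vowels, optional trailing consonants.
BLOCK = re.compile(
    "[\u1100-\u115f\ua960-\ua97c]+[\u1160-\u11a7\ud7b0-\ud7c6]+[\u11a8-\u11ff\ud7cb-\ud7fb]*"
)


def compose_modern_only(text):
    def repl(match):
        block = match.group(0)
        composed = unicodedata.normalize("NFC", block)
        # A modern syllable collapses to one precomposed code point; a block that
        # stays longer contains old jamo and is left in conjoining form.
        return composed if len(composed) == 1 else block
    return BLOCK.sub(repl, text)


mixed = "\u1109\u1173\u1109\u1173\u11f0 \u1112\u119e\u1102\u1161"
result = compose_modern_only(mixed)
print([f"U+{ord(ch):04X}" for ch in result])
```

Printing the code points makes the difference visible even when the rendered glyphs are identical.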

            
