example990420

| Field | Value |
|-------|-------|
| Name | example990420 |
| Version | 0.0.5 |
| home_page | https://github.com/andreihar/taibun |
| Summary | Convert Chinese characters to Taiwanese |
| upload_time | 2023-07-04 08:30:41 |
| docs_url | None |
| author | Andrei Harbachov |
| license | MIT |
| keywords | python, taiwan, taiwanese, taigi, hokkien, romanization, transliteration |
| requirements | No requirements were recorded. |

<!-- PROJECT LOGO -->

<div align="center">



# <ruby>台文<rt>Tâi-bûn</rt></ruby>



<!-- PROJECT SHIELDS -->

[![Contributors][contributors-badge]][contributors]

[![Release][release-badge]][release]

[![Licence][licence-badge]][licence]

[![LinkedIn][linkedin-badge]][linkedin]

[![GitHub][github-badge]][github]



**Taiwanese Hokkien transliterator from Chinese characters**



It provides methods to customise transliteration and to retrieve information about Taiwanese Hokkien pronunciation.<br />

Includes a word tokeniser for Taiwanese Hokkien.



[Report Bug][bug] •

[PyPI][pypi]



</div>





---





<!-- TABLE OF CONTENTS -->

<details open>

  <summary>Table of Contents</summary>

  <ol>

    <li><a href="#install">Install</a></li>

    <li>

      <a href="#usage">Usage</a>

      <ul>

        <li>

          <a href="#converter">Converter</a>

          <ul>

            <li><a href="#system">System</a></li>

            <li><a href="#dialect">Dialect</a></li>

            <li><a href="#format">Format</a></li>

            <li><a href="#delimiter">Delimiter</a></li>

            <li><a href="#sandhi">Sandhi</a></li>

            <li><a href="#punctuation">Punctuation</a></li>

          </ul>

        </li>

        <li><a href="#tokeniser">Tokeniser</a></li>

      </ul>

    </li>

    <li><a href="#example">Example</a></li>

    <li><a href="#data">Data</a></li>

    <li><a href="#licence">Licence</a></li>

  </ol>

</details>





<!-- INSTALL -->

## Install



Taibun can be installed from [PyPI][pypi]:



```bash

$ pip install taibun

```





<!-- USAGE -->

## Usage



### Converter



The `Converter` class transliterates Chinese characters into the chosen transliteration system, using the parameters specified by the developer. It works with both Traditional and Simplified characters.



```python

# constructor

c = Converter(system, dialect, format, delimiter, sandhi, punctuation)



# transliterate Chinese characters

c.get(input)



# convert Simplified Chinese characters to Traditional Chinese characters

c.to_traditional(input)

```



#### System



`system` String - system of transliteration.



* `Tailo` (default) - [Tâi-uân Lô-má-jī Phing-im Hong-àn][tailo-wiki]

* `POJ` - [Pe̍h-ōe-jī][poj-wiki]

* `Zhuyin` - [Taiwanese Phonetic Symbols][zhuyin-wiki]

* `TLPA` - [Taiwanese Language Phonetic Alphabet][tlpa-wiki]

* `Pingyim` - [Bbánlám Uē Pìngyīm Hōng'àn][pingyim-wiki]

* `Tongiong` - [Daī-ghî Tōng-iōng Pīng-im][tongiong-wiki]



| text | Tailo   | POJ     | Zhuyin      | TLPA      | Pingyim | Tongiong |

|------|---------|---------|-------------|-----------|---------|----------|

| 臺灣 | Tâi-uân | Tâi-oân | ㄉㄞˊ ㄨㄢˊ | Tai5 uan5 | Dáiwán  | Tāi-uǎn  |





#### Dialect



`dialect` String - preferred pronunciation.



* `south` (default) - [Zhangzhou][zhangzhou-wiki]-leaning pronunciation

* `north` - [Quanzhou][quanzhou-wiki]-leaning pronunciation



| text   | south         | north         |

|--------|---------------|---------------|

| 五月節 | Gōo-gue̍h-tseh | Gōo-ge̍h-tsueh |





#### Format



`format` String - format in which tones will be represented in the converted sentence.



* `mark` (default) - uses diacritics for each syllable. Not available for TLPA.

* `number` - adds a number representing the tone at the end of each syllable

* `strip` - removes any tone marking



| text | mark    | number    | strip   |

|------|---------|-----------|---------|

| 臺灣 | Tâi-uân | Tai5-uan5 | Tai-uan |
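The relationship between the three formats can be sketched without the library. The snippet below is an illustration only, not Taibun's implementation: the helper names `strip_tones` and `marks_to_numbers` are hypothetical, and it assumes Tailo input whose tone diacritics are the five standard combining marks (tones 2, 3, 5, 7 and 8), with unmarked syllables read as tone 4 when they end in p, t, k or h and as tone 1 otherwise.

```python
import re
import unicodedata

# Combining diacritics used by Tailo, mapped to tone numbers (assumed set).
TONE_MARKS = {
    "\u0301": "2",  # acute
    "\u0300": "3",  # grave
    "\u0302": "5",  # circumflex
    "\u0304": "7",  # macron
    "\u030d": "8",  # vertical line above
}


def strip_tones(text):
    """Remove tone diacritics, as in the `strip` format."""
    decomposed = unicodedata.normalize("NFD", text)
    bare = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", bare)


def _syllable_to_number(syllable):
    """Convert one decomposed syllable to letters plus a tone number."""
    tone = next((TONE_MARKS[ch] for ch in syllable if ch in TONE_MARKS), None)
    bare = "".join(ch for ch in syllable if ch not in TONE_MARKS)
    if not bare:
        return syllable
    if tone is None:
        # Unmarked syllables: tone 4 with a stop final, tone 1 otherwise.
        tone = "4" if bare[-1].lower() in "ptkh" else "1"
    return bare + tone


def marks_to_numbers(text):
    """Rewrite tone-marked text in the `number` format."""
    decomposed = unicodedata.normalize("NFD", text)
    # A syllable is a run of ASCII letters and combining marks.
    return re.sub(
        r"[A-Za-z\u0300-\u036f]+",
        lambda m: _syllable_to_number(m.group()),
        decomposed,
    )
```

With these helpers, `marks_to_numbers("Tâi-uân")` gives `Tai5-uan5` and `strip_tones("Tâi-uân")` gives `Tai-uan`, matching the table above.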





#### Delimiter



`delimiter` String - sets the delimiter placed between the syllables of a word.



Default value depends on the chosen `system`:



* `'-'` - for `Tailo`, `POJ`, `Tongiong`

* `''` - for `Pingyim`

* `' '` - for `Zhuyin`, `TLPA`



| text | '-'     | ''     | ' '     |

|------|---------|--------|---------|

| 臺灣 | Tâi-uân | Tâiuân | Tâi uân |
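The system-dependent default can be pictured as a simple lookup. This is a sketch of the behaviour described above, not the library's code; `DEFAULT_DELIMITERS` and `join_syllables` are hypothetical names.

```python
# Default delimiter per transliteration system, as listed above.
DEFAULT_DELIMITERS = {
    "Tailo": "-",
    "POJ": "-",
    "Tongiong": "-",
    "Pingyim": "",
    "Zhuyin": " ",
    "TLPA": " ",
}


def join_syllables(syllables, system="Tailo", delimiter=None):
    """Join the syllables of one word, falling back to the system default."""
    if delimiter is None:
        delimiter = DEFAULT_DELIMITERS[system]
    return delimiter.join(syllables)
```

An explicit `delimiter` always wins, so `join_syllables(["Tâi", "uân"], delimiter="")` yields `Tâiuân` even though Tailo defaults to a hyphen.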





#### Sandhi



`sandhi` Boolean - applies the [sandhi rules of Taiwanese Hokkien][sandhi-wiki] to syllables of a single word.



Default value depends on the chosen `system`:



* `True` - for `Tongiong`

* `False` - for `Tailo`, `POJ`, `Zhuyin`, `TLPA`, `Pingyim`



| text     | False        | True         |

|----------|--------------|--------------|

| 育囡仔歌 | Io-gín-á-kua | Iō-gin-a-kua |



Sandhi rules also change depending on the dialect chosen.



| text | no sandhi | south   | north   |

|------|-----------|---------|---------|

| 臺灣 | Tâi-uân   | Tāi-uân | Tài-uân |



Note that this option differs from real sandhi rules, which apply changes across every syllable of the sentence, not just within single words.



- **Taibun's sandhi rules**: Thái-khong pīng-iú, lín hó! Lín tsià-pá buē?

- **Actual sandhi rules**: Thái-khōng pīng-iú, lin hó! Lin tsià-pa buē?
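The word-level behaviour can be sketched on number-format syllables. The tone changes below follow the commonly described Taiwanese Hokkien sandhi chain and agree with the tables above, but they are an assumption about the rule set rather than Taibun's actual implementation; `sandhi_syllable` and `sandhi_word` are hypothetical names.

```python
# Tone changes for syllables without a stop final (assumed rule set).
BASIC_SANDHI = {"1": "7", "7": "3", "3": "2", "2": "1"}


def sandhi_syllable(syllable, dialect="south"):
    """Apply sandhi to one number-format syllable such as 'Tai5'."""
    if not syllable or not syllable[-1].isdigit():
        return syllable
    bare, tone = syllable[:-1], syllable[-1]
    if tone == "5":
        # The dialects differ here: 5 -> 7 (south), 5 -> 3 (north).
        return bare + ("7" if dialect == "south" else "3")
    if tone in ("4", "8"):
        if bare.endswith("h"):
            # A final -h is dropped: 4 -> 2, 8 -> 3.
            return bare[:-1] + ("2" if tone == "4" else "3")
        # Finals -p, -t, -k swap between tones 4 and 8.
        return bare + ("8" if tone == "4" else "4")
    return bare + BASIC_SANDHI.get(tone, tone)


def sandhi_word(word, dialect="south", delimiter="-"):
    """Change every syllable of a word except the last one."""
    syllables = word.split(delimiter)
    changed = [sandhi_syllable(s, dialect) for s in syllables[:-1]]
    return delimiter.join(changed + syllables[-1:])
```

Here `sandhi_word("Tai5-uan5")` returns `Tai7-uan5`, while `dialect="north"` returns `Tai3-uan5`, mirroring the south and north columns of the table above. Because the function works word by word, the last syllable of each word keeps its citation tone.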





#### Punctuation



`punctuation` String - punctuation style of the converted sentence.



* `format` (default) - converts Chinese-style punctuation to Latin-style punctuation and capitalises words at the beginning of each sentence.

* `none` - preserves Chinese-style punctuation and doesn't capitalise words at the beginning of new sentences.



| text | format | none |

|-|-|-|

| 這是臺南，簡稱「南」（白話字：Tâi-lâm；注音符號：ㄊㄞˊ ㄋㄢˊ，國語：Táinán）。 | Tse sī Tâi-lâm, kán-tshing "lâm" (Pe̍h-uē-jī: Tâi-lâm; tsù-im hû-hō: ㄊㄞˊ ㄋㄢˊ, kok-gí: Táinán). | tse sī Tâi-lâm，kán-tshing「lâm」（Pe̍h-uē-jī：Tâi-lâm；tsù-im hû-hō：ㄊㄞˊ ㄋㄢˊ，kok-gí：Táinán）。 |
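What the `format` option does can be approximated in a few lines. This is a simplified sketch, not the library's code: `format_punctuation` is a hypothetical helper, and it covers only six full-width marks, leaving quotation marks and brackets untouched.

```python
import re

# Full-width (Chinese-style) marks mapped to Latin counterparts.
LATIN_PUNCTUATION = {
    "\uff0c": ",",  # full-width comma
    "\u3002": ".",  # ideographic full stop
    "\uff01": "!",  # full-width exclamation mark
    "\uff1f": "?",  # full-width question mark
    "\uff1a": ":",  # full-width colon
    "\uff1b": ";",  # full-width semicolon
}


def format_punctuation(text):
    """Convert Chinese-style punctuation and capitalise sentence starts."""
    for chinese, latin in LATIN_PUNCTUATION.items():
        # Latin punctuation is followed by a space.
        text = text.replace(chinese, latin + " ")
    text = text.strip()
    # Capitalise the first letter of the text and of each new sentence.
    return re.sub(
        r"(^|[.!?] )(\w)",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )
```

Applied to a transliteration that still carries full-width marks, the helper inserts the spaces Latin punctuation expects and capitalises the word after each sentence-ending mark.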



### Tokeniser



The `Tokeniser` class performs [NLTK wordpunct_tokenize][nltk-tokenize]-like tokenisation of a Taiwanese Hokkien sentence.



```python

# constructor

t = Tokeniser()



# tokenise Taiwanese Hokkien sentence

t.tokenise(input)

```
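NLTK's `wordpunct_tokenize` splits on the pattern `\w+|[^\w\s]+`, which would return an unspaced run of Chinese characters as a single token, so a Hokkien tokeniser needs some form of dictionary lookup. The sketch below uses forward maximum matching against a small, hypothetical vocabulary; it illustrates the general idea and is not necessarily the algorithm Taibun uses.

```python
def tokenise(sentence, vocabulary, max_len=4):
    """Greedy longest-match segmentation against a word list."""
    tokens = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first; a single character always matches.
        for length in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + length]
            if length == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += length
                break
    return tokens


# Hypothetical vocabulary; a real tokeniser would load a full dictionary.
VOCABULARY = {"太空", "朋友", "食飽"}
```

Characters that are not part of any known word, including punctuation, fall through as single-character tokens.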





<!-- EXAMPLE -->

## Example



```python
from taibun import Converter, Tokeniser

# System
c = Converter()  # Tailo is the default system
c.get('先生講，學生恬恬聽。')
# >> Sian-sinn kóng, ha̍k-sing tiām-tiām thiann.

c = Converter(system='Zhuyin')
c.get('先生講，學生恬恬聽。')
# >> ㄒㄧㄢ ㄒㆪ ㄍㆲˋ, ㄏㄚㄍ ㄒㄧㄥ ㄉㄧㆰ˫ ㄉㄧㆰ˫ ㄊㄧㆩ.

# Dialect
c = Converter()  # south is the default dialect
c.get("我欲用箸食魚")
# >> Guá beh īng tī tsia̍h hî

c = Converter(dialect='north')
c.get("我欲用箸食魚")
# >> Guá bueh īng tū tsia̍h hû

# Format
c = Converter()  # for Tailo, mark by default
c.get("生日快樂")
# >> Senn-ji̍t khuài-lo̍k

c = Converter(format='number')
c.get("生日快樂")
# >> Senn1-jit8 khuai3-lok8

c = Converter(format='strip')
c.get("生日快樂")
# >> Senn-jit khuai-lok

# Delimiter
c = Converter(delimiter='')
c.get("先生講，學生恬恬聽。")
# >> Siansinn kóng, ha̍ksing tiāmtiām thiann.

c = Converter(system='Pingyim', delimiter='-')
c.get("先生講，學生恬恬聽。")
# >> Siān-snī gǒng, hág-sīng diâm-diâm tinā.

# Sandhi
c = Converter()  # for Tailo, sandhi is False by default
c.get("南迴鐵路")
# >> Lâm-huê-thih-lōo

c = Converter(sandhi=True)
c.get("南迴鐵路")
# >> Lām-huē-thí-lōo

# Punctuation
c = Converter()  # format is the default punctuation
c.get("太空朋友，恁好！恁食飽未？")
# >> Thài-khong pîng-iú, lín hó! Lín tsia̍h-pá buē?

c = Converter(punctuation='none')
c.get("太空朋友，恁好！恁食飽未？")
# >> thài-khong pîng-iú，lín hó！lín tsia̍h-pá buē？

# Tokeniser
t = Tokeniser()
t.tokenise("太空朋友，恁好！恁食飽未？")
# >> ['太空', '朋友', '，', '恁', '好', '！', '恁', '食飽', '未', '？']
```





<!-- DATA -->

## Data



- [Dictionary of Frequently-Used Taiwan Minnan][dictionary] (via [moedict-data-twblg][dictionary-via])





<!-- LICENCE -->

## Licence



Because Taibun is MIT-licensed, any developer can essentially do whatever they want with it as long as they include the original copyright and licence notice in any copies of the source code. Note that the data used by the package is covered by a different licence.



The data is licensed under [CC BY-ND 3.0 TW][distionary-cc].







<!-- MARKDOWN LINKS -->

[contributors-badge]: https://img.shields.io/github/contributors/andreihar/taibun?style=for-the-badge

[contributors]: #usage

[release-badge]: https://img.shields.io/github/v/release/andreihar/taibun?color=38618c&style=for-the-badge

[release]: https://github.com/andreihar/taibun/releases

[licence-badge]: https://img.shields.io/github/license/andreihar/taibun.svg?color=000000&style=for-the-badge

[licence]: LICENSE

[linkedin-badge]: https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white

[linkedin]: https://www.linkedin.com/in/andrei-harbachov/

[github-badge]: https://img.shields.io/badge/GitHub-100000?style=for-the-badge&logo=github&logoColor=white

[github]: https://github.com/andreihar/taibun



[pypi]: https://pypi.org/project/taibun

[bug]: https://github.com/andreihar/taibun/issues

[dictionary]: https://twblg.dict.edu.tw/holodict_new/

[dictionary-via]: https://github.com/g0v/moedict-data-twblg

[distionary-cc]: https://creativecommons.org/licenses/by-nd/3.0/tw/deed.en



[tailo-wiki]: https://en.wikipedia.org/wiki/T%C3%A2i-u%C3%A2n_L%C3%B4-m%C3%A1-j%C4%AB_Phing-im_Hong-%C3%A0n

[poj-wiki]: https://en.wikipedia.org/wiki/Pe%CC%8Dh-%C5%8De-j%C4%AB

[zhuyin-wiki]: https://en.wikipedia.org/wiki/Taiwanese_Phonetic_Symbols

[tlpa-wiki]: https://en.wikipedia.org/wiki/Taiwanese_Language_Phonetic_Alphabet

[pingyim-wiki]: https://en.wikipedia.org/wiki/Bb%C3%A1nl%C3%A1m_p%C3%ACngy%C4%ABm

[tongiong-wiki]: https://en.wikipedia.org/wiki/Da%C4%AB-gh%C3%AE_t%C5%8Dng-i%C5%8Dng_p%C4%ABng-im

[zhangzhou-wiki]: https://en.wikipedia.org/wiki/Zhangzhou_dialects

[quanzhou-wiki]: https://en.wikipedia.org/wiki/Quanzhou_dialects

[nltk-tokenize]: https://nltk.org/api/nltk.tokenize.html

[sandhi-wiki]: https://en.wikipedia.org/wiki/Taiwanese_Hokkien#Tone%20sandhi:~:text=thng%E2%9F%A9%20(%22soup%22).-,Tone%20sandhi,-%5Bedit%5D

            

        