[![Open in Streamlit](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://polm-cutlet-demo-demo-0tur8v.streamlit.app/)
[![Current PyPI packages](https://badge.fury.io/py/cutlet.svg)](https://pypi.org/project/cutlet/)
# cutlet
<img src="https://github.com/polm/cutlet/raw/master/cutlet.png" width=125 height=125 alt="cutlet by Irasutoya" />
Cutlet is a tool to convert Japanese to romaji. Check out the [interactive demo][demo]! Also see the [docs](https://polm.github.io/cutlet/cutlet.html) and the [original blog post](https://www.dampfkraft.com/nlp/cutlet-python-romaji-converter.html).
[demo]: https://polm-cutlet-demo-demo-0tur8v.streamlit.app/
**issueを英語で書く必要はありません。** (Issues do not need to be written in English.)
Features:
- support for [Modified Hepburn](https://en.wikipedia.org/wiki/Hepburn_romanization), [Kunreisiki](https://en.wikipedia.org/wiki/Kunrei-shiki_romanization), [Nihonsiki](https://en.wikipedia.org/wiki/Nihon-shiki_romanization) systems
- custom overrides for individual mappings
- custom overrides for specific words
- built-in exceptions list (Tokyo, Osaka, etc.)
- uses foreign spelling when available in UniDic
- proper nouns are capitalized
- slug mode for URL generation
Things not supported:
- traditional Hepburn n-to-m: Shimbashi
- macrons or circumflexes: Tōkyō, Tôkyô
- passport Hepburn: Satoh (but you can use an exception)
- hyphenating words
- Traditional Hepburn in general is not supported
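
If you need the traditional Hepburn n-to-m spelling anyway, one option is to post-process the romaji yourself. The following is a minimal sketch and not part of cutlet: the helper name `n_to_m` is hypothetical, and the rule (an `n` directly followed by `b`, `m`, or `p` becomes `m`) is applied naively within each word, so check the results on your own data.

```python
import re

# Hypothetical helper, not part of cutlet's API.
# Assumes the input is romaji that some converter has already produced.
_N_BEFORE_LABIAL = re.compile(r"n(?=[bmp])")


def n_to_m(romaji: str) -> str:
    """Rewrite lowercase n as m when it directly precedes b, m, or p."""
    return _N_BEFORE_LABIAL.sub("m", romaji)


print(n_to_m("Shinbashi"))  # Shimbashi
print(n_to_m("henkan"))     # henkan (unchanged, n is followed by k)
```

Because the lookahead only matches letters that immediately follow the `n`, an `n` at the end of one word is not affected by the first letter of the next word.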
Internally, cutlet uses [fugashi](https://github.com/polm/fugashi), so you can
use the same dictionary you use for normal tokenization.
## Installation
Cutlet can be installed through pip as usual.

    pip install cutlet

Note that if you don't have a MeCab dictionary installed, you'll also have to
install one. If you're just getting started,
[unidic-lite](https://github.com/polm/unidic-lite) is a good choice.

    pip install unidic-lite

## Usage
A command-line script is included for quick testing. Just use `cutlet` and each
line of stdin will be treated as a sentence. You can specify the system to use
(`hepburn`, `kunrei`, `nippon`, or `nihon`) as the first argument.

    $ cutlet
    ローマ字変換プログラム作ってみた。
    Roma ji henkan program tsukutte mita.

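
The line-by-line behavior described above can be approximated in a few lines of Python. This is a sketch under assumptions, not cutlet's actual script: `romanize_lines` is a hypothetical helper, it skips blank lines (the real script may not), and it accepts any callable so it can be tried without a dictionary installed. With cutlet you would pass `cutlet.Cutlet().romaji` as `convert` and `sys.stdin` as `lines`.

```python
from typing import Callable, Iterable, Iterator


def romanize_lines(
    lines: Iterable[str], convert: Callable[[str], str]
) -> Iterator[str]:
    """Treat each non-blank input line as one sentence and yield its conversion."""
    for line in lines:
        sentence = line.strip()
        if sentence:  # assumption: blank lines are skipped
            yield convert(sentence)


# Stand-in converter so the sketch runs without cutlet installed.
for out in romanize_lines(["abc\n", "\n", "def\n"], str.upper):
    print(out)
```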
In code:
```python
import cutlet
katsu = cutlet.Cutlet()
katsu.romaji("カツカレーは美味しい")
# => 'Cutlet curry wa oishii'
# You can also generate a slug suitable for URLs
katsu.slug("カツカレーは美味しい")
# => 'cutlet-curry-wa-oishii'
# You can also disable foreign spellings
katsu.use_foreign_spelling = False
katsu.romaji("カツカレーは美味しい")
# => 'Katsu karee wa oishii'
# Kunreisiki and Nihonsiki work too
katu = cutlet.Cutlet('kunrei')
katu.romaji("富士山")
# => 'Huzi yama'
# Comparison of the three systems on one sentence
nkatu = cutlet.Cutlet('nihon')
sent = "彼女は王への手紙を読み上げた。"
katsu.romaji(sent)
# => 'Kanojo wa ou e no tegami wo yomiageta.'
katu.romaji(sent)
# => 'Kanozyo wa ou e no tegami o yomiageta.'
nkatu.romaji(sent)
# => 'Kanozyo ha ou he no tegami wo yomiageta.'
```
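
To illustrate how the `romaji` and `slug` outputs above relate, here is a minimal sketch of deriving a URL slug from a romaji string. It is not cutlet's implementation: `slugify_romaji` is a hypothetical helper, and the rule it applies (lowercase the text, collapse every run of non-alphanumeric characters into one hyphen) is an assumption.

```python
import re


def slugify_romaji(romaji: str) -> str:
    """Lowercase the text and join alphanumeric runs with single hyphens."""
    lowered = romaji.lower()
    hyphenated = re.sub(r"[^a-z0-9]+", "-", lowered)
    # Drop hyphens left over from leading or trailing punctuation.
    return hyphenated.strip("-")


print(slugify_romaji("Cutlet curry wa oishii"))
# cutlet-curry-wa-oishii
```

In practice `katsu.slug(...)` is the simpler choice; a helper like this is only useful when you already have romaji text from elsewhere.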
## Alternatives
- [kakasi](http://kakasi.namazu.org/index.html.ja): Historically important, but not updated since 2014.
- [pykakasi](https://github.com/miurahr/pykakasi): Self-contained; it does its own segmentation and uses its own dictionary.
- [kuroshiro](https://github.com/hexenq/kuroshiro): JavaScript-based.
- [kana](https://github.com/gojp/kana): Go-based.
Raw data
{
"_id": null,
"home_page": "https://github.com/polm/cutlet",
"name": "cutlet",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.5",
"maintainer_email": null,
"keywords": null,
"author": "Paul O'Leary McCann",
"author_email": "polm@dampfkraft.com",
"download_url": "https://files.pythonhosted.org/packages/61/8c/53a5937d102b6be60ace23565cc845e0c0f91f053c42584e495bb817a0c0/cutlet-0.4.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Romaji converter",
"version": "0.4.0",
"project_urls": {
"Homepage": "https://github.com/polm/cutlet"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "618c53a5937d102b6be60ace23565cc845e0c0f91f053c42584e495bb817a0c0",
"md5": "05b6bafcf558259e0185f1947a3b177e",
"sha256": "9abc50b2c36aabc0c863b7a0fd6a3a651dc372e056f0914d83e76ac2612f3626"
},
"downloads": -1,
"filename": "cutlet-0.4.0.tar.gz",
"has_sig": false,
"md5_digest": "05b6bafcf558259e0185f1947a3b177e",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.5",
"size": 412284,
"upload_time": "2024-03-24T08:07:33",
"upload_time_iso_8601": "2024-03-24T08:07:33.616195Z",
"url": "https://files.pythonhosted.org/packages/61/8c/53a5937d102b6be60ace23565cc845e0c0f91f053c42584e495bb817a0c0/cutlet-0.4.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-03-24 08:07:33",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "polm",
"github_project": "cutlet",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "attrs",
"specs": [
[
"==",
"23.1.0"
]
]
},
{
"name": "fugashi",
"specs": [
[
"==",
"1.2.1"
]
]
},
{
"name": "hypothesis",
"specs": [
[
"==",
"6.78.2"
]
]
},
{
"name": "iniconfig",
"specs": [
[
"==",
"2.0.0"
]
]
},
{
"name": "jaconv",
"specs": [
[
"==",
"0.3.4"
]
]
},
{
"name": "mojimoji",
"specs": [
[
"==",
"0.0.12"
]
]
},
{
"name": "packaging",
"specs": [
[
"==",
"23.1"
]
]
},
{
"name": "pluggy",
"specs": [
[
"==",
"1.0.0"
]
]
},
{
"name": "pytest",
"specs": [
[
"==",
"7.3.2"
]
]
},
{
"name": "sortedcontainers",
"specs": [
[
"==",
"2.4.0"
]
]
},
{
"name": "unidic-lite",
"specs": [
[
"==",
"1.0.8"
]
]
}
],
"lcname": "cutlet"
}