cutlet


- Name: cutlet
- Version: 0.4.0
- Home page: https://github.com/polm/cutlet
- Summary: Romaji converter
- Upload time: 2024-03-24 08:07:33
- Author: Paul O'Leary McCann
- Requires Python: >=3.5
- Requirements: attrs==23.1.0, fugashi==1.2.1, hypothesis==6.78.2, iniconfig==2.0.0, jaconv==0.3.4, mojimoji==0.0.12, packaging==23.1, pluggy==1.0.0, pytest==7.3.2, sortedcontainers==2.4.0, unidic-lite==1.0.8

[![Open in Streamlit](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://polm-cutlet-demo-demo-0tur8v.streamlit.app/)
[![Current PyPI packages](https://badge.fury.io/py/cutlet.svg)](https://pypi.org/project/cutlet/)

# cutlet

<img src="https://github.com/polm/cutlet/raw/master/cutlet.png" width=125 height=125 alt="cutlet by Irasutoya" />

Cutlet is a tool to convert Japanese to romaji. Check out the [interactive demo][demo]! Also see the [docs](https://polm.github.io/cutlet/cutlet.html) and the [original blog post](https://www.dampfkraft.com/nlp/cutlet-python-romaji-converter.html). 

[demo]: https://polm-cutlet-demo-demo-0tur8v.streamlit.app/

**You do not need to write issues in English.** (issueを英語で書く必要はありません。)

Features:

- support for [Modified Hepburn](https://en.wikipedia.org/wiki/Hepburn_romanization), [Kunreisiki](https://en.wikipedia.org/wiki/Kunrei-shiki_romanization), [Nihonsiki](https://en.wikipedia.org/wiki/Nihon-shiki_romanization) systems
- custom overrides for individual mappings
- custom overrides for specific words
- built-in exceptions list (Tokyo, Osaka, etc.)
- uses foreign spelling when available in UniDic
- proper nouns are capitalized
- slug mode for URL generation
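
To make the difference between the systems and the two kinds of override concrete, here is a minimal, self-contained sketch. It does not use cutlet and is not how cutlet is implemented: the `KANA` table, the `romanize` function, and its `overrides` and `exceptions` parameters are hypothetical names, and the table covers only a handful of kana.

```python
# Illustrative only: a tiny kana table, not cutlet's real data or API.
# Each value is (Hepburn, Kunrei-shiki, Nihon-shiki).
SYSTEMS = ("hepburn", "kunrei", "nihon")
KANA = {
    "ふ": ("fu", "hu", "hu"),
    "じ": ("ji", "zi", "zi"),
    "つ": ("tsu", "tu", "tu"),
    "づ": ("zu", "zu", "du"),
    "く": ("ku", "ku", "ku"),
}


def romanize(word, system="hepburn", overrides=None, exceptions=None):
    """Romanize a word made only of kana present in KANA."""
    # A word-level exception replaces the whole word.
    if exceptions and word in exceptions:
        return exceptions[word]
    col = SYSTEMS.index(system)
    out = []
    for ch in word:
        # A per-kana override wins over the system's default mapping.
        if overrides and ch in overrides:
            out.append(overrides[ch])
        else:
            out.append(KANA[ch][col])
    return "".join(out)


print(romanize("ふじ", "hepburn"))  # fuji
print(romanize("ふじ", "kunrei"))   # huzi
print(romanize("つづく", "nihon"))  # tuduku
print(romanize("つづく", "hepburn", overrides={"づ": "du"}))  # tsuduku
```

The point of the design is the lookup order: a word-level exception is checked first, then a per-kana override, then the system's default table.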

Things not supported:

- traditional Hepburn n-to-m: Shimbashi
- macrons or circumflexes: Tōkyō, Tôkyô
- passport Hepburn: Satoh (but you can use an exception)
- hyphenating words
- Traditional Hepburn in general is not supported
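
If you need the traditional n-to-m spelling anyway, one possible workaround is to post-process the romaji string yourself. The helper `n_to_m` below is a hypothetical name and not part of cutlet. It is a rough sketch that assumes lowercase-style Hepburn output; it will also rewrite foreign spellings such as "input", so apply it with care.

```python
import re


def n_to_m(romaji):
    """Rewrite n as m before b, m, or p (traditional Hepburn style)."""
    # The lookahead leaves the following consonant untouched.
    return re.sub(r"n(?=[bmp])", "m", romaji)


print(n_to_m("shinbashi"))  # shimbashi
print(n_to_m("senpai"))     # sempai
```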

Internally, cutlet uses [fugashi](https://github.com/polm/fugashi), so you can
use the same dictionary you use for normal tokenization.

## Installation

Cutlet can be installed through pip as usual.

    pip install cutlet

Note that if you don't have a MeCab dictionary installed, you'll also have to
install one. If you're just getting started,
[unidic-lite](https://github.com/polm/unidic-lite) is a good choice.

    pip install unidic-lite
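
To confirm that the packages are visible to your Python environment, a quick check along these lines may help. This is a sketch using only the standard library; `is_installed` is a hypothetical helper name, and `unidic_lite` is assumed to be the import name of the unidic-lite package. It only looks for these specific modules, so other MeCab dictionaries will not be detected.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


for name in ("cutlet", "fugashi", "unidic_lite"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```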

## Usage

A command-line script is included for quick testing. Run `cutlet` and each
line of stdin will be treated as a sentence. You can specify the system to use
(`hepburn`, `kunrei`, `nippon`, or `nihon`) as the first argument.

    $ cutlet
    ローマ字変換プログラム作ってみた。
    Roma ji henkan program tsukutte mita.
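
The behavior described above (an optional system name as the first argument, then one sentence per input line) can be sketched as follows. This is not cutlet's actual script: `pick_system` and `convert_lines` are hypothetical names, the `hepburn` default is an assumption based on the example output above, and `str.upper` stands in for a real converter so the sketch runs without any dictionary installed.

```python
import io

SYSTEMS = ("hepburn", "kunrei", "nippon", "nihon")


def pick_system(argv, default="hepburn"):
    """Return the system named by the first argument, or the default."""
    if len(argv) < 2:
        return default
    if argv[1] not in SYSTEMS:
        raise ValueError(f"unknown system: {argv[1]}")
    return argv[1]


def convert_lines(stream, convert):
    """Apply convert to each input line, treating a line as one sentence."""
    for line in stream:
        yield convert(line.rstrip("\n"))


# Stand-in converter for the demo; a real script would pass something like
# cutlet.Cutlet(system).romaji here and read from sys.stdin.
demo = io.StringIO("first sentence\nsecond sentence\n")
for out in convert_lines(demo, str.upper):
    print(out)
```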

In code:

```python
import cutlet
katsu = cutlet.Cutlet()
katsu.romaji("カツカレーは美味しい")
# => 'Cutlet curry wa oishii'

# You can generate a slug suitable for URLs
katsu.slug("カツカレーは美味しい")
# => 'cutlet-curry-wa-oishii'

# You can disable using foreign spelling too
katsu.use_foreign_spelling = False
katsu.romaji("カツカレーは美味しい")
# => 'Katsu karee wa oishii'

# Kunreisiki and Nihonsiki work too
katu = cutlet.Cutlet('kunrei')
katu.romaji("富士山")
# => 'Huzi yama'

# Comparison of the three systems on the same sentence
nkatu = cutlet.Cutlet('nihon')

sent = "彼女は王への手紙を読み上げた。"
katsu.romaji(sent)
# => 'Kanojo wa ou e no tegami wo yomiageta.'
katu.romaji(sent)
# => 'Kanozyo wa ou e no tegami o yomiageta.'
nkatu.romaji(sent)
# => 'Kanozyo ha ou he no tegami wo yomiageta.'
```
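
For intuition about what a slug is, the following sketch derives one from an already romanized string. It is an assumption about the general idea, not cutlet's implementation, and `slugify` is a hypothetical name; cutlet's own `slug` method may treat punctuation and other edge cases differently.

```python
import re


def slugify(romaji):
    """Lowercase the text and join alphanumeric runs with hyphens."""
    words = re.findall(r"[a-z0-9]+", romaji.lower())
    return "-".join(words)


print(slugify("Cutlet curry wa oishii"))  # cutlet-curry-wa-oishii
```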

## Alternatives

- [kakasi](http://kakasi.namazu.org/index.html.ja): Historically important, but not updated since 2014.
- [pykakasi](https://github.com/miurahr/pykakasi): Self-contained; it does segmentation on its own and uses its own dictionary.
- [kuroshiro](https://github.com/hexenq/kuroshiro): JavaScript-based.
- [kana](https://github.com/gojp/kana): Go-based.


            
