pykakasi


:Name: pykakasi
:Version: 2.2.1
:Home page: https://github.com/miurahr/pykakasi
:Summary: Kana kanji simple inversion library
:Upload time: 2021-07-10 01:18:24
:Author: Hiroshi Miura
:License: GPLv3
:Requirements: No requirements were recorded.

========
Pykakasi
========

Overview
========

.. image:: https://readthedocs.org/projects/pykakasi/badge/?version=latest
   :target: https://pykakasi.readthedocs.io/en/latest/?badge=latest
   :alt: Documentation Status

.. image:: https://badge.fury.io/py/pykakasi.png
   :target: http://badge.fury.io/py/Pykakasi
   :alt: PyPI version

.. image:: https://github.com/miurahr/pykakasi/workflows/Run%20Tox%20tests/badge.svg
   :target: https://github.com/miurahr/pykakasi/actions?query=workflow%3A%22Run+Tox+tests%22
   :alt: Run Tox tests

.. image:: https://dev.azure.com/miurahr/github/_apis/build/status/miurahr.pykakasi?branchName=master
   :target: https://dev.azure.com/miurahr/github/_build?definitionId=13&branchName=master
   :alt: Azure-Pipelines

.. image:: https://coveralls.io/repos/miurahr/pykakasi/badge.svg?branch=master
   :target: https://coveralls.io/r/miurahr/pykakasi?branch=master
   :alt: Coverage status


``pykakasi`` is a Python Natural Language Processing (NLP) library to transliterate *hiragana*, *katakana* and *kanji* (Japanese text) into *rōmaji* (Latin/Roman alphabet). It can handle characters in NFC form.
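
Japanese text from some sources arrives in decomposed (NFD) form, where a voiced kana is stored as a base character followed by a combining mark. The following is a minimal sketch, using only the standard library's ``unicodedata`` module, of composing such text into NFC before further processing. The helper name ``to_nfc`` is hypothetical and is not part of pykakasi.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C (composed form)."""
    return unicodedata.normalize("NFC", text)


# U+304B (HIRAGANA LETTER KA) followed by U+3099 (combining voiced sound mark)
decomposed = "\u304b\u3099"
composed = to_nfc(decomposed)

# The two code points are composed into the single code point U+304C.
print(len(decomposed), len(composed))
print(composed == "\u304c")
```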

Its algorithms are based on the `kakasi`_ library, which is written in C.

* Install (from `PyPI`_): ``pip install pykakasi``
* `Documentation available on readthedocs`_

.. _`PyPI`: https://pypi.org/project/pykakasi/
.. _`kakasi`: http://kakasi.namazu.org/
.. _`Documentation available on readthedocs`: https://pykakasi.readthedocs.io/en/latest/index.html


Supported Python versions
=========================

* pykakasi supports Python 3.6, 3.7, 3.8, 3.9, and PyPy3

Usage
=====

Transliterate Japanese text to katakana, hiragana and romaji:

.. code-block:: python

    import pykakasi

    kks = pykakasi.kakasi()
    text = "かな漢字"
    # convert() returns a list of dicts, one per segmented word
    result = kks.convert(text)
    for item in result:
        print("{}: kana '{}', hiragana '{}', romaji: '{}'".format(
            item['orig'], item['kana'], item['hira'], item['hepburn']))

    # Output:
    # かな: kana 'カナ', hiragana 'かな', romaji: 'kana'
    # 漢字: kana 'カンジ', hiragana 'かんじ', romaji: 'kanji'
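
Each element returned by ``convert()`` is a dictionary with the keys ``orig``, ``kana``, ``hira`` and ``hepburn``, as used above. The sketch below joins one field across all elements to get a reading for the whole text. So that it runs without the library installed, it works on a hand-copied list that mirrors the output shown above. ``SAMPLE`` and ``join_field`` are illustrative names and are not part of pykakasi.

```python
# Hand-copied from the example output above. With pykakasi installed, this
# list would instead come from pykakasi.kakasi().convert("かな漢字").
SAMPLE = [
    {"orig": "かな", "kana": "カナ", "hira": "かな", "hepburn": "kana"},
    {"orig": "漢字", "kana": "カンジ", "hira": "かんじ", "hepburn": "kanji"},
]


def join_field(items, key, sep=""):
    """Concatenate one field (for example 'hira') across all converted items."""
    return sep.join(item[key] for item in items)


print(join_field(SAMPLE, "hira"))          # かなかんじ
print(join_field(SAMPLE, "hepburn", " "))  # kana kanji
```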


The following example produces output similar to furigana mode:

.. code-block:: python

    import pykakasi

    kks = pykakasi.kakasi()
    text = "かな漢字交じり文"
    result = kks.convert(text)
    for item in result:
        # Print each word followed by its capitalized romaji reading in brackets
        print("{}[{}] ".format(item['orig'], item['hepburn'].capitalize()), end='')
    print()

    # Output:
    # かな[Kana] 漢字[Kanji] 交じり[Majiri] 文[Bun]
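
A variation on the furigana-style output is to annotate only the words that contain kanji and leave words that are already kana untouched. The sketch below assumes items shaped like the ``convert()`` results above. The ``hira`` and ``kana`` values in ``SAMPLE`` were written by hand to match the romaji shown above, and ``annotate`` is a hypothetical helper, not a pykakasi function.

```python
# Illustrative data shaped like convert() results for "かな漢字交じり文".
# The readings are hand-written assumptions matching the romaji shown above.
SAMPLE = [
    {"orig": "かな", "hira": "かな", "kana": "カナ"},
    {"orig": "漢字", "hira": "かんじ", "kana": "カンジ"},
    {"orig": "交じり", "hira": "まじり", "kana": "マジリ"},
    {"orig": "文", "hira": "ぶん", "kana": "ブン"},
]


def annotate(items):
    """Append a hiragana reading in brackets only where the word is not already kana."""
    parts = []
    for item in items:
        if item["orig"] in (item["hira"], item["kana"]):
            # The word is already written in kana, so no reading is needed.
            parts.append(item["orig"])
        else:
            parts.append("{}[{}]".format(item["orig"], item["hira"]))
    return " ".join(parts)


print(annotate(SAMPLE))  # かな 漢字[かんじ] 交じり[まじり] 文[ぶん]
```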


Benchmark results
=================

Benchmark results for various versions and platforms are available at https://github.com/miurahr/pykakasi/issues/123


Copyright and License
=====================

PyKakasi::

    Copyright (C) 2010-2021 Hiroshi Miura and contributors (see AUTHORS)

KAKASI Dictionary::

    Copyright (C) 2010-2021 Hiroshi Miura and contributors (see AUTHORS)

    Copyright (C) 1992 1993 1994 Hironobu Takahashi, Masahiko Sato,
    Yukiyoshi Kameyama, Miki Inooka, Akihiko Sasaki, Dai Ando, Junichi Okukawa,
    Katsushi Sato and Nobuhiro Yamagishi

UniDic::

    Copyright (c) 2011-2021, The UniDic Consortium

    All rights reserved.

    Unidic is released under any of the GPL2, the LGPL2.1,
    or the 3-clause BSD License. (See src/data/unidic/BSD.txt)
    PyKakasi relicenses a part of the unidic with GPL3+.

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.

==================
PyKakasi ChangeLog
==================

All notable changes to this project will be documented in this file.

Unreleased_
===========

Added
-----

Changed
-------

Fixed
-----

Deprecated
----------

Removed
-------

Security
--------

v2.2.1_ (10, July 2021)
=======================

Fixed
-----
* Add Zenkaku question mark (``\uFF1F``) and other Zenkaku marks as end marks (#146)


v2.2.0_ (22, June 2021)
=======================

Added
-----
* dictionary: add nouns and adjectives from UniDic (#140)

Changed
-------
* Refactor main loop logic of ``convert()`` (#144)

Fixed
-----
* Fix segmentation (wakati) for text that combines Katakana and Hiragana (#142)

v2.1.1_ (16, May 2021)
======================

Added
-----
* Provide ``Kakasi.normalize(text)`` class method
* Add UniDic data to the data directory (not used yet) and add a parse utility.

Fixed
-----
* Put type hint stub into package
* Copyright notifications

Changed
-------
* Expand all cletter into dictionary (#139)
* Change primary kanwadict index from str to int
* test: gather all legacy tests into the test_pykakasi_legacy.py file.


v2.1.0_ (6, May 2021)
=====================

Added
-----
* Deprecation warning when using the old API (#124)
* Add type hint file (pyi) (#124)
* Benchmark test code (#122)

Changed
-------
* Cache internal results, improving performance by about 30-40 times (#128)
* Use standard pickle for database file (#128)
* Exceptions module is now ``pykakasi``, not ``pykakasi.exceptions``

Removed
-------
* Dependency on klepto (#128)


v2.0.8_ (4, May 2021)
=====================

Added
-----

* test: Benchmark and profiling (#122)

Changed
-------

* Performance: avoid ``ord()`` when checking long-mark, for a speed-up of about 6%
* Reformat code with black (#121)


v2.0.7_ (26, Feb. 2021)
=======================

Fixed
-----

* Infinite loop after running for a while,
  handle independent HW VOICED SOUND MARK (#115, #118)


v2.0.6_ (7, Feb. 2021)
======================

Fixed
-----

* Hiragana for age counters (#116, #117)


v2.0.5_ (5, Feb. 2021)
======================

Changed
-------

* CLI: use argparse for option parsing (#113)

Fixed
-----

* Handle 思った、言った、行った properly (#114)
* CI: fix coveralls error

Deprecated
----------

* CI: drop travis-ci test and badge


v2.0.4_ (26, Nov. 2020)
=======================

Fixed
-----

* CLI: Fix -v and -h option crash on Python 3.7 and earlier (#108).

v2.0.3_ (25, Nov. 2020)
=======================

Fixed
-----

* CLI: Fix -v and -h option crash (#108).


v2.0.2_ (23, Jul. 2020)
=======================

Fixed
-----

* Fix ``convert()`` to handle Katakana correctly (#103)


v2.0.1_ (23, Jul. 2020)
=======================

Changed
-------

* Update setup.py, setup.cfg, tox.ini (#102)


Fixed
-----

* Fix ``convert()`` missing the last part of a text (#99, #100)
* Fix CI, coverage, and coveralls configurations (#101)


v2.0.0_ (31, May. 2020)
=======================


.. _Unreleased: https://github.com/miurahr/pykakasi/compare/v2.2.1...HEAD
.. _v2.2.1: https://github.com/miurahr/pykakasi/compare/v2.2.0...v2.2.1
.. _v2.2.0: https://github.com/miurahr/pykakasi/compare/v2.1.1...v2.2.0
.. _v2.1.1: https://github.com/miurahr/pykakasi/compare/v2.1.0...v2.1.1
.. _v2.1.0: https://github.com/miurahr/pykakasi/compare/v2.0.8...v2.1.0
.. _v2.0.8: https://github.com/miurahr/pykakasi/compare/v2.0.7...v2.0.8
.. _v2.0.7: https://github.com/miurahr/pykakasi/compare/v2.0.6...v2.0.7
.. _v2.0.6: https://github.com/miurahr/pykakasi/compare/v2.0.5...v2.0.6
.. _v2.0.5: https://github.com/miurahr/pykakasi/compare/v2.0.4...v2.0.5
.. _v2.0.4: https://github.com/miurahr/pykakasi/compare/v2.0.3...v2.0.4
.. _v2.0.3: https://github.com/miurahr/pykakasi/compare/v2.0.2...v2.0.3
.. _v2.0.2: https://github.com/miurahr/pykakasi/compare/v2.0.1...v2.0.2
.. _v2.0.1: https://github.com/miurahr/pykakasi/compare/v2.0.0...v2.0.1
.. _v2.0.0: https://github.com/miurahr/pykakasi/compare/v2.0.0b1...v2.0.0



            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/miurahr/pykakasi",
    "name": "pykakasi",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "",
    "author": "Hiroshi Miura",
    "author_email": "miurahr@linux.com",
    "download_url": "https://files.pythonhosted.org/packages/cc/d4/f523531109ae48c9b089041e63ea5ef07d092cc021f7b563c1b64fd5aa9e/pykakasi-2.2.1.tar.gz",
    "platform": "",
    "description": "========\nPykakasi\n========\n\nOverview\n========\n\n.. image:: https://readthedocs.org/projects/pykakasi/badge/?version=latest\n   :target: https://pykakasi.readthedocs.io/en/latest/?badge=latest\n   :alt: Documentation Status\n\n.. image:: https://badge.fury.io/py/pykakasi.png\n   :target: http://badge.fury.io/py/Pykakasi\n   :alt: PyPI version\n\n.. image:: https://github.com/miurahr/pykakasi/workflows/Run%20Tox%20tests/badge.svg\n   :target: https://github.com/miurahr/pykakasi/actions?query=workflow%3A%22Run+Tox+tests%22\n   :alt: Run Tox tests\n\n.. image:: https://dev.azure.com/miurahr/github/_apis/build/status/miurahr.pykakasi?branchName=master\n   :target: https://dev.azure.com/miurahr/github/_build?definitionId=13&branchName=master\n   :alt: Azure-Pipelines\n\n.. image:: https://coveralls.io/repos/miurahr/pykakasi/badge.svg?branch=master\n   :target: https://coveralls.io/r/miurahr/pykakasi?branch=master\n   :alt: Coverage status\n\n\n``pykakasi`` is a Python Natural Language Processing (NLP) library to transliterate *hiragana*, *katakana* and *kanji* (Japanese text) into *r\u014dmaji* (Latin/Roman alphabet). It can handle characters in NFC form.\n\nIts algorithms are based on the `kakasi`_ library, which is written in C.\n\n* Install (from `PyPI`_): ``pip install pykakasi``\n* `Documentation available on readthedocs`_\n\n.. _`PyPI`: https://pypi.org/project/pykakasi/\n.. _`kakasi`: http://kakasi.namazu.org/\n.. _`Documentation available on readthedocs`: https://pykakasi.readthedocs.io/en/latest/index.html\n\n\nSupported python versions\n=========================\n\n* pykakasi supports python 3.6, 3.7, 3.8, 3.9, and pypy3\n\nUsage\n=====\n\nTransliterate Japanese text to kana, hiragana and romaji:\n\n.. 
code-block:: python\n\n    import pykakasi\n    kks = pykakasi.kakasi()\n    text = \"\u304b\u306a\u6f22\u5b57\"\n    result = kks.convert(text)\n    for item in result:\n        print(\"{}: kana '{}', hiragana '{}', romaji: '{}'\".format(item['orig'], item['kana'], item['hira'], item['hepburn']))\n\n    \u304b\u306a: kana '\u30ab\u30ca', hiragana: '\u304b\u306a', romaji: 'kana'\n    \u6f22\u5b57: kana '\u30ab\u30f3\u30b8', hiragana: '\u304b\u3093\u3058', romaji: 'kanji'\n\n\nHere is an example that output as similar with furigana mode.\n\n.. code-block:: python\n\n    import pykakasi\n    kks = pykakasi.kakasi()\n    text = \"\u304b\u306a\u6f22\u5b57\u4ea4\u3058\u308a\u6587\"\n    result = kks.convert(text)\n    for item in result:\n        print(\"{}[{}] \".format(item['orig'], item['hepburn'].capitalize()), end='')\n    print()\n\n    \u304b\u306a[Kana] \u6f22\u5b57[Kanji] \u4ea4\u3058\u308a[Majiri] \u6587[Bun]\n\n\nBenchmark result\n================\n\nYou can see benchmark result on various versions and platforms at https://github.com/miurahr/pykakasi/issues/123\n\n\nCopyright and License\n=====================\n\nPyKakasi::\n    Copyright (C) 2010-2021 Hiroshi Miura and contributors(see AUTHORS)\n\nKAKASI Dictionary::\n    Copyright (C) 2010-2021 Hiroshi Miura and contributors(see AUTHORS)\n\n    Copyright (C) 1992 1993 1994 Hironobu Takahashi, Masahiko Sato,\n    Yukiyoshi Kameyama, Miki Inooka, Akihiko Sasaki, Dai Ando, Junichi Okukawa,\n    Katsushi Sato and Nobuhiro Yamagishi\n\nUniDic::\n    Copyright (c) 2011-2021, The UniDic Consortium\n\n    All rights reserved.\n\n    Unidic is released under any of the GPL2, the LGPL2.1,\n    or the 3-clause BSD License. 
(See src/data/unidic/BSD.txt)\n    PyKakasi relicenses a part of the unidic with GPL3+.\n\nThis program is free software: you can redistribute it and/or modify\nit under the terms of the GNU General Public License as published by\nthe Free Software Foundation, either version 3 of the License, or\n(at your option) any later version.\n\nThis program is distributed in the hope that it will be useful,\nbut WITHOUT ANY WARRANTY; without even the implied warranty of\nMERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\nGNU General Public License for more details.\n\nYou should have received a copy of the GNU General Public License\nalong with this program.  If not, see <http://www.gnu.org/licenses/>.\n\n==================\nPyKakasi ChangeLog\n==================\n\nAll notable changes to this project will be documented in this file.\n\nUnreleased_\n===========\n\nAdded\n-----\n\nChanged\n-------\n\nFixed\n-----\n\nDeprecated\n----------\n\nRemoved\n-------\n\nSecurity\n--------\n\nv2.2.1_ (10, July 2021)\n=======================\n\nFixed\n-----\n* Add Zenkaku-Question(\\uFF1F) and other Zenkaku marks as endmark (#146)\n\n\nv2.2.0_ (22, June 2021)\n=======================\n\nAdded\n-----\n* dictionary: add noun and adjectives from UniDic(#140)\n\nChanged\n-------\n* Refactoring main loop logics for convert()(#144)\n\nFixed\n-----\n* Fix segmentation (wakati) when combination with Katakana and Hiragana(#142)\n\nv2.1.1_ (16, May 2021)\n======================\n\nAdded\n-----\n* Provide Kakasi.normalize(text) class method\n* Add unidic data into data (not used yet), and add parse utility.\n\nFixed\n-----\n* Put type hint stub into package\n* Copyright notifications\n\nChanged\n-------\n* Expand all cletter into dictionary (#139)\n* Change primary kanwadict index from str to int\n* test: gather all legacy test into test_pykakasi_legacy.py file.\n\n\nv2.1.0_ (6, May 2021)\n=====================\n\nAdded\n-----\n* Deprecation warning when using old api(#124)\n* Add type 
hint file(pyi) (#124)\n* Benchmark test codes(#122)\n\nChanged\n-------\n* Cache internal results and improve performance about 30-40 times.(#128)\n* Use standard pickle for database file(#128)\n* Exceptions module is now `pykakasi`, not `pykakasi.exceptions`\n\nRemoved\n-------\n* Dependency for klepto(#128)\n\n\nv2.0.8_ (4, May 2021)\n=====================\n\nAdded\n-----\n\n* test: Benchmark and profiling (#122)\n\nChanged\n-------\n\n* Performance: avoid ord() when checking long-mark, speed up about 6%\n* Reformat code by black(#121)\n\n\nv2.0.7_ (26, Feb. 2021)\n=======================\n\nFixed\n-----\n\n* Infinite loop after running for a while,\n  handle independent HW VOICED SOUND MARK (#115, #118)\n\n\nv2.0.6_ (7, Feb. 2021)\n======================\n\nFixed\n-----\n\n* Hiragana for Age countersa(#116,#117)\n\n\nv2.0.5_ (5, Feb. 2021)\n======================\n\nChanged\n-------\n\n* CLI: use argparse for option parse(#113)\n\nFixed\n-----\n\n* Handle \u601d\u3063\u305f\u3001\u8a00\u3063\u305f\u3001\u884c\u3063\u305f properly.(#114)\n* CI: fix coveralls error\n\nDeprecated\n----------\n\n* CI: drop travis-ci test and badge\n\n\nv2.0.4_ (26, Nov. 2020)\n=======================\n\nFixed\n-----\n\n* CLI: Fix -v and -h option crash on python 3.7 and before (#108).\n\nv2.0.3_ (25, Nov. 2020)\n=======================\n\nFixed\n-----\n\n* CLI: Fix -v and -h option crash (#108).\n\n\nv2.0.2_ (23, Jul. 2020)\n=======================\n\nFixed\n-----\n\n* Fix convert() to handle Katakana correctly.(#103)\n\n\nv2.0.1_ (23, Jul. 2020)\n=======================\n\nChanged\n-------\n\n* Update setup.py, setup.cfg, tox.ini(#102)\n\n\nFixed\n-----\n\n* Fix convert() misses last part of a text (#99, #100)\n* Fix CI, coverage, and coveralls configurations(#101)\n\n\nv2.0.0_ (31, May. 2020)\n=======================\n\n\n.. _Unreleased: https://github.com/miurahr/pykakasi/compare/v2.2.1...HEAD\n.. _v2.2.1: https://github.com/miurahr/pykakasi/compare/v2.2.0...v2.2.1\n.. 
_v2.2.0: https://github.com/miurahr/pykakasi/compare/v2.1.1...v2.2.0\n.. _v2.1.1: https://github.com/miurahr/pykakasi/compare/v2.1.0...v2.1.1\n.. _v2.1.0: https://github.com/miurahr/pykakasi/compare/v2.0.8...v2.1.0\n.. _v2.0.8: https://github.com/miurahr/pykakasi/compare/v2.0.7...v2.0.8\n.. _v2.0.7: https://github.com/miurahr/pykakasi/compare/v2.0.6...v2.0.7\n.. _v2.0.6: https://github.com/miurahr/pykakasi/compare/v2.0.5...v2.0.6\n.. _v2.0.5: https://github.com/miurahr/pykakasi/compare/v2.0.4...v2.0.5\n.. _v2.0.4: https://github.com/miurahr/pykakasi/compare/v2.0.3...v2.0.4\n.. _v2.0.3: https://github.com/miurahr/pykakasi/compare/v2.0.2...v2.0.3\n.. _v2.0.2: https://github.com/miurahr/pykakasi/compare/v2.0.1...v2.0.2\n.. _v2.0.1: https://github.com/miurahr/pykakasi/compare/v2.0.0...v2.0.1\n.. _v2.0.0: https://github.com/miurahr/pykakasi/compare/v2.0.0b1...v2.0.0\n\n\n",
    "bugtrack_url": null,
    "license": "GPLv3",
    "summary": "Kana kanji simple inversion library",
    "version": "2.2.1",
    "project_urls": {
        "Homepage": "https://github.com/miurahr/pykakasi"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "82158d2fb3ede5477821704070859e63fefb16596b56339b68460770db75f5ea",
                "md5": "9e5ae87e5cdcef4728aceec55204eb48",
                "sha256": "f2d3d12a684314d7f317314499f5b0bec4a711eef4cfc963a2ca6f5c3d68f3b3"
            },
            "downloads": -1,
            "filename": "pykakasi-2.2.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "9e5ae87e5cdcef4728aceec55204eb48",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 2374937,
            "upload_time": "2021-07-10T01:18:21",
            "upload_time_iso_8601": "2021-07-10T01:18:21.752160Z",
            "url": "https://files.pythonhosted.org/packages/82/15/8d2fb3ede5477821704070859e63fefb16596b56339b68460770db75f5ea/pykakasi-2.2.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ccd4f523531109ae48c9b089041e63ea5ef07d092cc021f7b563c1b64fd5aa9e",
                "md5": "d812a38b9fc394e4c307036b9c9e0f51",
                "sha256": "3a3510929a5596cae51fffa9cf78c0f742d96cebd93f726c96acee51407d18cc"
            },
            "downloads": -1,
            "filename": "pykakasi-2.2.1.tar.gz",
            "has_sig": false,
            "md5_digest": "d812a38b9fc394e4c307036b9c9e0f51",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 21751242,
            "upload_time": "2021-07-10T01:18:24",
            "upload_time_iso_8601": "2021-07-10T01:18:24.654964Z",
            "url": "https://files.pythonhosted.org/packages/cc/d4/f523531109ae48c9b089041e63ea5ef07d092cc021f7b563c1b64fd5aa9e/pykakasi-2.2.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2021-07-10 01:18:24",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "miurahr",
    "github_project": "pykakasi",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "tox": true,
    "lcname": "pykakasi"
}
        