mecab-text-cleaner


| Field | Value |
| --- | --- |
| Name | mecab-text-cleaner |
| Version | 0.1.1 |
| Home page | https://github.com/34j/mecab-text-cleaner |
| Summary | Simple Python package for getting japanese reading (yomigana) using MeCab |
| Upload time | 2023-12-31 09:35:23 |
| Author | 34j |
| Requires Python | >=3.8,<4.0 |
| License | MIT |
| Requirements | None recorded |

# MeCab Text Cleaner

<p align="center">
  <a href="https://github.com/34j/mecab-text-cleaner/actions/workflows/ci.yml?query=branch%3Amain">
    <img src="https://img.shields.io/github/actions/workflow/status/34j/mecab-text-cleaner/ci.yml?branch=main&label=CI&logo=github&style=flat-square" alt="CI Status" >
  </a>
  <a href="https://mecab-text-cleaner.readthedocs.io">
    <img src="https://img.shields.io/readthedocs/mecab-text-cleaner.svg?logo=read-the-docs&logoColor=fff&style=flat-square" alt="Documentation Status">
  </a>
  <a href="https://codecov.io/gh/34j/mecab-text-cleaner">
    <img src="https://img.shields.io/codecov/c/github/34j/mecab-text-cleaner.svg?logo=codecov&logoColor=fff&style=flat-square" alt="Test coverage percentage">
  </a>
</p>
<p align="center">
  <a href="https://python-poetry.org/">
    <img src="https://img.shields.io/badge/packaging-poetry-299bd7?style=flat-square&logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAA4AAAASCAYAAABrXO8xAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAJJSURBVHgBfZLPa1NBEMe/s7tNXoxW1KJQKaUHkXhQvHgW6UHQQ09CBS/6V3hKc/AP8CqCrUcpmop3Cx48eDB4yEECjVQrlZb80CRN8t6OM/teagVxYZi38+Yz853dJbzoMV3MM8cJUcLMSUKIE8AzQ2PieZzFxEJOHMOgMQQ+dUgSAckNXhapU/NMhDSWLs1B24A8sO1xrN4NECkcAC9ASkiIJc6k5TRiUDPhnyMMdhKc+Zx19l6SgyeW76BEONY9exVQMzKExGKwwPsCzza7KGSSWRWEQhyEaDXp6ZHEr416ygbiKYOd7TEWvvcQIeusHYMJGhTwF9y7sGnSwaWyFAiyoxzqW0PM/RjghPxF2pWReAowTEXnDh0xgcLs8l2YQmOrj3N7ByiqEoH0cARs4u78WgAVkoEDIDoOi3AkcLOHU60RIg5wC4ZuTC7FaHKQm8Hq1fQuSOBvX/sodmNJSB5geaF5CPIkUeecdMxieoRO5jz9bheL6/tXjrwCyX/UYBUcjCaWHljx1xiX6z9xEjkYAzbGVnB8pvLmyXm9ep+W8CmsSHQQY77Zx1zboxAV0w7ybMhQmfqdmmw3nEp1I0Z+FGO6M8LZdoyZnuzzBdjISicKRnpxzI9fPb+0oYXsNdyi+d3h9bm9MWYHFtPeIZfLwzmFDKy1ai3p+PDls1Llz4yyFpferxjnyjJDSEy9CaCx5m2cJPerq6Xm34eTrZt3PqxYO1XOwDYZrFlH1fWnpU38Y9HRze3lj0vOujZcXKuuXm3jP+s3KbZVra7y2EAAAAAASUVORK5CYII=" alt="Poetry">
  </a>
  <a href="https://github.com/ambv/black">
    <img src="https://img.shields.io/badge/code%20style-black-000000.svg?style=flat-square" alt="black">
  </a>
  <a href="https://github.com/pre-commit/pre-commit">
    <img src="https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white&style=flat-square" alt="pre-commit">
  </a>
</p>
<p align="center">
  <a href="https://pypi.org/project/mecab-text-cleaner/">
    <img src="https://img.shields.io/pypi/v/mecab-text-cleaner.svg?logo=python&logoColor=fff&style=flat-square" alt="PyPI Version">
  </a>
  <img src="https://img.shields.io/pypi/pyversions/mecab-text-cleaner.svg?style=flat-square&logo=python&amp;logoColor=fff" alt="Supported Python versions">
  <img src="https://img.shields.io/pypi/l/mecab-text-cleaner.svg?style=flat-square" alt="License">
</p>

This is a simple Python package for getting Japanese readings (yomigana) and accents using MeCab.
Please also consider using [pyopenjtalk](https://github.com/r9y9/pyopenjtalk) (no accents) or [pyopenjtalk_g2p_prosody (ESPnet)](https://github.com/espnet/espnet/blob/5d0758e2a7063b82d1f10a8ac2de98eb6cf8a352/espnet2/text/phoneme_tokenizer.py#L103) (with accents), as this package does not account for accent changes in compound words.

## Installation

Install this via pip or pipx (or your favourite package manager):

```shell
pipx install "mecab-text-cleaner[unidecode,unidic]"
```

```shell
pip install "mecab-text-cleaner[unidecode,unidic]"
```
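To confirm that the optional dependencies were pulled in, a small check like the following may help. It is only a sketch: `check_modules` is a hypothetical helper, and it assumes the `unidecode` and `unidic` extras install importable modules of the same names, which this README does not state.

```python
from importlib.util import find_spec
from typing import Dict, Iterable


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found for import."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # The last two module names are assumptions based on the extras' names.
    modules = ["mecab_text_cleaner", "unidecode", "unidic"]
    for name, found in check_modules(modules).items():
        print(f"{name}: {'found' if found else 'missing'}")
```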

## Usage

```shell
> mtc いい天気ですね。
イ]ー テ]ンキ デス ネ。
> mtc いい天気ですね。 --ascii
i] te]nki desu ne.
> mtc いい天気ですね --no-add-atype --no-add-blank-between-words
イーテンキデスネ
> mtc いい天気ですね --no-add-atype --no-add-blank-between-words -r kana
イイテンキデスネ
```
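If you want to call the command-line tool from a script, the sketch below builds an argument list from the flags that appear in the examples above and nothing else. `build_mtc_command` and `run_mtc` are hypothetical helper names, and the meaning of each flag is taken from the sample output only.

```python
import subprocess
from typing import List, Optional


def build_mtc_command(
    text: str,
    ascii_output: bool = False,
    add_atype: bool = True,
    add_blank_between_words: bool = True,
    reading_type: Optional[str] = None,
) -> List[str]:
    """Build an argv list for `mtc` using only flags shown in the examples."""
    command = ["mtc", text]
    if ascii_output:
        command.append("--ascii")
    if not add_atype:
        command.append("--no-add-atype")
    if not add_blank_between_words:
        command.append("--no-add-blank-between-words")
    if reading_type is not None:
        command.extend(["-r", reading_type])
    return command


def run_mtc(command: List[str]) -> str:
    """Run the command and return its stdout; requires `mtc` on PATH."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Passing the arguments as a list avoids shell quoting problems with Japanese text and punctuation.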

```python
from mecab_text_cleaner import to_reading, to_ascii_clean

assert to_reading("     空、雲。\n雨！（") == "ソ]ラ、 ク]モ。\nア]メ！（"
assert to_ascii_clean("      한空、雲。\n雨！（") == "han so]ra, ku]mo. \na]me!("
```
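The readings above carry a `]` marker. As a minimal sketch, assuming `]` sits directly after the accented kana (this is how the sample outputs read, not something stated here), the hypothetical helpers below separate a word's reading from the marker position. They count characters, not morae, so small kana such as ャ would need extra handling.

```python
from typing import Optional, Tuple

ACCENT_MARK = "]"  # marker seen in the sample outputs above


def split_accent(word: str) -> Tuple[str, Optional[int]]:
    """Return (reading without marker, number of characters before the marker).

    Hypothetical helper, not part of mecab-text-cleaner.
    The position is None when the word carries no marker.
    """
    index = word.find(ACCENT_MARK)
    if index == -1:
        return word, None
    return word[:index] + word[index + 1:], index


def strip_accents(text: str) -> str:
    """Remove every accent marker from a whole line of output."""
    return text.replace(ACCENT_MARK, "")
```

For example, `split_accent("テ]ンキ")` gives `("テンキ", 1)`, while a word without a marker, such as `"デス"`, comes back unchanged with `None`.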

## Contributors ✨

Thanks go to these wonderful people ([emoji key](https://allcontributors.org/docs/en/emoji-key)):

<!-- prettier-ignore-start -->
<!-- ALL-CONTRIBUTORS-LIST:START - Do not remove or modify this section -->
<!-- markdownlint-disable -->
<!-- markdownlint-enable -->
<!-- ALL-CONTRIBUTORS-LIST:END -->
<!-- prettier-ignore-end -->

This project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification. Contributions of any kind are welcome!


            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/34j/mecab-text-cleaner",
    "name": "mecab-text-cleaner",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8,<4.0",
    "maintainer_email": "",
    "keywords": "",
    "author": "34j",
    "author_email": "34j.95a2p@simplelogin.com",
    "download_url": "https://files.pythonhosted.org/packages/dd/9d/9aa8b4eead8a3b3c2f8f8cbdf3774b727ea3bd9fac673b45e6f26f1b4fdb/mecab_text_cleaner-0.1.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Simple Python package for getting japanese reading (yomigana) using MeCab",
    "version": "0.1.1",
    "project_urls": {
        "Bug Tracker": "https://github.com/34j/mecab-text-cleaner/issues",
        "Changelog": "https://github.com/34j/mecab-text-cleaner/blob/main/CHANGELOG.md",
        "Documentation": "https://mecab-text-cleaner.readthedocs.io",
        "Homepage": "https://github.com/34j/mecab-text-cleaner",
        "Repository": "https://github.com/34j/mecab-text-cleaner"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3151da52e56d0647889f3699159168873f9bcf7125cf7eec48854a770b7dc9c3",
                "md5": "9b3a0e4c2a60ec715d50246b5ad35330",
                "sha256": "f7e87ed974daeb50184c55e8fc25fc3d0bb4e1949a534eb1116b1bb12c8eff1f"
            },
            "downloads": -1,
            "filename": "mecab_text_cleaner-0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "9b3a0e4c2a60ec715d50246b5ad35330",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8,<4.0",
            "size": 8884,
            "upload_time": "2023-12-31T09:35:21",
            "upload_time_iso_8601": "2023-12-31T09:35:21.728527Z",
            "url": "https://files.pythonhosted.org/packages/31/51/da52e56d0647889f3699159168873f9bcf7125cf7eec48854a770b7dc9c3/mecab_text_cleaner-0.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "dd9d9aa8b4eead8a3b3c2f8f8cbdf3774b727ea3bd9fac673b45e6f26f1b4fdb",
                "md5": "ac6cabb599ea49417e1e2d675a380c76",
                "sha256": "6f56cb65a3ce0f55801ed0323f9070a2fca6c58e257be84758391fc264b94bf8"
            },
            "downloads": -1,
            "filename": "mecab_text_cleaner-0.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "ac6cabb599ea49417e1e2d675a380c76",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8,<4.0",
            "size": 9709,
            "upload_time": "2023-12-31T09:35:23",
            "upload_time_iso_8601": "2023-12-31T09:35:23.762563Z",
            "url": "https://files.pythonhosted.org/packages/dd/9d/9aa8b4eead8a3b3c2f8f8cbdf3774b727ea3bd9fac673b45e6f26f1b4fdb/mecab_text_cleaner-0.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-12-31 09:35:23",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "34j",
    "github_project": "mecab-text-cleaner",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "mecab-text-cleaner"
}
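
The `sha256` values under `digests` can be used to check a downloaded file. The sketch below relies only on `hashlib`; `sha256_of` and `verify_sha256` are hypothetical helper names, and the file path is wherever you saved the download.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Path, expected_hex: str) -> bool:
    """Compare the file's SHA-256 with the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source archive listed above, the call would be `verify_sha256(Path("mecab_text_cleaner-0.1.1.tar.gz"), "6f56cb65a3ce0f55801ed0323f9070a2fca6c58e257be84758391fc264b94bf8")`.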
        