kuro2sudachi


- Name: kuro2sudachi
- Version: 0.3.6
- Home page: http://github.com/po3rin/kuro2sudachi
- Upload time: 2021-06-29 04:23:56
- Author: po3rin
- Requires Python: >=3.7,<4.0
- License: Apache-2.0

# kuro2sudachi

[![PyPi version](https://img.shields.io/pypi/v/kuro2sudachi.svg)](https://pypi.python.org/pypi/kuro2sudachi/)
![PyTest](https://github.com/po3rin/kuro2sudachi/workflows/PyTest/badge.svg)
[![](https://img.shields.io/badge/python-3.7+-blue.svg)](https://www.python.org/downloads/release/python-390/)

kuro2sudachi converts a kuromoji user dictionary into a Sudachi user dictionary.

## Usage

```sh
$ pip install kuro2sudachi
$ kuro2sudachi kuromoji_dict.txt -o sudachi_user_dict.txt
```

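## Custom POS conversion dictionary

You can override the default conversion config by supplying a JSON file that maps each kuromoji part of speech (POS) to a Sudachi POS, connection IDs, and cost.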

```json
{
    "固有名詞": {
        "sudachi_pos": "名詞,固有名詞,一般,*,*,*",
        "left_id": 4786,
        "right_id": 4786,
        "cost": 5000
    },
    "名詞": {
        "sudachi_pos": "名詞,普通名詞,一般,*,*,*",
        "left_id": 5146,
        "right_id": 5146,
        "cost": 5000
    }
}

```

```sh
$ kuro2sudachi kuromoji_dict.txt -o sudachi_user_dict.txt -c convert_config.json
```
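
If the config needs to be generated rather than written by hand, a minimal Python sketch like the following could produce the same file. It only reuses the keys and values shown in the JSON above; the names `CONVERT_CONFIG` and `dump_config` and the output filename are assumptions for illustration, not part of kuro2sudachi.

```python
import json

# Mapping from kuromoji POS to Sudachi settings, copied from the example above.
CONVERT_CONFIG = {
    "固有名詞": {
        "sudachi_pos": "名詞,固有名詞,一般,*,*,*",
        "left_id": 4786,
        "right_id": 4786,
        "cost": 5000,
    },
    "名詞": {
        "sudachi_pos": "名詞,普通名詞,一般,*,*,*",
        "left_id": 5146,
        "right_id": 5146,
        "cost": 5000,
    },
}


def dump_config(config):
    """Serialize the config as JSON, keeping Japanese text readable."""
    return json.dumps(config, ensure_ascii=False, indent=4)


if __name__ == "__main__":
    # The filename is an assumption; pass the same path to the -c option.
    with open("convert_config.json", "w", encoding="utf-8") as f:
        f.write(dump_config(CONVERT_CONFIG))
```

The resulting file can then be passed with `-c convert_config.json` as in the command above.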

To ignore unsupported-POS errors and invalid-format lines, use the `--ignore` flag.

## Dictionary type

You can specify the Sudachi dictionary used for tokenization with the `-s` option (default: `core`).

```sh
$ pip install sudachidict_full
$ kuro2sudachi kuromoji_dict.txt -o sudachi_user_dict.txt -s full
```

## Auto Splitting

kuro2sudachi supports auto splitting. Set `split_mode` and `unit_div_mode` for a POS in the convert config:

```json
{
    "名詞": {
        "sudachi_pos": "名詞,普通名詞,一般,*,*,*",
        "left_id": 5146,
        "right_id": 5146,
        "cost": 5000,
        "split_mode": "C",
        "unit_div_mode": [
            "A", "B"
        ]
    }
}
```

The output then includes unit division info.

```sh
$ cat kuromoji_dict.txt
融合たんぱく質,融合たんぱく質,融合たんぱく質,名詞
発作性心房細動,発作性心房細動,発作性心房細動,名詞

$ kuro2sudachi kuromoji_dict.txt -o sudachi_user_dict.txt -c convert_config.json --ignore

$ cat sudachi_user_dict.txt
融合たんぱく質,4786,4786,5000,融合たんぱく質,名詞,普通名詞,一般,*,*,*,,融合たんぱく質,*,C,"融合,名詞,普通名詞,サ変可能,*,*,*,ユウゴウ/たんぱく,名詞,普通名詞,一般,*,*,*,タンパク/質,接尾辞,名詞的,一般,*,*,*,シツ","融合,名詞,普通名詞,サ変可能,*,*,*,ユウゴウ/たんぱく質,名詞,普通名詞,一般,*,*,*,タンパクシツ",*
発作性心房細動,4786,4786,5000,発作性心房細動,名詞,普通名詞,一般,*,*,*,,発作性心房細動,*,C,"発作,名詞,普通名詞,一般,*,*,*,ホッサ/性,接尾辞,名詞的,一般,*,*,*,セイ/心房,名詞,普通名詞,一般,*,*,*,シンボウ/細動,名詞,普通名詞,一般,*,*,*,サイドウ","発作,名詞,普通名詞,一般,*,*,*,ホッサ/性,接尾辞,名詞的,一般,*,*,*,セイ/心房,名詞,普通名詞,一般,*,*,*,シンボウ/細動,名詞,普通名詞,一般,*,*,*,サイドウ",*
```
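
To see where the split information sits in an output row, the following Python sketch parses the first row above with the standard `csv` module. The column positions (14 for the split type, 15 and 16 for the A-unit and B-unit split info) follow the Sudachi user-dictionary CSV layout; the helper names `unit_surfaces` and `describe` are hypothetical and not part of kuro2sudachi.

```python
import csv
from typing import Dict, List

# 0-based column positions in a Sudachi user-dictionary CSV row.
SPLIT_TYPE = 14
A_UNIT_SPLIT = 15
B_UNIT_SPLIT = 16

# First output row from the example above, copied verbatim.
SAMPLE_ROW = '融合たんぱく質,4786,4786,5000,融合たんぱく質,名詞,普通名詞,一般,*,*,*,,融合たんぱく質,*,C,"融合,名詞,普通名詞,サ変可能,*,*,*,ユウゴウ/たんぱく,名詞,普通名詞,一般,*,*,*,タンパク/質,接尾辞,名詞的,一般,*,*,*,シツ","融合,名詞,普通名詞,サ変可能,*,*,*,ユウゴウ/たんぱく質,名詞,普通名詞,一般,*,*,*,タンパクシツ",*'


def unit_surfaces(split_info: str) -> List[str]:
    """Return the surface form of each unit in a split-info field."""
    if split_info == "*":
        return []  # "*" means no split information
    # Units are separated by "/"; each unit starts with its surface form.
    return [unit.split(",")[0] for unit in split_info.split("/")]


def describe(row_text: str) -> Dict[str, object]:
    """Extract the split type and unit surfaces from one CSV row."""
    row = next(csv.reader([row_text]))  # csv handles the quoted fields
    return {
        "surface": row[0],
        "split_type": row[SPLIT_TYPE],
        "a_units": unit_surfaces(row[A_UNIT_SPLIT]),
        "b_units": unit_surfaces(row[B_UNIT_SPLIT]),
    }


if __name__ == "__main__":
    print(describe(SAMPLE_ROW))
```

For this row the A-unit field splits the word into three units and the B-unit field into two, which matches `"unit_div_mode": ["A", "B"]` in the config.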

## Splitting Words defined by kuromoji

Currently, the CLI does not support word splitting defined by kuromoji. Therefore, the split representation of kuromoji is ignored.

```
中咽頭ガン,中咽頭 ガン,チュウイントウ ガン,カスタム名詞
↓
中咽頭ガン,4786,4786,7000,中咽頭ガン,名詞,固有名詞,一般,*,*,*,チュウイントウガン,中咽頭ガン,*,*,*,*,*
```
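
The effect on a single entry can be sketched in Python as follows. This is only an illustration of the behavior shown above, not kuro2sudachi's actual implementation: it assumes a plain four-column, comma-separated kuromoji line without quoting, and the function name `parse_kuromoji_entry` is hypothetical.

```python
def parse_kuromoji_entry(line):
    """Split one kuromoji user-dictionary line into its four columns."""
    surface, segmentation, reading, pos = line.rstrip("\n").split(",")
    return {
        "surface": surface,
        # Segments defined by kuromoji; the converter does not carry them over.
        "segments": segmentation.split(" "),
        # The per-segment readings are joined into one reading.
        "reading": reading.replace(" ", ""),
        "pos": pos,
    }


entry = parse_kuromoji_entry("中咽頭ガン,中咽頭 ガン,チュウイントウ ガン,カスタム名詞")
print(entry["surface"], entry["reading"])
```

The joined reading corresponds to the reading column of the converted row, while the `segments` list has no counterpart in the output.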

# For Developers

Run the tests:

```sh
$ poetry install
$ poetry run pytest
```

Run the kuro2sudachi command from the source tree:

```sh
$ poetry run kuro2sudachi tests/kuromoji_dict_test.txt -o sudachi_user_dict.txt
```

## TODO

- [ ] split mode
- [ ] default rewrite

            

Raw data

            {
    "_id": null,
    "home_page": "http://github.com/po3rin/kuro2sudachi",
    "name": "kuro2sudachi",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7,<4.0",
    "maintainer_email": "",
    "keywords": "",
    "author": "po3rin",
    "author_email": "",
    "download_url": "https://files.pythonhosted.org/packages/d8/7c/c2bd0bc055f4b22dff32040519c4f7f1727b2c053db0c23f160dad2df0e5/kuro2sudachi-0.3.6.tar.gz",
    "platform": "",
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "",
    "version": "0.3.6",
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "488d58dc0e87534f3a06a38431b7ed57",
                "sha256": "27ea04873f16882c72330e2e70635e846e6890543730566dc78d024613cddc92"
            },
            "downloads": -1,
            "filename": "kuro2sudachi-0.3.6-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "488d58dc0e87534f3a06a38431b7ed57",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7,<4.0",
            "size": 8257,
            "upload_time": "2021-06-29T04:23:55",
            "upload_time_iso_8601": "2021-06-29T04:23:55.033250Z",
            "url": "https://files.pythonhosted.org/packages/bf/93/8706a957aa287cf91453b837b46ef25f0f0ba6ffeb1c69701473355cd22c/kuro2sudachi-0.3.6-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "a42e7c7414877ad3fd942709102162fd",
                "sha256": "6ad929c2d3189f78de9f3a34f7fb4f59cf11c82af2b981a257c9f90d768fc956"
            },
            "downloads": -1,
            "filename": "kuro2sudachi-0.3.6.tar.gz",
            "has_sig": false,
            "md5_digest": "a42e7c7414877ad3fd942709102162fd",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7,<4.0",
            "size": 8582,
            "upload_time": "2021-06-29T04:23:56",
            "upload_time_iso_8601": "2021-06-29T04:23:56.959623Z",
            "url": "https://files.pythonhosted.org/packages/d8/7c/c2bd0bc055f4b22dff32040519c4f7f1727b2c053db0c23f160dad2df0e5/kuro2sudachi-0.3.6.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2021-06-29 04:23:56",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "po3rin",
    "github_project": "kuro2sudachi",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "kuro2sudachi"
}