divide-char-type

Name: divide-char-type
Version: 0.2.6
Home page: https://github.com/ShinyaAkagiI/divide_character_type
Summary: Divide documents by character type
Upload time: 2024-02-02 06:27:41
Author: Shinya Akagi
License: PSF
Requirements: No requirements were recorded.

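# Overview

A tool that splits a string into runs of hiragana, katakana, kanji, digits, and alphabetic characters.
It works on both English and Japanese text, but some terms that contain periods may not be split correctly.
See the sample output section for details.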
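
To make the idea of splitting by character type concrete, here is a minimal sketch that groups consecutive characters of the same type. It is not the package's implementation: `char_type` and `naive_split` are hypothetical names, the Unicode ranges are a simplifying assumption (basic blocks only), and it has no special handling for periods or for the prolonged sound mark `ー`, so its output will differ from the package's on inputs such as `1.0` or `あいうえおーかきくけこ`.

```python
from itertools import groupby


def char_type(ch: str) -> str:
    """Classify one character (hypothetical helper, simplified ranges)."""
    code = ord(ch)
    if 0x3041 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        # The Katakana block also contains the prolonged sound mark.
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        # Basic CJK Unified Ideographs block only.
        return "kanji"
    if ch.isascii() and ch.isalpha():
        return "alphabet"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "other"


def naive_split(text: str) -> list:
    """Group consecutive characters that share a type into (segment, type) pairs."""
    return [("".join(group), kind) for kind, group in groupby(text, key=char_type)]


print(naive_split("今日の天気は晴れです。"))
```

Grouping with `itertools.groupby` keeps the sketch short; a real tool needs extra rules for cases like abbreviations and decimal numbers, which is where the period caveat above comes from.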


# Setup

```
pip install divide-char-type
```


# Uninstall

```
pip uninstall divide-char-type
```

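# Usage

```
from divide_char_type import divide_char_type

# Split a sentence by character type.
data = divide_char_type("今日の天気は晴れです。")

# Element 0 holds the list of all segments (see "Return value").
print(data[0])
```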


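# Return value

- Array type
	- Element 0: list of all segments produced by the split
	- Element 1: list of character-type codes, one per segment
	- Element 2: list of hiragana segments
	- Element 3: list of katakana segments
	- Element 4: list of kanji segments
	- Element 5: list of alphabetic segments
	- Element 6: list of digit segments
	- Element 7: list of other segments (symbols and so on)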
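
Indexing the result by position is easy to get wrong, so one option is to wrap it in a named structure. The sketch below is only an illustration: `DivideResult` and `name_result` are hypothetical names that are not part of the package, and `raw` is a hand-built stand-in shaped like the eight-element result described above, not real output from `divide_char_type`. It also assumes the type codes are integers.

```python
from typing import List, NamedTuple, Sequence


class DivideResult(NamedTuple):
    """Named view over the eight-element result (hypothetical wrapper)."""
    segments: List[str]
    type_codes: List[int]
    hiragana: List[str]
    katakana: List[str]
    kanji: List[str]
    alphabet: List[str]
    digits: List[str]
    others: List[str]


def name_result(raw: Sequence) -> DivideResult:
    """Wrap a raw eight-element result so fields can be read by name."""
    if len(raw) != 8:
        raise ValueError(f"expected 8 elements, got {len(raw)}")
    return DivideResult(*raw)


# Hand-built stand-in for a result; not produced by the package.
raw = [
    ["今日", "の", "天気"],  # element 0: all segments
    [2, 0, 2],               # element 1: type codes (assumed to be ints)
    ["の"],                  # element 2: hiragana
    [],                      # element 3: katakana
    ["今日", "天気"],        # element 4: kanji
    [],                      # element 5: alphabet
    [],                      # element 6: digits
    [],                      # element 7: others
]

result = name_result(raw)
print(result.kanji)  # ['今日', '天気']
```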

# Character-type codes

- 0: hiragana
- 1: katakana
- 2: kanji
- 3: alphabet
- 4: digit
- 5: other (symbols and so on)
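
The codes above can be turned into readable labels by pairing the segment list with the type-code list. This is a small sketch under stated assumptions: `TYPE_NAMES` and `label_segments` are hypothetical helpers, not part of the package, and the sample lists are hand-built to match the documented shape rather than taken from a real call. Codes are passed through `int()` in case they arrive as digit strings.

```python
# Mapping taken from the character-type code list above.
TYPE_NAMES = {
    0: "hiragana",
    1: "katakana",
    2: "kanji",
    3: "alphabet",
    4: "digit",
    5: "other",
}


def label_segments(segments, type_codes):
    """Pair each segment with the name of its character type."""
    if len(segments) != len(type_codes):
        raise ValueError("segments and type_codes must have the same length")
    return [(seg, TYPE_NAMES[int(code)]) for seg, code in zip(segments, type_codes)]


# Hand-built example lists; not output from the package.
segments = ["今日", "の", "天気", "は", "晴", "れです", "。"]
type_codes = [2, 0, 2, 0, 2, 0, 5]

for segment, name in label_segments(segments, type_codes):
    print(f"{segment}\t{name}")
```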


# Sample output

```
['1.0', ' ', 'is', ' ', 'number', '.']
['1', ',', '000', ' ', 'is', ' ', 'number', '.']
['u.s.a.', ' ', 'is', ' ', 'state', '.']
['u.k', '.', ' ', 'is', ' ', 'state', '.']
['e.g.', ',', ' ', 'th', ',', ' ', 'ch', ',', ' ', 'sh', ',', ' ', 'ph', ',', ' ', 'gh', ',', ' ', 'ng', ',', ' ', 'qu']
['state', ' ', 'include', ' ', 'u.s.', ' ', 'u.s.', ' ', 'is', ' ', 'state', '.']
['state', ' ', 'include', ' ', 'u.k', '.', ' ', 'u.k', '.', ' ', 'is', ' ', 'state', '.']
['u.s.', 'は', '国', 'です', '。']
['u.s', '.', 'は', '国', 'です', '。']
['あいうえおーかきくけこ']
['アイウエオーカキクケコ']
['今日', 'の', '天気', 'は', '晴', 'れです', '。\n', '明日', 'の', '天気', 'は', '曇', 'りです', '。\n']
['&&&', '1.0', '&&&']
```

# Execution speed

![](calc_implementation_time.png)


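# Paper

Shinya Akagi: Development and Release of a Character-Type Splitting Tool (字種分割ツールの開発と公開),  
Proceedings of the 85th National Convention of IPSJ, 2023 (1), 29-30, 2023-02-16  
https://cir.nii.ac.jp/crid/1050579753470466176  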

# License

- divide_char_type
	- Python Software Foundation License  
	- Copyright (C) 2023-2024 Shinya Akagi




            
