esupar


| Field | Value |
| --- | --- |
| Name | esupar |
| Version | 1.7.0 |
| Home page | https://github.com/KoichiYasuoka/esupar |
| Summary | Tokenizer POS-tagger and Dependency-parser with BERT/RoBERTa/DeBERTa models for Japanese and other languages |
| Upload time | 2024-02-29 02:35:21 |
| Author | Koichi Yasuoka |
| Requires Python | >=3.7 |
| License | MIT |
| Keywords | nlp japanese korean chinese thai vietnamese english german serbian coptic ainu |

[![Current PyPI packages](https://badge.fury.io/py/esupar.svg)](https://pypi.org/project/esupar/)

# esupar

A tokenizer, POS tagger, and dependency parser built on [Transformers](https://huggingface.co/transformers/) and [SuPar](https://pypi.org/project/supar/).

## Basic usage

```py
>>> import esupar
>>> nlp=esupar.load("ja")
>>> doc=nlp("太郎は花子が読んでいる本を次郎に渡した")
>>> print(doc)
1	太郎	_	PROPN	_	_	12	nsubj	_	SpaceAfter=No
2	は	_	ADP	_	_	1	case	_	SpaceAfter=No
3	花子	_	PROPN	_	_	5	nsubj	_	SpaceAfter=No
4	が	_	ADP	_	_	3	case	_	SpaceAfter=No
5	読ん	_	VERB	_	_	8	acl	_	SpaceAfter=No
6	で	_	SCONJ	_	_	5	mark	_	SpaceAfter=No
7	いる	_	AUX	_	_	5	aux	_	SpaceAfter=No
8	本	_	NOUN	_	_	12	obj	_	SpaceAfter=No
9	を	_	ADP	_	_	8	case	_	SpaceAfter=No
10	次郎	_	PROPN	_	_	12	obl	_	SpaceAfter=No
11	に	_	ADP	_	_	10	case	_	SpaceAfter=No
12	渡し	_	VERB	_	_	0	root	_	SpaceAfter=No
13	た	_	AUX	_	_	12	aux	_	_

>>> import deplacy
>>> deplacy.render(doc,Japanese=True)
太郎 PROPN ═╗<════════╗ nsubj(主語)
は   ADP   <╝         ║ case(格表示)
花子 PROPN ═╗<══╗     ║ nsubj(主語)
が   ADP   <╝   ║     ║ case(格表示)
読ん VERB  ═╗═╗═╝<╗   ║ acl(連体修飾節)
で   SCONJ <╝ ║   ║   ║ mark(標識)
いる AUX   <══╝   ║   ║ aux(動詞補助成分)
本   NOUN  ═╗═════╝<╗ ║ obj(目的語)
を   ADP   <╝       ║ ║ case(格表示)
次郎 PROPN ═╗<╗     ║ ║ obl(斜格補語)
に   ADP   <╝ ║     ║ ║ case(格表示)
渡し VERB  ═╗═╝═════╝═╝ root(親)
た   AUX   <╝           aux(動詞補助成分)
```
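
The tab-separated text printed by `print(doc)` can also be processed with plain Python. The following is a minimal sketch, not part of esupar: `parse_conllu` and `CONLLU_FIELDS` are hypothetical names, and the parser assumes plain integer token IDs as in the example above (no multiword-token ranges or empty nodes). The sample rows are copied from the output above. In real use the text would come from `str(doc)`.

```python
# Hypothetical helper (not part of esupar): parse CoNLL-U text into dicts.
CONLLU_FIELDS = ("id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc")

def parse_conllu(text):
    """Return one dict per token line, skipping blank and comment lines."""
    tokens = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        token = dict(zip(CONLLU_FIELDS, line.split("\t")))
        token["id"] = int(token["id"])      # assumes plain integer IDs
        token["head"] = int(token["head"])  # 0 marks the root
        tokens.append(token)
    return tokens

# Rows copied from the example output; spaces are converted to tabs below.
rows = [
    "1 太郎 _ PROPN _ _ 12 nsubj _ SpaceAfter=No",
    "2 は _ ADP _ _ 1 case _ SpaceAfter=No",
    "3 花子 _ PROPN _ _ 5 nsubj _ SpaceAfter=No",
    "4 が _ ADP _ _ 3 case _ SpaceAfter=No",
    "5 読ん _ VERB _ _ 8 acl _ SpaceAfter=No",
    "6 で _ SCONJ _ _ 5 mark _ SpaceAfter=No",
    "7 いる _ AUX _ _ 5 aux _ SpaceAfter=No",
    "8 本 _ NOUN _ _ 12 obj _ SpaceAfter=No",
    "9 を _ ADP _ _ 8 case _ SpaceAfter=No",
    "10 次郎 _ PROPN _ _ 12 obl _ SpaceAfter=No",
    "11 に _ ADP _ _ 10 case _ SpaceAfter=No",
    "12 渡し _ VERB _ _ 0 root _ SpaceAfter=No",
    "13 た _ AUX _ _ 12 aux _ _",
]
sample = "\n".join("\t".join(row.split(" ")) for row in rows)

tokens = parse_conllu(sample)
root = next(t for t in tokens if t["head"] == 0)
dependents = [t["form"] for t in tokens if t["head"] == root["id"]]
print(root["form"], dependents)
```

Keeping the parser independent of esupar means the same code works on any CoNLL-U text with this simple shape, for example a file saved from an earlier run.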

`esupar.load(model)` loads a natural language processing pipeline whose output follows the [Universal Dependencies](https://universaldependencies.org/format.html) format. The available `model` options are:

* `model="ja"` Japanese model [bert-base-japanese-upos](https://huggingface.co/KoichiYasuoka/bert-base-japanese-upos) (default)
* `model="ja_large"` Japanese model [bert-large-japanese-upos](https://huggingface.co/KoichiYasuoka/bert-large-japanese-upos)
* `model="ja_luw_small"` Japanese long-unit-word model [roberta-small-japanese-char-luw-upos](https://huggingface.co/KoichiYasuoka/roberta-small-japanese-char-luw-upos)
* `model="ja_luw_base"` Japanese long-unit-word model [bert-base-japanese-luw-upos](https://huggingface.co/KoichiYasuoka/bert-base-japanese-luw-upos)
* `model="ja_luw_large"` Japanese long-unit-word model [bert-large-japanese-luw-upos](https://huggingface.co/KoichiYasuoka/bert-large-japanese-luw-upos)
* `model="ko"` Korean model [roberta-base-korean-upos](https://huggingface.co/KoichiYasuoka/roberta-base-korean-upos)
* `model="ko_large"` Korean model [roberta-large-korean-upos](https://huggingface.co/KoichiYasuoka/roberta-large-korean-upos)
* `model="ko_morph_base"` Korean morpheme model [roberta-base-korean-morph-upos](https://huggingface.co/KoichiYasuoka/roberta-base-korean-morph-upos)
* `model="ko_morph_large"` Korean morpheme model [roberta-large-korean-morph-upos](https://huggingface.co/KoichiYasuoka/roberta-large-korean-morph-upos)
* `model="zh"` Chinese model [chinese-bert-wwm-ext-upos](https://huggingface.co/KoichiYasuoka/chinese-bert-wwm-ext-upos)
* `model="zh_base"` Chinese model [chinese-roberta-base-upos](https://huggingface.co/KoichiYasuoka/chinese-roberta-base-upos)
* `model="zh_large"` Chinese model [chinese-roberta-large-upos](https://huggingface.co/KoichiYasuoka/chinese-roberta-large-upos)
* `model="lzh"` Classical Chinese model [roberta-classical-chinese-base-upos](https://huggingface.co/KoichiYasuoka/roberta-classical-chinese-base-upos)
* `model="lzh_large"` Classical Chinese model [roberta-classical-chinese-large-upos](https://huggingface.co/KoichiYasuoka/roberta-classical-chinese-large-upos)
* `model="th"` Thai model [roberta-base-thai-spm-upos](https://huggingface.co/KoichiYasuoka/roberta-base-thai-spm-upos)
* `model="vi"` Vietnamese model [bert-base-vietnamese-upos](https://huggingface.co/KoichiYasuoka/bert-base-vietnamese-upos)
* `model="en"` English model [roberta-base-english-upos](https://huggingface.co/KoichiYasuoka/roberta-base-english-upos)
* `model="en_large"` English model [roberta-large-english-upos](https://huggingface.co/KoichiYasuoka/roberta-large-english-upos)
* `model="de"` German model [bert-base-german-upos](https://huggingface.co/KoichiYasuoka/bert-base-german-upos)
* `model="de_large"` German model [bert-large-german-upos](https://huggingface.co/KoichiYasuoka/bert-large-german-upos)
* `model="sr"` Serbian (Cyrillic and Latin) model [roberta-base-serbian-upos](https://huggingface.co/KoichiYasuoka/roberta-base-serbian-upos)
* `model="cop"` Coptic model [roberta-base-coptic-upos](https://huggingface.co/KoichiYasuoka/roberta-base-coptic-upos)
* `model="ain"` Ainu model [roberta-base-ainu-upos](https://huggingface.co/KoichiYasuoka/roberta-base-ainu-upos)
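
The keys in the list above follow a `language` or `language_variant` pattern. As a rough sketch, a small helper can validate a key before it is passed to `esupar.load`. `MODEL_KEYS` and `model_key` are hypothetical names introduced here for illustration, and the set simply mirrors the list above, so it would need updating if the list changes.

```python
# Keys copied from the model list above (hypothetical constant, not part of esupar).
MODEL_KEYS = {
    "ja", "ja_large", "ja_luw_small", "ja_luw_base", "ja_luw_large",
    "ko", "ko_large", "ko_morph_base", "ko_morph_large",
    "zh", "zh_base", "zh_large", "lzh", "lzh_large",
    "th", "vi", "en", "en_large", "de", "de_large",
    "sr", "cop", "ain",
}

def model_key(lang, variant=None):
    """Build a key such as "en_large" and check it against the list above."""
    key = lang if variant is None else f"{lang}_{variant}"
    if key not in MODEL_KEYS:
        raise ValueError(f"unknown esupar model key: {key!r}")
    return key

print(model_key("en", "large"))
print(model_key("ko", "morph_base"))

# Intended use (requires esupar and a model download, so it is left as a comment):
# import esupar
# nlp = esupar.load(model_key("en", "large"))
```

Failing early on an unlisted key such as `th_large` gives a clearer error than waiting for a model download to fail.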

## Installation for Linux

```sh
pip3 install esupar --user
```
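
After installing, one way to confirm that the package is visible to the interpreter is to query its installed version. This is a minimal sketch using the standard library: `installed_version` is a hypothetical helper name, and `importlib.metadata` is assumed to be available, which needs Python 3.8 or later.

```python
import sys
from importlib import metadata  # standard library from Python 3.8

def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# Prints None for esupar if the installation did not succeed.
print("Python", sys.version_info[:2])
print("esupar", installed_version("esupar"))
```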

## Installation for Cygwin64

Install the Cygwin packages `python37-devel`, `python37-pip`, `python37-cython`, `python37-numpy`, `python37-wheel`, `gcc-g++`, `mingw64-x86_64-gcc-g++`, `git`, `curl`, `make`, and `cmake` first, and then run:

```sh
curl -L https://raw.githubusercontent.com/KoichiYasuoka/CygTorch/master/installer/supar.sh | sh
pip3.7 install esupar
```

## Installation for Google Colaboratory

```py
!pip install esupar
```

Try the [notebook](https://colab.research.google.com/github/KoichiYasuoka/esupar/blob/master/esupar.ipynb).

## Author

Koichi Yasuoka (安岡孝一)


            
