* Name: multicombo
* Version: 0.8.6
* Home page: https://github.com/KoichiYasuoka/MultiCOMBO
* Summary: Multilingual POS-tagger and Dependency-parser
* Upload time: 2025-07-13 08:22:12
* Author: Koichi Yasuoka
* Requires Python: >=3.6
* License: GPL
* Keywords: nlp, multilingual

[![Current PyPI packages](https://badge.fury.io/py/multicombo.svg)](https://pypi.org/project/multicombo/)

# MultiCOMBO

Multilingual POS-Tagger and Dependency-Parser with [COMBO-pytorch](https://gitlab.clarin-pl.eu/syntactic-tools/combo) and [spaCy](https://spacy.io)

## Basic usage

```py
>>> import multicombo
>>> nlp=multicombo.load()
>>> doc=nlp('Who plays "La vie en rose"?')
>>> print(multicombo.to_conllu(doc))
# text = Who plays "La vie en rose"?
1	Who	_	PRON	_	PronType=Int	2	nsubj	_	Translit=who
2	plays	_	VERB	_	Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin	0	root	_	_
3	"	_	PUNCT	_	_	5	punct	_	SpaceAfter=No
4	La	_	DET	_	Definite=Def|Gender=Fem|Number=Sing|PronType=Art	5	det	_	Translit=la
5	vie	_	NOUN	_	Gender=Fem|Number=Sing	2	obj	_	_
6	en	_	ADP	_	_	7	case	_	_
7	rose	_	NOUN	_	Number=Sing	5	nmod	_	SpaceAfter=No
8	"	_	PUNCT	_	_	5	punct	_	SpaceAfter=No
9	?	_	PUNCT	_	_	2	punct	_	SpaceAfter=No

>>> import deplacy
>>> deplacy.render(doc)
Who   PRON  <════════════╗   nsubj
plays VERB  ═══════════╗═╝═╗ ROOT
"     PUNCT <══════╗   ║   ║ punct
La    DET   <════╗ ║   ║   ║ det
vie   NOUN  ═══╗═╝═╝═╗<╝   ║ obj
en    ADP   <╗ ║     ║     ║ case
rose  NOUN  ═╝<╝     ║     ║ nmod
"     PUNCT <════════╝     ║ punct
?     PUNCT <══════════════╝ punct

>>> deplacy.serve(doc)
http://127.0.0.1:5000
```
![trial.svg](https://raw.githubusercontent.com/KoichiYasuoka/MultiCOMBO/main/trial.png)
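
The CoNLL-U text printed by `multicombo.to_conllu(doc)` above is plain tab-separated text with ten columns per token (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), so it can be post-processed without any extra dependency. The following is a minimal sketch using only the standard library; `CONLLU_FIELDS` and `parse_conllu` are hypothetical helper names for illustration, not part of multicombo, and the sample string copies only the first two token lines of the output above.

```py
# Column names defined by the CoNLL-U format (10 tab-separated fields per token).
CONLLU_FIELDS = ("id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc")

def parse_conllu(text):
    """Turn one CoNLL-U sentence into a list of dicts, one per token line."""
    tokens = []
    for line in text.splitlines():
        # Skip blank lines and comment lines such as "# text = ...".
        if not line.strip() or line.startswith("#"):
            continue
        values = line.split("\t")
        if len(values) != len(CONLLU_FIELDS):
            raise ValueError(f"expected 10 columns, got {len(values)}: {line!r}")
        tokens.append(dict(zip(CONLLU_FIELDS, values)))
    return tokens

# First two token lines of the example output (hypothetical shortened sample).
sample = (
    '# text = Who plays "La vie en rose"?\n'
    "1\tWho\t_\tPRON\t_\tPronType=Int\t2\tnsubj\t_\tTranslit=who\n"
    "2\tplays\t_\tVERB\t_\tMood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin\t0\troot\t_\t_\n"
)
tokens = parse_conllu(sample)
# The root token is the one whose HEAD column is "0".
root = next(t for t in tokens if t["head"] == "0")
print(root["form"], root["upos"])  # prints: plays VERB
```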
`multicombo.load(lang="xx")` loads a spaCy Language pipeline with [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) and the `spacy.lang.xx.MultiLanguage` tokenizer. Other language-specific tokenizers can be selected with the `lang` option; several languages require additional packages:
* `lang="ja"` Japanese requires [SudachiPy](https://pypi.org/project/SudachiPy/) and [SudachiDict-core](https://pypi.org/project/SudachiDict-core/).
* `lang="th"` Thai requires [PyThaiNLP](https://pypi.org/project/pythainlp/).
* `lang="vi"` Vietnamese requires [pyvi](https://pypi.org/project/pyvi/).

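Before calling `multicombo.load` with one of the languages listed above, it may help to check whether the additional packages are importable. The sketch below does this with the standard library only. The import names in `EXTRA_MODULES` (`sudachipy`, `sudachidict_core`, `pythainlp`, `pyvi`) are assumptions about how each PyPI package is imported, and `missing_modules` is a hypothetical helper, not part of multicombo.

```py
import importlib.util

# Assumed import names for the additional packages listed above.
EXTRA_MODULES = {
    "ja": ["sudachipy", "sudachidict_core"],
    "th": ["pythainlp"],
    "vi": ["pyvi"],
}

def missing_modules(lang, requirements=EXTRA_MODULES):
    """Return the modules required for `lang` that cannot be found."""
    return [name for name in requirements.get(lang, [])
            if importlib.util.find_spec(name) is None]

# "xx" has no entry in the table, so nothing is reported as missing.
print(missing_modules("xx"))  # prints: []
```

If `missing_modules("ja")` returns an empty list, `multicombo.load(lang="ja")` should at least find its tokenizer dependencies; a non-empty list names the modules to install first.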
## Installation for Linux

```sh
pip3 install allennlp@git+https://github.com/allenai/allennlp
pip3 install 'transformers<4.31'
pip3 install multicombo
```
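
The `transformers<4.31` pin above can be checked after installation. The following is a rough sketch, assuming Python 3.8 or later for `importlib.metadata`; `release_tuple` and `satisfies_pin` are hypothetical helpers that compare only the leading numeric release segments, a simplification of full version-specifier matching.

```py
from importlib import metadata

def release_tuple(version):
    """Convert '4.30.2' to (4, 30, 2), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)

def satisfies_pin(version, upper=(4, 31)):
    """True if the numeric release is strictly below the upper bound."""
    return release_tuple(version) < upper

try:
    installed = metadata.version("transformers")
    status = "ok" if satisfies_pin(installed) else "too new for the pin"
    print("transformers", installed, status)
except metadata.PackageNotFoundError:
    print("transformers is not installed")
```

Under these assumptions, `satisfies_pin("4.30.2")` is true and `satisfies_pin("4.31.0")` is false.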

## Installation for Cygwin64

Make sure to install the `python37-devel` `python37-pip` `python37-cython` `python37-numpy` `python37-cffi` `gcc-g++` `mingw64-x86_64-gcc-g++` `gcc-fortran` `git` `curl` `make` `cmake` `libopenblas` `liblapack-devel` `libhdf5-devel` `libfreetype-devel` `libuv-devel` packages, and then run:
```sh
curl -L https://raw.githubusercontent.com/KoichiYasuoka/UniDic-COMBO/master/cygwin64.sh | sh
pip3.7 install multicombo
```

## Installation for Jupyter Notebook (Google Colaboratory)

Try the [notebook](https://colab.research.google.com/github/KoichiYasuoka/MultiCOMBO/blob/main/multicombo.ipynb).


            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/KoichiYasuoka/MultiCOMBO",
    "name": "multicombo",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": "NLP Multilingual",
    "author": "Koichi Yasuoka",
    "author_email": "yasuoka@kanji.zinbun.kyoto-u.ac.jp",
    "download_url": null,
    "platform": null,
    "bugtrack_url": null,
    "license": "GPL",
    "summary": "Multilingual POS-tagger and Dependency-parser",
    "version": "0.8.6",
    "project_urls": {
        "COMBO-pytorch": "https://gitlab.clarin-pl.eu/syntactic-tools/combo",
        "Homepage": "https://github.com/KoichiYasuoka/MultiCOMBO",
        "Source": "https://github.com/KoichiYasuoka/MultiCOMBO",
        "Tracker": "https://github.com/KoichiYasuoka/MultiCOMBO/issues"
    },
    "split_keywords": [
        "nlp",
        "multilingual"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "1e8aaf1e954c43c3ccdc17a8ed5d936e691675941594e61c115ceee1e0e35b9b",
                "md5": "b742619e8935ef42114a051fc09da393",
                "sha256": "ed024eac37c93d61ddc4bb777ee565815b4ad6defe1c7fdc4146d1234fda5936"
            },
            "downloads": -1,
            "filename": "multicombo-0.8.6-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "b742619e8935ef42114a051fc09da393",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 16779,
            "upload_time": "2025-07-13T08:22:12",
            "upload_time_iso_8601": "2025-07-13T08:22:12.035114Z",
            "url": "https://files.pythonhosted.org/packages/1e/8a/af1e954c43c3ccdc17a8ed5d936e691675941594e61c115ceee1e0e35b9b/multicombo-0.8.6-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-13 08:22:12",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "KoichiYasuoka",
    "github_project": "MultiCOMBO",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "multicombo"
}
        