# spaCy-PyThaiNLP
This package wraps the PyThaiNLP library to add Thai language support to spaCy.
**Supported features**
- Word segmentation
- Part-of-speech tagging
- Named entity recognition
- Sentence segmentation
- Dependency parsing
- Word vectors
## Install
> pip install spacy-pythainlp
## How to use
**Example**
```python
import spacy
import spacy_pythainlp.core
nlp = spacy.blank("th")
# Segment the Doc into sentences
nlp.add_pipe(
    "pythainlp",
)

data = nlp("ผมเป็นคนไทย แต่มะลิอยากไปโรงเรียนส่วนผมจะไปไหน ผมอยากไปเที่ยว")
print(list(data.sents))
# output: [ผมเป็นคนไทย แต่มะลิอยากไปโรงเรียนส่วนผมจะไปไหน , ผมอยากไปเที่ยว]
```
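The Doc is also word-segmented (spaCy's Thai tokenizer uses PyThaiNLP's newmm by default), so you can iterate over the tokens as usual. The exact tokens depend on the tokenizer engine:

```python
# Inspect the word segmentation of the Doc from the example above
for token in data:
    print(token.text)
```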
You can configure the settings in nlp.add_pipe:
```python
nlp.add_pipe(
    "pythainlp",
    config={
        "pos_engine": "perceptron",
        "pos": True,
        "pos_corpus": "orchid_ud",
        "sent_engine": "crfcut",
        "sent": True,
        "ner_engine": "thainer",
        "ner": True,
        "tokenize_engine": "newmm",
        "tokenize": False,
        "dependency_parsing": False,
        "dependency_parsing_engine": "esupar",
        "dependency_parsing_model": None,
        "word_vector": True,
        "word_vector_model": "thai2fit_wv"
    }
)
```
- tokenize: Bool (True or False) to replace spaCy's word tokenizer with PyThaiNLP's. (spaCy's default Thai tokenizer is already PyThaiNLP's newmm.)
- tokenize_engine: The word tokenization engine. You can read more: [Options for engine](https://pythainlp.github.io/docs/3.1/api/tokenize.html#pythainlp.tokenize.word_tokenize)
- sent: Bool (True or False) to turn on sentence segmentation.
- sent_engine: The sentence segmentation engine. You can read more: [Options for engine](https://pythainlp.github.io/docs/3.1/api/tokenize.html#pythainlp.tokenize.sent_tokenize)
- pos: Bool (True or False) to turn on part-of-speech tagging (see the sketch after this list).
- pos_engine: The part-of-speech engine. You can read more: [Options for engine](https://pythainlp.github.io/docs/3.1/api/tag.html#pythainlp.tag.pos_tag)
- ner: Bool (True or False) to turn on named entity recognition (see the sketch after this list).
- ner_engine: The NER engine. You can read more: [Options for engine](https://pythainlp.github.io/docs/3.1/api/tag.html#pythainlp.tag.NER)
- dependency_parsing: Bool (True or False) to turn on dependency parsing.
- dependency_parsing_engine: The dependency parsing engine. You can read more: [Options for engine](https://pythainlp.github.io/docs/3.1/api/parse.html#pythainlp.parse.dependency_parsing)
- dependency_parsing_model: The dependency parsing model. You can read more: [Options for model](https://pythainlp.github.io/docs/3.1/api/parse.html#pythainlp.parse.dependency_parsing)
- word_vector: Bool (True or False) to turn on word vectors.
- word_vector_model: The word vector model. You can read more: [Options for model](https://pythainlp.github.io/docs/3.1/api/word_vector.html#pythainlp.word_vector.WordVector)
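With pos and ner turned on, as in the config above, the annotations should be readable through spaCy's usual attributes. This is a minimal sketch, not taken from the package's own documentation; it assumes the component fills Token.pos_ and Doc.ents:

```python
doc = nlp("ผมเป็นคนไทย แต่มะลิอยากไปโรงเรียนส่วนผมจะไปไหน ผมอยากไปเที่ยว")

# Part-of-speech tags per token (assumes pos=True populates Token.pos_)
for token in doc:
    print(token.text, token.pos_)

# Named entities (assumes ner=True populates Doc.ents)
for ent in doc.ents:
    print(ent.text, ent.label_)
```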
**Note: If you turn on dependency parsing, the word segmentation and sentence segmentation settings are turned off, and segmentation comes from the dependency parser instead.**
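For example, to try the dependency parser you might enable it in the config and then read spaCy's standard dependency attributes. A minimal sketch, assuming the component fills Token.dep_ and Token.head when dependency_parsing is True (the default esupar engine may need its own extra dependencies installed):

```python
import spacy
import spacy_pythainlp.core

nlp = spacy.blank("th")
# Word and sentence segmentation will come from the dependency parser (see the note above)
nlp.add_pipe(
    "pythainlp",
    config={"dependency_parsing": True},
)

doc = nlp("ผมเป็นคนไทย แต่มะลิอยากไปโรงเรียนส่วนผมจะไปไหน")
for token in doc:
    print(token.text, token.dep_, token.head.text)
```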
## License
```
Copyright 2016-2023 PyThaiNLP Project
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
```