Name | swt-nlp |
Version | 0.0.57 |
Summary | simple nlp pipeline |
upload_time | 2023-02-02 08:43:58 |
docs_url | None |
author | @muuusiiik |
requires_python | >=3.7 |
license | Apache Software License 2.0 |
requirements | No requirements were recorded. |
# SWT-NLP PACKAGE
### PACKAGE INSTALLATION
``` shell
pip install swt-nlp
```
### KEYTERM EXTRACTION
#### OBJECTIVE
* extract new key terms from a corpus
#### DEMO
demo code for keyterm extraction
``` python
from swt.nlp.basis import keyterm_extractor
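# Mockup is a helper from the project's test suite, so it may not be importable
# from an installed package; run from a source checkout or use your own list of strings
from tests.keyterm_extraction.test_keyterm_extraction_fit_modules import Mockup
# the corpus is a list of plain-text strings
small_content = Mockup.small_corpus()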
# small_content[:5] = [
# 'อยากกระโดดน้ำที่แม่น้ำโขง',
# 'แม่น้ำที่จังหวัดกาญจนบุรีนี่สุดยอดมาก',
# 'เหล้ามีหลายยี่ห้อ แสงโสม แม่น้ำโขงหรืออะไรก็มีหมดเลย',
# 'ข้อความนี้เกี่ยวกับชิมช็อปใช้',
# 'รัฐบาลผลักดันชิมช็อปใช้มากขึ้น']
# extract new terms
kt = keyterm_extractor()
# - to use a custom tokenizer, pass your own callable tokenizer function
# - this example uses word_tokenize from pythainlp with keep_whitespace=False
# from pythainlp import word_tokenize
# custom_tokenizer = lambda t: word_tokenize(t, keep_whitespace=False)
# kt = keyterm_extractor(tokenizer=custom_tokenizer)
kt.fit(small_content)
new_terms = kt.extract()
# new_terms = ['ชิมช็อปใช้', 'แม่น้ำโขง']
```
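The commented-out tokenizer lines in the demo suggest that `keyterm_extractor` accepts a callable that takes a text and returns its tokens. As a minimal sketch under that assumption, the dependency-free tokenizer below splits on whitespace. `whitespace_tokenizer` is a hypothetical name and not part of swt-nlp. Whitespace splitting only suits text that is already space-delimited; Thai text needs a real word segmenter such as pythainlp.

``` python
from typing import Callable, List


def whitespace_tokenizer(text: str) -> List[str]:
    """Hypothetical tokenizer: split on runs of whitespace, dropping empty tokens."""
    return text.split()


# the callable shape assumed for the `tokenizer=` argument shown in the demo
tokenizer: Callable[[str], List[str]] = whitespace_tokenizer

tokens = tokenizer("simple  nlp pipeline")
print(tokens)

# assumed usage, mirroring the demo (requires swt-nlp to be installed):
# kt = keyterm_extractor(tokenizer=whitespace_tokenizer)
```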
### HOW TO BUILD AND UPLOAD A PACKAGE TO PYPI
prerequisite
``` shell
pip install setuptools wheel tqdm twine
```
build and upload package
``` shell
# build the source distribution (tar.gz) into dist/
python setup.py sdist
# upload the package to the PyPI server
# replace {package.tar.gz} with the file name created in dist/
python -m twine upload dist/{package.tar.gz} --verbose
```
install package
``` shell
# install latest version
pip install swt-nlp --upgrade
# specific version with no cache
pip install swt-nlp==0.0.11 --no-cache-dir
```
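After installing, it can be useful to confirm which version pip actually resolved. The sketch below uses the standard-library `importlib.metadata` module, which needs Python 3.8 or newer; on Python 3.7 the `importlib_metadata` backport would be required instead. `installed_version` is a hypothetical helper name.

``` python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# prints the version string, or None when swt-nlp is not installed
print(installed_version("swt-nlp"))
```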
install package from a wheel
``` shell
# build wheel
python setup.py bdist_wheel
# install the package from the built wheel
# replace {package.whl} with the file name created in dist/
# add --force-reinstall if needed
pip install dist/{package.whl}
```
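The `{package.whl}` and `{package.tar.gz}` placeholders stand for whatever file the build step wrote into `dist/`. As a small sketch, the hypothetical helper `newest_artifact` below picks the most recently modified file matching a pattern, assuming the default `dist` output directory used in the commands above.

``` python
from pathlib import Path
from typing import Optional


def newest_artifact(dist_dir: str = "dist", pattern: str = "*.whl") -> Optional[Path]:
    """Return the most recently modified file in dist_dir matching pattern, or None."""
    directory = Path(dist_dir)
    if not directory.is_dir():
        return None
    candidates = sorted(directory.glob(pattern), key=lambda p: p.stat().st_mtime)
    return candidates[-1] if candidates else None


wheel = newest_artifact()
if wheel is None:
    print("no wheel found in dist/; run the build step first")
else:
    # print the install command rather than running it
    print(f"pip install {wheel}")
```

Calling `newest_artifact(pattern="*.tar.gz")` would locate the sdist for the upload step in the same way.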
Raw data
{
"_id": null,
"home_page": "",
"name": "swt-nlp",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": "",
"keywords": "",
"author": "@muuusiiik",
"author_email": "muuusiiikd@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/de/bb/c3a81fe353c73d7d7239c67f0b88a49adda27738e59afedc878af61ff1bb/swt-nlp-0.0.57.tar.gz",
"platform": null,
"description": "# SWT-NLP PACKAGE\n### PACKAGE INSTALLATION\n``` shell\npip install swt-nlp\n```\n\n### KEYTERM EXTRACTION\n#### OBJECTIVE\n* extract new keyterm from corpus\n\n#### DEMO\ndemo code for keyterm extraction\n``` python\nfrom swt.nlp.basis import keyterm_extractor\nfrom tests.keyterm_extraction.test_keyterm_extraction_fit_modules import Mockup\n\n# corpus in format list of plain text\nsmall_content = Mockup.small_corpus()\n# small_content[:5] = [\n# '\u0e2d\u0e22\u0e32\u0e01\u0e01\u0e23\u0e30\u0e42\u0e14\u0e14\u0e19\u0e49\u0e33\u0e17\u0e35\u0e48\u0e41\u0e21\u0e48\u0e19\u0e49\u0e33\u0e42\u0e02\u0e07',\n# '\u0e41\u0e21\u0e48\u0e19\u0e49\u0e33\u0e17\u0e35\u0e48\u0e08\u0e31\u0e07\u0e2b\u0e27\u0e31\u0e14\u0e01\u0e32\u0e0d\u0e08\u0e19\u0e1a\u0e38\u0e23\u0e35\u0e19\u0e35\u0e48\u0e2a\u0e38\u0e14\u0e22\u0e2d\u0e14\u0e21\u0e32\u0e01',\n# '\u0e40\u0e2b\u0e25\u0e49\u0e32\u0e21\u0e35\u0e2b\u0e25\u0e32\u0e22\u0e22\u0e35\u0e48\u0e2b\u0e49\u0e2d \u0e41\u0e2a\u0e07\u0e42\u0e2a\u0e21 \u0e41\u0e21\u0e48\u0e19\u0e49\u0e33\u0e42\u0e02\u0e07\u0e2b\u0e23\u0e37\u0e2d\u0e2d\u0e30\u0e44\u0e23\u0e01\u0e47\u0e21\u0e35\u0e2b\u0e21\u0e14\u0e40\u0e25\u0e22',\n# '\u0e02\u0e49\u0e2d\u0e04\u0e27\u0e32\u0e21\u0e19\u0e35\u0e49\u0e40\u0e01\u0e35\u0e48\u0e22\u0e27\u0e01\u0e31\u0e1a\u0e0a\u0e34\u0e21\u0e0a\u0e47\u0e2d\u0e1b\u0e43\u0e0a\u0e49',\n# '\u0e23\u0e31\u0e10\u0e1a\u0e32\u0e25\u0e1c\u0e25\u0e31\u0e01\u0e14\u0e31\u0e19\u0e0a\u0e34\u0e21\u0e0a\u0e47\u0e2d\u0e1b\u0e43\u0e0a\u0e49\u0e21\u0e32\u0e01\u0e02\u0e36\u0e49\u0e19']\n\n# extract new terms\nkt = keyterm_extractor()\n# - in case of using a custom tokenizer\n# - this example is using word_tokenizer of pythainlp with keep_whitespace=False setting\n# custom_tokenizer = lambda t: word_tokenize(t, keep_whitespace=False) # your own callable tokenizer function \n# kt = keyterm_extractor(tokenizer=custom_tokenizer)\nkt.fit(small_content)\nnew_terms = kt.extract()\n# new_terms = 
['\u0e0a\u0e34\u0e21\u0e0a\u0e47\u0e2d\u0e1b\u0e43\u0e0a\u0e49', '\u0e41\u0e21\u0e48\u0e19\u0e49\u0e33\u0e42\u0e02\u0e07']\n```\n\n\n### HOW TO BUILD A PACKAGE TO PYPI\nprerequisite\n``` shell\npip install setuptools wheel tqdm twine\n```\n\nbuild and upload package\n``` shell\n# preparing tar.gz package \npython setup.py sdist\n# uploading package to pypi server\npython -m twine upload dist/{package.tar.gz} --verbose\n```\n\ninstall package\n``` shell\n# install latest version\npip install swt-nlp --upgrade\n# specific version with no cache\npip install swt-nlp==0.0.11 --no-cache-dir\n```\n\ninstall package by wheel\n``` shell\n# build wheel \npython setup.py bdist_wheel\n\n# install package by wheel \n# use --force-reinstall if needed\npip install dist/{package.whl}\n```\n\n\n\n",
"bugtrack_url": null,
"license": "Apache Software License 2.0",
"summary": "simple nlp pipeline",
"version": "0.0.57",
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "debbc3a81fe353c73d7d7239c67f0b88a49adda27738e59afedc878af61ff1bb",
"md5": "126950da74959429bc549a239f67fac7",
"sha256": "9f3c5a7fc0717730ef0f629631605aad1571b7996646a6fb9e0c8302c4b5296f"
},
"downloads": -1,
"filename": "swt-nlp-0.0.57.tar.gz",
"has_sig": false,
"md5_digest": "126950da74959429bc549a239f67fac7",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 26634,
"upload_time": "2023-02-02T08:43:58",
"upload_time_iso_8601": "2023-02-02T08:43:58.424316Z",
"url": "https://files.pythonhosted.org/packages/de/bb/c3a81fe353c73d7d7239c67f0b88a49adda27738e59afedc878af61ff1bb/swt-nlp-0.0.57.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-02-02 08:43:58",
"github": false,
"gitlab": false,
"bitbucket": false,
"lcname": "swt-nlp"
}
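The `sha256` digest in the raw data can be used to check a downloaded sdist before installing it. The sketch below hashes a local file in chunks and compares the result with that digest. The local path is an assumption (the archive saved under its original file name in the current directory), and `sha256_of` is a hypothetical helper name.

``` python
import hashlib
from pathlib import Path

# digest copied from the "sha256" field of the raw data
EXPECTED_SHA256 = "9f3c5a7fc0717730ef0f629631605aad1571b7996646a6fb9e0c8302c4b5296f"


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


archive = Path("swt-nlp-0.0.57.tar.gz")  # assumed local download location
if archive.exists():
    print("digest matches:", sha256_of(str(archive)) == EXPECTED_SHA256)
else:
    print(f"{archive} not found; download it first")
```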