kwextractor


- Name: kwextractor
- Version: 0.0.7
- Summary: Extract keywords for Vietnamese text.
- Upload time: 2023-07-12 13:20:42
- Author: Trinh Do Duy Hung
- License: MIT

# Welcome to Keywords Extractor 🐣
This is a simple library for extracting keywords from text. It is based on the [TF-IDF](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) algorithm. Besides TF-IDF, it also uses the [YAKE](https://github.com/LIAAD/yake) and [RapidFuzz](https://github.com/maxbachmann/RapidFuzz) libraries.
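
For readers unfamiliar with TF-IDF, here is a minimal sketch of the basic scoring idea in plain Python. It is not the library's implementation: the function name `tf_idf`, the whitespace tokenization, and the sample sentences are assumptions made for illustration only. Splitting on whitespace is a crude choice for Vietnamese, where many words span several syllables.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document using plain TF-IDF."""
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            # Term frequency times inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


docs = [
    "tôi thích nghe nhạc",
    "tôi thích đọc sách",
    "tôi nghe nhạc mỗi ngày",
]
for doc_scores in tf_idf(docs):
    # Show the two highest-scoring terms of each document.
    print(sorted(doc_scores, key=doc_scores.get, reverse=True)[:2])
```

A term that occurs in every document (here "tôi") gets a score of zero, which is why TF-IDF tends to surface the words that distinguish one text from the others.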

It is fast and easy to use, and I think it still works better than the other libraries I've tried (●´ω`●).

If you have any complaints, email me (>‘o’)>: trinhhungsss492@gmail.com. Or find me on Facebook (。◕‿◕。): https://www.facebook.com/trinhdoduyhungss.


## Installation
Download and install through pip with wheel support:
```bash
pip install kwextractor
```
## Usage
```python
from kwextractor.process.extract_keywords import ExtractKeywords
from kwextractor.process.extract_numverse import ExtractNumverse
from kwextractor.process.replacing_w2n import ReplacingWtoN

# Extract keywords from a single sentence
keywords = ExtractKeywords().extract_keywords("tôi thích nghe các bản nhạc của Trịnh Công Sơn")
print(keywords)  # "bản nhạc,Trịnh Công Sơn"

# Extract a number written in words; 20 is the maximum value returned (set it to any integer that fits your needs)
num_verse = ExtractNumverse().extract_numverse("sinh cho tui bài thơ gồm hai chục câu nhé", 20)
print(num_verse)  # 20

# Replace a number written in words with its digits
replacing_w2n = ReplacingWtoN().replacing_w2n("cho hỏi làm sao để sinh ra mười bài thơ")
print(replacing_w2n)  # "cho hỏi làm sao để sinh ra 10 bài thơ"

# Extract keywords from a paragraph
keywords = ExtractKeywords().extract_keywords("Tổng thống Mỹ Donald Trump đã đề nghị các nước thành viên NATO tăng cường đầu tư trong lĩnh vực an ninh, đặc biệt là trong lĩnh vực phòng chống tấn công từ các quốc gia có thể xâm nhập vào các thành phố của các nước thành viên. Đây là lần đầu tiên tổng thống Mỹ đề nghị các nước thành viên NATO tăng cường đầu tư trong lĩnh vực an ninh.")
print(keywords)  # "cường đầu,quốc gia,xâm nhập,ninh đặc,an ninh,Donald Trump,Tổng thống lĩnh vực phòng chống tấn công,NATO"
```
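
Since the examples above show keywords coming back as one comma-separated string, a small helper can turn that string into a list. This is only a sketch: `split_keywords` is a hypothetical name and is not part of kwextractor.

```python
def split_keywords(result):
    """Turn a comma-separated keyword string into a list, dropping empty entries."""
    return [kw.strip() for kw in result.split(",") if kw.strip()]


print(split_keywords("bản nhạc,Trịnh Công Sơn"))  # ['bản nhạc', 'Trịnh Công Sơn']
print(split_keywords(""))  # []
```

The empty-string case matters because a sentence without keywords yields an empty result.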

🤘 Version v0.0.3: customization is now available 🤘


## Customize
```python
from kwextractor.process.extract_keywords import ExtractKeywords
text = "tôi thích nghe các bản nhạc của Trịnh Công Sơn"
fake_data = {
    "author": [
        "Trịnh Thăng Bình",
        "Lê Bảo Bình",
        "Phan Mạnh Quỳnh",
        "Karik",
        "Ngô Kiến Huy",
        "Chí Tâm",
        "Trang Yue",
        "B Ray",
        "ERIK",
        "Emcee L (Da LAB)",
        "Badbies",
        "Vũ",
        "Sơn Tùng M-TP"
    ]
}
# Parameters listed by the author: data_keywords, lan, ngram, stop_words (return_group is also used here)
kw = ExtractKeywords(lan='vi', data_keywords=fake_data, return_group=True)
print(kw.extract_keywords(text))  # {'author': ['bản nhạc', 'Trịnh Công Sơn']}
```
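
To give a feel for how fuzzy matching against a custom dictionary can work in general, here is a rough sketch. It does not reproduce the library's own grouping logic or output. It uses the standard library's `difflib` as a stand-in for RapidFuzz so that it runs without extra dependencies, and the names `best_group` and `group_keywords`, the 0.8 threshold, and the `"unmatched"` bucket are all assumptions.

```python
from difflib import SequenceMatcher


def best_group(keyword, data_keywords, threshold=0.8):
    """Return the group holding the entry most similar to `keyword`, or None."""
    best_name, best_score = None, threshold
    for group, entries in data_keywords.items():
        for entry in entries:
            # Similarity ratio in [0, 1]; case is ignored.
            score = SequenceMatcher(None, keyword.lower(), entry.lower()).ratio()
            if score >= best_score:
                best_name, best_score = group, score
    return best_name


def group_keywords(keywords, data_keywords, threshold=0.8):
    """Bucket keywords by their best-matching group; the rest go to 'unmatched'."""
    grouped = {}
    for kw in keywords:
        group = best_group(kw, data_keywords, threshold) or "unmatched"
        grouped.setdefault(group, []).append(kw)
    return grouped


sample = {"author": ["Sơn Tùng M-TP", "Karik"]}
print(group_keywords(["Sơn Tùng MTP", "bản nhạc"], sample))
```

A fuzzy threshold lets a slightly misspelled name such as "Sơn Tùng MTP" still land in the `author` group, while unrelated phrases stay out of it.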


## Features


| Feature | Description | Available since version |
| --- | --- | --- |
|🍎 **Extract keywords from a sentence** | Extract keywords from a sentence. If the sentence has more than one keyword, the keywords are separated by commas. The result is empty if the sentence has no keyword. | ✅ v0.0.1 ⇪ |
|🍎 **Extract keywords from a paragraph** | Extract keywords from a paragraph and return a list of keywords. | ✅ v0.0.2 ⇪ |
|🍎 **Extract num-string from a sentence** | Extract a num-string (a number written as text) from a sentence. Only one number per sentence is returned, as an integer. | ✅ v0.0.1 ⇪ |
|🍎 **Replace num-string with a number** | Replace a num-string with its number in the sentence. | ✅ v0.0.1 ⇪ |
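
The num-string replacement feature can be pictured with a toy sketch like the one below. It is not how kwextractor does it: the `NUMBER_WORDS` table, the function name `replace_number_words`, and the word-by-word lookup are assumptions. It handles only single words from one to ten, so compounds such as "hai chục" are out of scope, and ambiguous words such as "ba" or "năm" would be replaced even when they are not meant as numbers.

```python
import unicodedata

# Assumption: a tiny lookup table covering only the single-word numbers 1-10.
NUMBER_WORDS = {
    "một": 1, "hai": 2, "ba": 3, "bốn": 4, "năm": 5,
    "sáu": 6, "bảy": 7, "tám": 8, "chín": 9, "mười": 10,
}
# Normalize keys so composed and decomposed Unicode forms compare equal.
_LOOKUP = {unicodedata.normalize("NFC", w): n for w, n in NUMBER_WORDS.items()}


def replace_number_words(sentence):
    """Replace each known number word with its digits, token by token."""
    tokens = unicodedata.normalize("NFC", sentence).split()
    return " ".join(str(_LOOKUP.get(tok.lower(), tok)) for tok in tokens)


print(replace_number_words("cho hỏi làm sao để sinh ra mười bài thơ"))
```

Normalizing to NFC first matters for Vietnamese, because the same accented letter can be encoded in more than one way and a plain dictionary lookup would otherwise miss it.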

## Development
### Install dependencies
```bash
pip install -r requirements.txt
```
### Build
```bash
python setup.py bdist_wheel
```
### Test
```bash
pytest
```

Any questions? (ு८ு)
```
_/﹋\_
(҂`_´)
<,︻╦╤─ ҉ – – 🍎
_/﹋\_
```
