# Perke
[![tests](https://github.com/alirezatheh/perke/workflows/tests/badge.svg)](https://github.com/alirezatheh/perke/actions/workflows/tests.yaml)
[![pre-commit.ci](https://results.pre-commit.ci/badge/github/AlirezaTheH/perke/main.svg)](https://results.pre-commit.ci/latest/github/alirezatheh/perke/main)
[![PyPI Version](https://img.shields.io/pypi/v/perke)](https://pypi.python.org/pypi/perke)
[![Python Versions](https://img.shields.io/pypi/pyversions/perke)](https://pypi.org/project/perke)
[![Documentation Status](https://readthedocs.org/projects/perke/badge/?version=stable)](https://perke.readthedocs.io/en/stable/?badge=stable)
Perke is a Python keyphrase extraction package for the Persian language. It
provides an end-to-end keyphrase extraction pipeline in which each component
can be easily modified or extended to develop new models.
## Installation
- The easiest way to install is from PyPI:
  ```bash
  pip install perke
  ```
  Alternatively, you can install directly from GitHub:
  ```bash
  pip install git+https://github.com/alirezatheh/perke.git
  ```
- Perke also requires a trained POS tagger model. We use
  [Hazm's](https://github.com/roshan-research/hazm) POS tagger model. You can
  download the latest Hazm POS tagger with the following command:
  ```bash
  python -m perke download
  ```
  Alternatively, you can use another model with the same tag names and
  structure by placing it in the
  [`resources`](https://github.com/alirezatheh/perke/tree/main/perke/resources)
  directory.
## Simple Example
Perke provides a standardized API for extracting keyphrases from a text. Start
by typing the four lines below to use the `TextRank` keyphrase extractor.
```python
from perke.unsupervised.graph_based import TextRank
# 1. Create a TextRank extractor.
extractor = TextRank()
# 2. Load the text.
extractor.load_text(input='text or path/to/input_file')
# 3. Build the graph representation of the text and weight the
# words. Keyphrase candidates are composed of the 33 percent
# highest weighted words.
extractor.weight_candidates(top_t_percent=0.33)
# 4. Get the 10 highest weighted candidates as keyphrases.
keyphrases = extractor.get_n_best(n=10)
```
For more in-depth examples, see the
[`examples`](https://github.com/alirezatheh/perke/tree/main/examples)
directory.
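
To make step 3 more concrete, the sketch below shows the general idea of
weighting words on a co-occurrence graph and keeping the top fraction of them.
It is a minimal illustration in plain Python, not Perke's implementation: the
helpers `textrank_word_weights` and `top_percent` and the toy token list are
hypothetical, and POS filtering and the assembly of words into multi-word
candidates are left out.

```python
from collections import defaultdict


def textrank_word_weights(words, window_size=2, damping=0.85, iterations=50):
    """Weight words with PageRank on an undirected co-occurrence graph."""
    # Link every pair of distinct words that co-occur within the window.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + window_size]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Power iteration. Every node in the graph has at least one neighbor,
    # so dividing by the neighbor count is always safe.
    weights = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        weights = {
            word: (1 - damping)
            + damping * sum(weights[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    return weights


def top_percent(weights, top_t_percent=0.33):
    """Return the highest weighted fraction of words (at least one word)."""
    ranked = sorted(weights, key=weights.get, reverse=True)
    keep = max(1, int(len(ranked) * top_t_percent))
    return ranked[:keep]


# Toy input: 'a' co-occurs with every other word, so it ends up on top.
tokens = ['a', 'b', 'a', 'c', 'a', 'd']
weights = textrank_word_weights(tokens)
print(top_percent(weights, top_t_percent=0.33))
```

With `window_size=2`, only adjacent words are linked, which keeps the graph
sparse; a larger window links words that are further apart.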
## Documentation
Documentation and references are available at
[Read The Docs](https://perke.readthedocs.io).
## Implemented Models
Perke currently implements the following keyphrase extraction models:
- Unsupervised models
  - Graph-based models
    - TextRank: [article](http://www.aclweb.org/anthology/W04-3252.pdf)
      by Mihalcea and Tarau, 2004
    - SingleRank: [article](https://www.aaai.org/Papers/AAAI/2008/AAAI08-136.pdf)
      by Wan and Xiao, 2008
    - TopicRank: [article](http://aclweb.org/anthology/I13-1062.pdf)
      by Bougouin, Boudin and Daille, 2013
    - PositionRank: [article](http://www.aclweb.org/anthology/P17-1102.pdf)
      by Florescu and Caragea, 2017
    - MultipartiteRank: [article](https://www.aclweb.org/anthology/N18-2105.pdf)
      by Boudin, 2018
## Acknowledgements
Perke is inspired by [pke](https://github.com/boudinfl/pke).
Raw data
{
"_id": null,
"home_page": "https://github.com/alirezatheh/perke",
"name": "perke",
"maintainer": "",
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": "",
"keywords": "nlp,natural-language-processing,information-retrieval,computational-linguistics,persian-language,persian-nlp,persian,keyphrase-extraction,keyphrase-extractor,keyphrase,keyword-extraction,keyword-extractor,keyword,machine-learning,ml,unsupervised-learning",
"author": "Alireza Hosseini",
"author_email": "alirezatheh@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/33/a3/49f2b59bed4f550b0275de5bfc3bbb6c8f143ba648fe187a881cf30bc0f9/perke-0.4.4.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "",
"summary": "A keyphrase extractor for Persian",
"version": "0.4.4",
"project_urls": {
"Bug Tracker": "https://github.com/alirezatheh/perke/issues",
"Documentation": "https://perke.readthedocs.io",
"Homepage": "https://github.com/alirezatheh/perke",
"Source Code": "https://github.com/alirezatheh/perke"
},
"split_keywords": [
"nlp",
"natural-language-processing",
"information-retrieval",
"computational-linguistics",
"persian-language",
"persian-nlp",
"persian",
"keyphrase-extraction",
"keyphrase-extractor",
"keyphrase",
"keyword-extraction",
"keyword-extractor",
"keyword",
"machine-learning",
"ml",
"unsupervised-learning"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "a25715d359c899837adfd6482b48dd6d7fdda46bd1e537859c5b136b5afb798d",
"md5": "4552cd02e9c49d84a3966825d09413bb",
"sha256": "dc8f0777079e77e0b09ed4842b2e5632316548c6a52858de9d992cd7936008a2"
},
"downloads": -1,
"filename": "perke-0.4.4-py3-none-any.whl",
"has_sig": false,
"md5_digest": "4552cd02e9c49d84a3966825d09413bb",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 25258,
"upload_time": "2023-06-25T09:51:54",
"upload_time_iso_8601": "2023-06-25T09:51:54.796793Z",
"url": "https://files.pythonhosted.org/packages/a2/57/15d359c899837adfd6482b48dd6d7fdda46bd1e537859c5b136b5afb798d/perke-0.4.4-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "33a349f2b59bed4f550b0275de5bfc3bbb6c8f143ba648fe187a881cf30bc0f9",
"md5": "ba7197beff7ae59a0253793e0b368f08",
"sha256": "a2277223d68d51e4a70ebf1ed0d7b91f6804c05e66c623750e7cc2ecddcc8617"
},
"downloads": -1,
"filename": "perke-0.4.4.tar.gz",
"has_sig": false,
"md5_digest": "ba7197beff7ae59a0253793e0b368f08",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 20086,
"upload_time": "2023-06-25T09:52:07",
"upload_time_iso_8601": "2023-06-25T09:52:07.502062Z",
"url": "https://files.pythonhosted.org/packages/33/a3/49f2b59bed4f550b0275de5bfc3bbb6c8f143ba648fe187a881cf30bc0f9/perke-0.4.4.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2023-06-25 09:52:07",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "alirezatheh",
"github_project": "perke",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "perke"
}