# EntropyBasedKeyPhraseExtraction
This is the official implementation of the EntropyRank key phrase extractor described in https://openreview.net/forum?id=WCTtOfIhsJ. If you find EntropyRank useful, please cite the paper and star this repo. Thanks!
```bibtex
@inproceedings{
tsvetkov2023entropyrank,
title={EntropyRank: Unsupervised Keyphrase Extraction via Side-Information Optimization for Language Model-based Text Compression},
author={Alexander Tsvetkov and Alon Kipnis},
booktitle={ICML 2023 Workshop Neural Compression: From Information Theory to Applications},
year={2023},
url={https://openreview.net/forum?id=WCTtOfIhsJ}
}
```
## Installation
EntropyRank requires Python 3.10 or later. To install it from PyPI:
```
pip install entropyrank
```
To install from the repository instead, run the following from `src/entropyrank`:
```
pip install -r requirements.txt
```
You also need spaCy's `en_core_web_sm` model, which you can download by running:
```
spacy download en_core_web_sm
```
## Usage
To use the package, import `EntropyRank` from the module and create an instance of it:
```python
from entropyrank import EntropyRank
extractor = EntropyRank()
```
Then, you can extract key phrases from a given text using the `extract_key_phrases` method:
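```python
# The document to extract key phrases from (any English string).
text = "Neural language models can compress text by predicting each word from its context."

phrases = extractor.extract_key_phrases(
    text=text,
    number_of_key_phrases=3,
)
```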
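The ranking options described below operate on per-word entropy values. As a minimal sketch of what such a value can be, assuming (an assumption, not a description of the package internals) that a word's entropy is the Shannon entropy, in bits, of a language model's next-word distribution at that position, the computation looks like this. The distributions are invented for illustration:

```python
import math


def shannon_entropy(probs: list[float]) -> float:
    """Entropy in bits of a discrete distribution; zero-probability terms contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Invented next-word distributions over a 4-word vocabulary (illustration only).
confident = [0.97, 0.01, 0.01, 0.01]  # the model is almost sure of the next word
uncertain = [0.25, 0.25, 0.25, 0.25]  # every candidate next word is equally likely

print(shannon_entropy(uncertain))  # 2.0
print(shannon_entropy(confident) < shannon_entropy(uncertain))  # True
```

Under this assumption, positions where the model is uncertain about the next word receive high entropy values.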
The parameters of the `extract_key_phrases` method are:
- `text`: the input text to extract key phrases from.
- `number_of_key_phrases`: the number of key phrases to extract.
- `exclude_start_words_count`: the number of words at the start of each key phrase to ignore when calculating its entropy.
- `partition_method`: how the text is partitioned into candidate phrases; either `STOP_WORDS` or `NOUN_PHRASES`.
- `ranking_method`: either `FIRST_WORD_ENTROPY` (rank each candidate by the entropy of its first word) or `SUM_ENTROPY` (rank each candidate by the sum of the entropies of its words).
- `normalize_by_word_statistics`: a boolean indicating whether to normalize the entropy values using entropy statistics of word position.
- `remove_personal_names`: a boolean indicating whether to exclude personal names from evaluation.
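To make stop-word partitioning and the two ranking methods more concrete, here is a simplified, hypothetical sketch in plain Python. It assumes per-word entropies are already available (the values below are invented), uses a tiny illustrative stop-word list, and passes the method names as plain strings. The helper names `partition_by_stop_words`, `score` and `top_phrases` are not part of the `entropyrank` API, and the package's actual partitioning and scoring may differ.

```python
# Hypothetical sketch: NOT the entropyrank implementation.
SAMPLE_STOPWORDS = {"a", "an", "the", "we", "for", "of", "and", "to", "in"}


def partition_by_stop_words(words: list[str]) -> list[list[int]]:
    """Split word indices into maximal runs that contain no stop words."""
    candidates: list[list[int]] = []
    current: list[int] = []
    for i, word in enumerate(words):
        if word.lower() in SAMPLE_STOPWORDS:
            if current:  # a stop word closes the current candidate
                candidates.append(current)
            current = []
        else:
            current.append(i)
    if current:
        candidates.append(current)
    return candidates


def score(phrase: list[int], entropies: list[float], method: str) -> float:
    """Score one candidate (a list of word indices) from per-word entropies."""
    if method == "FIRST_WORD_ENTROPY":
        return entropies[phrase[0]]
    if method == "SUM_ENTROPY":
        return sum(entropies[i] for i in phrase)
    raise ValueError(f"unknown ranking method: {method}")


def top_phrases(words: list[str], entropies: list[float], k: int, method: str) -> list[str]:
    """Return the k highest-scoring candidate phrases as strings."""
    candidates = partition_by_stop_words(words)
    ranked = sorted(candidates, key=lambda p: score(p, entropies, method), reverse=True)
    return [" ".join(words[i] for i in p) for p in ranked[:k]]


words = "we propose a novel entropy estimator for language models".split()
# Invented per-word entropies (bits), one per word in `words`.
entropies = [0.5, 3.0, 0.2, 2.0, 4.0, 1.0, 0.3, 2.5, 0.4]

print(top_phrases(words, entropies, 2, "SUM_ENTROPY"))
# ['novel entropy estimator', 'propose']
print(top_phrases(words, entropies, 2, "FIRST_WORD_ENTROPY"))
# ['propose', 'language models']
```

With these made-up values, summing favors the longer phrase "novel entropy estimator" (total 7.0), while first-word ranking prefers "propose" (3.0) and "language models" (2.5), because only the entropy of each candidate's first word counts.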
## Evaluation Demo
To reproduce the benchmark results on common key phrase extraction tasks reported in the paper, run the `evaluation_demo` notebook in `src/eval`.
Before running it, install the evaluation dependencies with `pip install -r evaluation-requirements.txt`.