entropyrank

- Version: 1.0.3
- Summary: Entropy Rank keyphrase extractor
- Uploaded: 2023-12-20 15:14:14
- Requires Python: >=3.10
- License: Apache-2.0
- Keywords: entropyrank, entropy, keyphrase extraction, keywords extraction
- Homepage: https://github.com/tsalex1992/EntropyRank

# EntropyBasedKeyPhraseExtraction
This is the official implementation of the EntropyRank keyphrase extractor described in https://openreview.net/forum?id=WCTtOfIhsJ. If you find EntropyRank useful, please cite the paper and star this repository. Thanks!

```bibtex
@inproceedings{
tsvetkov2023entropyrank,
title={EntropyRank: Unsupervised Keyphrase Extraction via Side-Information Optimization for Language Model-based Text Compression},
author={Alexander Tsvetkov and Alon Kipnis},
booktitle={ICML 2023 Workshop Neural Compression: From Information Theory to Applications},
year={2023},
url={https://openreview.net/forum?id=WCTtOfIhsJ}
}
```

## Installation

To install from PyPI:

```
pip install entropyrank
```

To install the dependencies from a clone of the repository instead, run the following from `src/entropyrank`:

```
pip install -r requirements.txt
```

You also need the spaCy `en_core_web_sm` model, which you can download with:

```
python -m spacy download en_core_web_sm
```

## Usage

To use the package, import `EntropyRank` from the `entropyrank` package and create an instance:

```python
from entropyrank import EntropyRank

extractor = EntropyRank()
```

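Then extract key phrases from a text with the `extract_key_phrases` method:

```python
# Any English text; here, the title of the EntropyRank paper.
text = (
    "EntropyRank: Unsupervised Keyphrase Extraction via Side-Information "
    "Optimization for Language Model-based Text Compression"
)

phrases = extractor.extract_key_phrases(
    text=text,
    number_of_key_phrases=3,
)
```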

The parameters of the `extract_key_phrases` method are:

- `text`: the input text to extract key phrases from.
- `number_of_key_phrases`: the number of key phrases to return.
- `exclude_start_words_count`: the number of words at the start of each key phrase to ignore when computing its entropy.
- `partition_method`: how the text is split into candidate phrases; either `STOP_WORDS` or `NOUN_PHRASES`.
- `ranking_method`: how candidates are scored; `SUM_ENTROPY` uses the sum of the entropies of all words in the phrase, while `FIRST_WORD_ENTROPY` uses only the entropy of its first word.
- `normalize_by_word_statistics`: a boolean; if true, entropy values are normalized by entropy statistics of word position.
- `remove_personal_names`: a boolean; if true, personal names are removed from the evaluation.
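
The partition and ranking options above can be illustrated with a small, self-contained sketch. It is not the library's implementation: it assumes each word already has an entropy score (in EntropyRank these come from a language model), uses a tiny hard-coded stop-word list in place of a real one, and assumes that higher entropy marks a more informative phrase, so candidates are sorted in descending order. The helper names `partition_by_stop_words` and `rank_phrases` are hypothetical.

```python
from typing import Literal

# Hypothetical, tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "is", "with"}


def partition_by_stop_words(words: list[str]) -> list[list[int]]:
    """Split a word sequence into candidate phrases at stop words.

    Each candidate is returned as a list of word indices so that
    per-word entropy scores can be looked up by position.
    """
    candidates: list[list[int]] = []
    current: list[int] = []
    for i, word in enumerate(words):
        if word.lower() in STOP_WORDS:
            if current:
                candidates.append(current)
            current = []
        else:
            current.append(i)
    if current:
        candidates.append(current)
    return candidates


def rank_phrases(
    words: list[str],
    entropies: list[float],
    number_of_key_phrases: int,
    ranking_method: Literal["FIRST_WORD_ENTROPY", "SUM_ENTROPY"] = "SUM_ENTROPY",
    exclude_start_words_count: int = 0,
) -> list[str]:
    """Score stop-word-delimited candidates by entropy and return the top ones."""
    scored: dict[str, float] = {}
    for indices in partition_by_stop_words(words):
        # Ignore the first `exclude_start_words_count` words of the phrase when scoring.
        scored_indices = indices[exclude_start_words_count:]
        if not scored_indices:
            continue
        if ranking_method == "FIRST_WORD_ENTROPY":
            score = entropies[scored_indices[0]]
        else:  # SUM_ENTROPY
            score = sum(entropies[i] for i in scored_indices)
        phrase = " ".join(words[i] for i in indices)
        # If a phrase occurs more than once, keep its highest score.
        scored[phrase] = max(score, scored.get(phrase, float("-inf")))
    # Assumption: higher entropy means a more informative phrase.
    ranked = sorted(scored, key=scored.get, reverse=True)
    return ranked[:number_of_key_phrases]


if __name__ == "__main__":
    words = "compression of text with a language model".split()
    # Made-up per-word entropies, one per word.
    entropies = [4.0, 0.2, 3.0, 0.1, 0.1, 2.5, 2.0]
    print(rank_phrases(words, entropies, 2))
    print(rank_phrases(words, entropies, 2, ranking_method="FIRST_WORD_ENTROPY"))
```

With these made-up scores, summing entropies favours the longer candidate "language model" (2.5 + 2.0 = 4.5), while scoring by the first word alone favours "compression" (4.0). This shows why the choice of `ranking_method` can change which phrases are returned.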

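The README does not spell out how `normalize_by_word_statistics` computes its normalization. One plausible reading, sketched here purely as an assumption, is a per-position z-score: each word's entropy is compared with the mean and standard deviation of entropies observed at the same word position in a reference set of documents. The reference set and the helper names `position_statistics` and `normalize_by_position` are hypothetical.

```python
from statistics import mean, pstdev


def position_statistics(reference: list[list[float]]) -> list[tuple[float, float]]:
    """Mean and population standard deviation of entropy at each word position.

    `reference` holds per-word entropy sequences from other documents
    (a hypothetical calibration set; not part of the entropyrank API).
    """
    max_len = max(len(seq) for seq in reference)
    stats: list[tuple[float, float]] = []
    for pos in range(max_len):
        values = [seq[pos] for seq in reference if pos < len(seq)]
        stats.append((mean(values), pstdev(values)))
    return stats


def normalize_by_position(
    entropies: list[float], stats: list[tuple[float, float]]
) -> list[float]:
    """Z-score each entropy against the statistics for its position.

    Positions beyond the reference length reuse the last available
    statistics; a zero standard deviation falls back to mean-centering.
    """
    normalized: list[float] = []
    for pos, h in enumerate(entropies):
        mu, sigma = stats[min(pos, len(stats) - 1)]
        normalized.append((h - mu) / sigma if sigma > 0 else h - mu)
    return normalized
```

The motivation for a position-aware normalization is that a language model typically has less context near the start of a text, so raw entropies at early positions may not be directly comparable with later ones.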
## Evaluation Demo

To reproduce the benchmark results on common keyphrase extraction tasks reported in the paper, run the `evaluation_demo` notebook in `src/eval` of this repository. Install its dependencies first with `pip install -r evaluation-requirements.txt`.
