rake-nltk
=========

:Version: 1.0.6
:Home page: https://csurfer.github.io/rake-nltk
:Author: csurfer
:License: MIT
:Requires Python: >=3.6,<4.0
:Keywords: nlp, text-mining, algorithms, development
:Uploaded: 2021-09-15

|pypiv| |pyv| |Licence| |Build Status| |Coverage Status|

RAKE, short for Rapid Automatic Keyword Extraction, is a domain-independent
keyword extraction algorithm that determines key phrases in a body of text
by analyzing the frequency of word appearance and its co-occurrence with
other words in the text.
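
For intuition, here is a small self-contained sketch of that scoring idea
(an illustration of the approach described in the referenced paper, not the
package's internal code):

.. code:: python

    from collections import defaultdict

    # Candidate phrases are runs of words between stop words; each word is
    # scored by degree/frequency, and a phrase scores the sum of its words.
    stopwords = {"is", "a", "of", "and", "the", "that"}
    text = "keyword extraction is a part of text mining and text analysis"

    phrases, current = [], []
    for word in text.split():
        if word in stopwords:
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(word)
    if current:
        phrases.append(current)

    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrence within the phrase

    scores = {" ".join(p): sum(degree[w] / freq[w] for w in p) for p in phrases}
    print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))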

|Demo|

Features
--------

* Ridiculously simple interface.
* Configurable word and sentence tokenizers, language-based stop words, etc.
* Configurable ranking metric (see the example below).
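
A sketch of these knobs, assuming the ``Metric`` enum and the constructor
keyword arguments exposed by this version of the package:

.. code:: python

    from rake_nltk import Rake, Metric

    # Stop words for another NLTK-supported language (needs that corpus).
    r = Rake(language="spanish")

    # Rank words by degree alone instead of the default degree-to-frequency
    # ratio.
    r = Rake(ranking_metric=Metric.WORD_DEGREE)

    # Only consider candidate phrases of one to three words.
    r = Rake(min_length=1, max_length=3)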

Setup
-----

Using pip
~~~~~~~~~

.. code:: bash

    pip install rake-nltk

Directly from the repository
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: bash

    git clone https://github.com/csurfer/rake-nltk.git
    cd rake-nltk
    python setup.py install

Quick Start
-----------

.. code:: python

    from rake_nltk import Rake

    # Uses English stopwords from NLTK and all punctuation characters by
    # default.
    r = Rake()

    # Extraction given a block of text.
    r.extract_keywords_from_text(<text to process>)

    # Extraction given a list of strings, where each string is a sentence.
    r.extract_keywords_from_sentences(<list of sentences>)

    # To get keyword phrases ranked highest to lowest.
    r.get_ranked_phrases()

    # To get keyword phrases ranked highest to lowest with scores.
    r.get_ranked_phrases_with_scores()
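
A minimal runnable version of the calls above; the sample text here is just
an illustration:

.. code:: python

    from rake_nltk import Rake

    r = Rake()  # NLTK English stopwords and standard punctuation by default.

    r.extract_keywords_from_text(
        "Keyword extraction is not that difficult after all. "
        "There are many libraries that can help you with keyword extraction. "
        "Rapid automatic keyword extraction is one of those."
    )

    # Phrases ordered from highest to lowest score.
    print(r.get_ranked_phrases())

    # (score, phrase) tuples in the same order.
    print(r.get_ranked_phrases_with_scores())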

Debugging Setup
---------------

If you see a stopwords error, it means that you do not have the NLTK
`stopwords` corpus downloaded. You can download it using the command below.

.. code:: bash

    python -c "import nltk; nltk.download('stopwords')"

References
----------

This is a Python implementation of the algorithm described in the paper
`Automatic keyword extraction from individual documents by Stuart Rose,
Dave Engel, Nick Cramer and Wendy Cowley`_.

Why I chose to implement it myself
-----------------------------------

-  It is extremely fun to implement algorithms by reading papers; it is
   the digital equivalent of a DIY kit.
-  There are some rather popular implementations out there, in Python
   (`aneesha/RAKE`_) and Node (`waseem18/node-rake`_), but neither seemed to
   use the power of `NLTK`_. By making NLTK an integral part of the
   implementation, I get the flexibility and power to extend it in other
   creative ways later, if I see fit, without having to implement everything
   myself.
-  I plan to use it in my other pet projects, and I wanted it to be modular
   and tunable; this way I have complete control.

Contributing
------------

Bug Reports and Feature Requests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Please use the `issue tracker`_ to report bugs or request features.

Development
~~~~~~~~~~~

1. Check out the repository.
2. Make your changes and add/update the relevant tests.
3. Install ``poetry`` using ``pip install poetry``.
4. Run ``poetry install`` to create the project's virtual environment.
5. Run the tests using ``poetry run tox`` (tox environments for Python versions you do not have installed will fail). Fix failing tests and repeat.
6. Make the relevant documentation changes.
7. Install ``pre-commit`` using ``pip install pre-commit`` and run ``pre-commit run --all-files`` to run the lint checks.
8. Generate the documentation using ``poetry run sphinx-build -b html docs/ docs/_build/html``.
9. Generate ``requirements.txt`` for automated testing using ``poetry export --dev --without-hashes -f requirements.txt > requirements.txt``.
10. Commit the changes and raise a pull request.

Buy the developer a cup of coffee!
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you found the utility helpful, you can buy me a cup of coffee using

|Donate|

.. |Donate| image:: https://www.paypalobjects.com/webstatic/en_US/i/btn/png/silver-pill-paypal-44px.png
   :target: https://www.paypal.com/cgi-bin/webscr?cmd=_donations&business=3BSBW7D45C4YN&lc=US&currency_code=USD&bn=PP%2dDonationsBF%3abtn_donate_SM%2egif%3aNonHosted

.. _Automatic keyword extraction from individual documents by Stuart Rose, Dave Engel, Nick Cramer and Wendy Cowley: https://www.researchgate.net/profile/Stuart_Rose/publication/227988510_Automatic_Keyword_Extraction_from_Individual_Documents/links/55071c570cf27e990e04c8bb.pdf
.. _aneesha/RAKE: https://github.com/aneesha/RAKE
.. _waseem18/node-rake: https://github.com/waseem18/node-rake
.. _NLTK: http://www.nltk.org/
.. _issue tracker: https://github.com/csurfer/rake-nltk/issues

.. |Build Status| image:: https://github.com/csurfer/rake-nltk/actions/workflows/pytest.yml/badge.svg
   :target: https://github.com/csurfer/rake-nltk/actions
.. |Licence| image:: https://img.shields.io/badge/license-MIT-blue.svg
   :target: https://raw.githubusercontent.com/csurfer/rake-nltk/master/LICENSE
.. |Coverage Status| image:: https://codecov.io/gh/csurfer/rake-nltk/branch/master/graph/badge.svg?token=ghRhWVec9X
   :target: https://codecov.io/gh/csurfer/rake-nltk
.. |Demo| image:: http://i.imgur.com/wVOzU7y.gif
.. |pypiv| image:: https://img.shields.io/pypi/v/rake-nltk.svg
   :target: https://pypi.python.org/pypi/rake-nltk
.. |pyv| image:: https://img.shields.io/pypi/pyversions/rake-nltk.svg
   :target: https://pypi.python.org/pypi/rake-nltk

            
