:Name: rake-nltk
:Version: 1.0.6
:Home page: https://csurfer.github.io/rake-nltk
:Summary: RAKE, short for Rapid Automatic Keyword Extraction, is a
    domain-independent keyword extraction algorithm that tries to determine
    key phrases in a body of text by analyzing the frequency of word
    appearance and its co-occurrence with other words in the text.
:Upload time: 2021-09-15 05:13:18
:Author: csurfer
:Requires Python: >=3.6,<4.0
:License: MIT
:Keywords: nlp, text-mining, algorithms, development
:Requirements: none recorded

rake-nltk
=========
|pypiv| |pyv| |Licence| |Build Status| |Coverage Status|

RAKE, short for Rapid Automatic Keyword Extraction, is a domain-independent
keyword extraction algorithm that tries to determine key phrases in a body of
text by analyzing the frequency of word appearance and its co-occurrence with
other words in the text.

|Demo|

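
The scoring idea described above can be sketched in a few lines of plain
Python. This is an illustrative toy, not the rake-nltk source: it uses a tiny
hard-coded stop-word list and the classic degree-to-frequency word score from
the RAKE paper.

.. code:: python

    import re
    from collections import defaultdict

    # Toy stop-word list; the real library draws on NLTK's full English list.
    STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "by", "to", "it"}

    def candidate_phrases(text):
        # Candidate phrases are runs of words between stop words/punctuation.
        words = re.split(r"[^a-zA-Z]+", text.lower())
        phrases, current = [], []
        for w in words:
            if not w or w in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(w)
        if current:
            phrases.append(tuple(current))
        return phrases

    def ranked_phrases(text):
        # Word score = degree / frequency; phrase score = sum of word scores.
        phrases = candidate_phrases(text)
        freq, degree = defaultdict(int), defaultdict(int)
        for phrase in phrases:
            for w in phrase:
                freq[w] += 1
                degree[w] += len(phrase)  # co-occurrence within the phrase
        scores = {w: degree[w] / freq[w] for w in freq}
        return sorted(phrases, key=lambda p: sum(scores[w] for w in p),
                      reverse=True)

Because a phrase accumulates the scores of all of its words, multi-word
phrases such as "machine learning" naturally outrank their individual words.
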

Features
--------

* Ridiculously simple interface.
* Configurable word and sentence tokenizers, language-based stop words, etc.
* Configurable ranking metric.

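
The "configurable ranking metric" above refers to how individual word scores
are computed before being summed into phrase scores. A minimal sketch of the
three classic variants from the RAKE paper follows; the metric names here are
illustrative, not rake-nltk's actual API.

.. code:: python

    from collections import defaultdict

    def word_scores(phrases, metric="deg_to_freq"):
        # Score words from candidate phrases (tuples of words) using one of
        # three RAKE-style metrics.
        freq, degree = defaultdict(int), defaultdict(int)
        for phrase in phrases:
            for w in phrase:
                freq[w] += 1
                degree[w] += len(phrase)  # words co-occurring with w, incl. w
        if metric == "deg_to_freq":  # favors words found in long phrases
            return {w: degree[w] / freq[w] for w in freq}
        if metric == "degree":       # raw co-occurrence degree
            return dict(degree)
        if metric == "frequency":    # raw word frequency
            return dict(freq)
        raise ValueError(f"unknown metric: {metric}")

For example, given the phrases ``("machine", "learning")`` and
``("learning",)``, the degree-to-frequency metric scores "machine" 2.0 while
the more frequent but shorter-phrase word "learning" scores 1.5.
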

Setup
-----

Using pip
~~~~~~~~~

.. code:: bash

    pip install rake-nltk


Directly from the repository
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: bash

    git clone https://github.com/csurfer/rake-nltk.git
    python rake-nltk/setup.py install


Quick Start
-----------

.. code:: python

    from rake_nltk import Rake

    # Uses English stopwords from NLTK and all punctuation characters by
    # default.
    r = Rake()

    # Extraction given the text.
    r.extract_keywords_from_text(<text to process>)

    # Extraction given a list of strings, where each string is a sentence.
    r.extract_keywords_from_sentences(<list of sentences>)

    # To get keyword phrases ranked highest to lowest.
    r.get_ranked_phrases()

    # To get keyword phrases ranked highest to lowest with scores.
    r.get_ranked_phrases_with_scores()


Debugging Setup
---------------

If you see a stopwords error, it means that you do not have the NLTK corpus
``stopwords`` downloaded. You can download it using the command below.

.. code:: bash

    python -c "import nltk; nltk.download('stopwords')"


References
----------

This is a Python implementation of the algorithm described in the paper
`Automatic keyword extraction from individual documents by Stuart Rose,
Dave Engel, Nick Cramer and Wendy Cowley`_.


Why I chose to implement it myself
----------------------------------

- It is extremely fun to implement algorithms by reading papers; it is
  the digital equivalent of a DIY kit.
- There are some rather popular implementations out there, in Python
  (`aneesha/RAKE`_) and Node (`waseem18/node-rake`_), but neither seemed to
  use the power of `NLTK`_. By making NLTK an integral part of the
  implementation, I get the flexibility and power to extend it in other
  creative ways later, if I see fit, without having to implement everything
  myself.
- I plan to use it in my other pet projects to come and wanted it to be
  modular and tunable; this way I have complete control.


Contributing
------------

Bug Reports and Feature Requests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Please use the `issue tracker`_ for reporting bugs or feature requests.


Development
~~~~~~~~~~~

1. Check out the repository.
2. Make your changes and add/update the relevant tests.
3. Install ``poetry`` using ``pip install poetry``.
4. Run ``poetry install`` to create the project's virtual environment.
5. Run the tests using ``poetry run tox`` (environments for Python versions
   you do not have installed will fail). Fix any failing tests and repeat.
6. Make any relevant documentation changes.
7. Install ``pre-commit`` using ``pip install pre-commit`` and run
   ``pre-commit run --all-files`` to do lint checks.
8. Generate the documentation using
   ``poetry run sphinx-build -b html docs/ docs/_build/html``.
9. Generate ``requirements.txt`` for automated testing using
   ``poetry export --dev --without-hashes -f requirements.txt > requirements.txt``.
10. Commit the changes and raise a pull request.


Buy the developer a cup of coffee!
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you found the utility helpful, you can buy me a cup of coffee using

|Donate|

.. |Donate| image:: https://www.paypalobjects.com/webstatic/en_US/i/btn/png/silver-pill-paypal-44px.png
   :target: https://www.paypal.com/cgi-bin/webscr?cmd=_donations&business=3BSBW7D45C4YN&lc=US&currency_code=USD&bn=PP%2dDonationsBF%3abtn_donate_SM%2egif%3aNonHosted

.. _Automatic keyword extraction from individual documents by Stuart Rose, Dave Engel, Nick Cramer and Wendy Cowley: https://www.researchgate.net/profile/Stuart_Rose/publication/227988510_Automatic_Keyword_Extraction_from_Individual_Documents/links/55071c570cf27e990e04c8bb.pdf
.. _aneesha/RAKE: https://github.com/aneesha/RAKE
.. _waseem18/node-rake: https://github.com/waseem18/node-rake
.. _NLTK: http://www.nltk.org/
.. _issue tracker: https://github.com/csurfer/rake-nltk/issues

.. |Build Status| image:: https://github.com/csurfer/rake-nltk/actions/workflows/pytest.yml/badge.svg
   :target: https://github.com/csurfer/rake-nltk/actions
.. |Licence| image:: https://img.shields.io/badge/license-MIT-blue.svg
   :target: https://raw.githubusercontent.com/csurfer/rake-nltk/master/LICENSE
.. |Coverage Status| image:: https://codecov.io/gh/csurfer/rake-nltk/branch/master/graph/badge.svg?token=ghRhWVec9X
   :target: https://codecov.io/gh/csurfer/rake-nltk
.. |Demo| image:: http://i.imgur.com/wVOzU7y.gif
.. |pypiv| image:: https://img.shields.io/pypi/v/rake-nltk.svg
   :target: https://pypi.python.org/pypi/rake-nltk
.. |pyv| image:: https://img.shields.io/pypi/pyversions/rake-nltk.svg
   :target: https://pypi.python.org/pypi/rake-nltk