![Tests](https://github.com/yaniv-shulman/chunkey-bert/actions/workflows/linting_and_tests.yml/badge.svg?branch=main)
[![phorm.ai](https://img.shields.io/badge/ask%20phorm.ai-8A2BE2)](https://www.phorm.ai/query?projectId=f7ddaf97-2b90-4515-a364-855258454655)
[![Pyversions](https://img.shields.io/pypi/pyversions/chunkey-bert.svg?style=flat-square)](https://pypi.python.org/pypi/chunkey-bert)
# ChunkeyBERT #
## Overview ##
ChunkeyBERT is a minimal and easy-to-use keyword extraction technique that leverages BERT embeddings for unsupervised
keyphrase extraction from text documents. ChunkeyBERT is a modification of the
[KeyBERT method](https://towardsdatascience.com/keyword-extraction-with-bert-724efca412ea) that handles documents of
arbitrary length with better results, especially on long documents. ChunkeyBERT splits each document into chunks, uses
KeyBERT to extract candidate keywords/keyphrases from every chunk, and then applies a similarity-based selection stage to
produce the final keywords for the entire document. ChunkeyBERT can use any document chunking method that can be wrapped
in a simple function; it can also work without a chunker and process the entire document as a single chunk. ChunkeyBERT
works with any KeyBERT configuration and can handle batches of documents.
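The chunk, extract, and select pipeline described above can be illustrated with a small, dependency-free sketch. This is
not ChunkeyBERT's API or implementation: a toy bag-of-words vector stands in for BERT embeddings, a simple per-chunk word
ranking stands in for KeyBERT, and the selection rule (ranking candidates by their mean cosine similarity to all chunks)
is an assumption about how a similarity-based selection stage could work. All helper names (`sentence_chunker`,
`chunk_candidates`, `extract_keywords`, etc.) are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical types: a chunker maps a document to a list of chunks,
# and a "vector" is a sparse {token: weight} mapping.
Chunker = Callable[[str], List[str]]
Vector = Dict[str, float]

# Toy stopword list (assumption, for illustration only).
STOPWORDS = {"a", "an", "and", "are", "is", "of", "the", "to", "in", "on", "for", "with"}


def tokens(text: str) -> List[str]:
    """Lowercase word tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def embed(text: str) -> Vector:
    """Toy bag-of-words vector standing in for a BERT embedding (assumption)."""
    return {t: float(c) for t, c in Counter(tokens(text)).items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sentence_chunker(doc: str, sentences_per_chunk: int = 2) -> List[str]:
    """Hypothetical chunker: groups consecutive sentences into fixed-size chunks."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]
    return [
        " ".join(sentences[i : i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]


def chunk_candidates(chunk: str, top_n: int) -> List[str]:
    """Stand-in for running KeyBERT on one chunk: rank the chunk's words by
    similarity to the chunk's vector and keep the top_n."""
    chunk_vec = embed(chunk)
    scored = [(w, cosine(embed(w), chunk_vec)) for w in set(tokens(chunk))]
    scored.sort(key=lambda ws: (-ws[1], ws[0]))  # best first, ties alphabetical
    return [w for w, _ in scored[:top_n]]


def extract_keywords(
    doc: str,
    chunker: Optional[Chunker] = None,
    top_n_per_chunk: int = 3,
    num_keywords: int = 3,
) -> List[Tuple[str, float]]:
    """Chunk the document, collect per-chunk candidates, then keep the
    candidates with the highest mean similarity to all chunks."""
    chunks = chunker(doc) if chunker is not None else [doc]  # no chunker: one chunk
    if not chunks:
        return []
    candidates: Set[str] = set()
    for chunk in chunks:
        candidates.update(chunk_candidates(chunk, top_n_per_chunk))
    chunk_vecs = [embed(c) for c in chunks]
    scores = {
        cand: sum(cosine(embed(cand), v) for v in chunk_vecs) / len(chunk_vecs)
        for cand in candidates
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:num_keywords]


doc = (
    "Keyword extraction uses embeddings. Keyword extraction ranks phrases. "
    "Chunking splits long documents. Keyword scores are aggregated."
)
# Keywords in order: keyword, extraction, aggregated
print(extract_keywords(doc, chunker=sentence_chunker))
```

Passing `chunker=None` processes the whole document as a single chunk, mirroring the no-chunker mode described above,
and a batch of documents can be handled by calling `extract_keywords` once per document. In real use, KeyBERT and a
BERT-based embedding model would replace the toy `chunk_candidates` and `embed` helpers.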
## Installation ##
Install from [PyPI](https://pypi.org/project/chunkey-bert/) using pip (preferred method). The package requires Python
3.9, 3.10, or 3.11:
```bash
pip install chunkey-bert
```
## Experimental results ##
Limited experimental results, along with a demonstration of the library on a small number of documents, are available
at https://nbviewer.org/github/yaniv-shulman/chunkey-bert/tree/main/src/experiments/.
## Contribution and feedback ##
Contributions and feedback are most welcome. Please see
[CONTRIBUTING.md](https://github.com/yaniv-shulman/chunkey-bert/tree/main/CONTRIBUTING.md) for further details.
Raw data
{
"_id": null,
"home_page": "https://github.com/yaniv-shulman/chunkey-keybert",
"name": "chunkey-bert",
"maintainer": null,
"docs_url": null,
"requires_python": "<3.12,>=3.9",
"maintainer_email": null,
"keywords": "machine learning",
"author": "Yaniv Shulman",
"author_email": "yaniv@shulman.info",
"download_url": "https://files.pythonhosted.org/packages/b8/93/a18712c152e5291adcf1f135199e0e818f5314d3ab0df7ee657f32de1671/chunkey_bert-0.2.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Modification of the KeyBERT method to extract keywords and keyphrases using chunks. This provides better results, especially when handling long documents.",
"version": "0.2.0",
"project_urls": {
"Homepage": "https://github.com/yaniv-shulman/chunkey-keybert",
"Repository": "https://github.com/yaniv-shulman/chunkey-bert"
},
"split_keywords": [
"machine",
"learning"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "e3188ec7ace43589906f70a54f89ecc61bd24164d88415616baa6e7b31521a03",
"md5": "42f1228ff05562d753ad722fa1e57f5d",
"sha256": "250b25912548e17c679e39599d069ecac588f19ac37c6ee7b04277e7c2621d31"
},
"downloads": -1,
"filename": "chunkey_bert-0.2.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "42f1228ff05562d753ad722fa1e57f5d",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<3.12,>=3.9",
"size": 7156,
"upload_time": "2024-06-07T04:13:40",
"upload_time_iso_8601": "2024-06-07T04:13:40.333246Z",
"url": "https://files.pythonhosted.org/packages/e3/18/8ec7ace43589906f70a54f89ecc61bd24164d88415616baa6e7b31521a03/chunkey_bert-0.2.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "b893a18712c152e5291adcf1f135199e0e818f5314d3ab0df7ee657f32de1671",
"md5": "50d5b8ba35be5008932b58f3e831c3b6",
"sha256": "2764e83d0ec420ceb18eb0ca5f1c7818afcdb55ce3d32f96439eea2f38a14b9f"
},
"downloads": -1,
"filename": "chunkey_bert-0.2.0.tar.gz",
"has_sig": false,
"md5_digest": "50d5b8ba35be5008932b58f3e831c3b6",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<3.12,>=3.9",
"size": 7505,
"upload_time": "2024-06-07T04:13:42",
"upload_time_iso_8601": "2024-06-07T04:13:42.407924Z",
"url": "https://files.pythonhosted.org/packages/b8/93/a18712c152e5291adcf1f135199e0e818f5314d3ab0df7ee657f32de1671/chunkey_bert-0.2.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-06-07 04:13:42",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "yaniv-shulman",
"github_project": "chunkey-keybert",
"github_not_found": true,
"lcname": "chunkey-bert"
}