chunkey-bert


Name: chunkey-bert
Version: 0.2.0
Home page: https://github.com/yaniv-shulman/chunkey-keybert
Summary: Modification of the KeyBERT method to extract keywords and keyphrases using chunks. This provides better results, especially when handling long documents.
Upload time: 2024-06-07 04:13:42
Maintainer: None
Docs URL: None
Author: Yaniv Shulman
Requires Python: <3.12,>=3.9
License: None
Keywords: machine learning
Requirements: No requirements were recorded.
![Tests](https://github.com/yaniv-shulman/chunkey-bert/actions/workflows/linting_and_tests.yml/badge.svg?branch=main)
[![phorm.ai](https://img.shields.io/badge/ask%20phorm.ai-8A2BE2)](https://www.phorm.ai/query?projectId=f7ddaf97-2b90-4515-a364-855258454655)
[![Pyversions](https://img.shields.io/pypi/pyversions/chunkey-bert.svg?style=flat-square)](https://pypi.python.org/pypi/chunkey-bert)

# ChunkeyBERT #
## Overview ##
ChunkeyBERT is a minimal and easy-to-use keyword extraction technique that leverages BERT embeddings for unsupervised
keyphrase extraction from text documents. It is a modification of the
[KeyBERT method](https://towardsdatascience.com/keyword-extraction-with-bert-724efca412ea) that handles documents of
arbitrary length with better results, particularly on long documents. ChunkeyBERT splits each document into chunks, uses
KeyBERT to extract candidate keywords/keyphrases from every chunk, and then applies a similarity-based selection stage to
produce the final keywords for the entire document. ChunkeyBERT can use any document chunking method that can be wrapped
in a simple function; it can also work without a chunker, in which case the entire document is processed as a single
chunk. ChunkeyBERT works with any configuration of KeyBERT and can handle batches of documents.
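The chunk, extract, and select pipeline described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the chunkey-bert implementation or its API: `embed` is a toy bag-of-words stand-in for BERT embeddings, `chunk_candidates` only imitates KeyBERT's idea of ranking candidate n-grams by similarity to the text they came from, and `window_chunker`, `chunked_keywords`, and the rule of ranking pooled candidates against the whole-document embedding are hypothetical choices made for this example.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Optional

# Small illustrative stopword list (assumption, not taken from the library).
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to", "with"}


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


def embed(text: str) -> Counter:
    # Toy stand-in for a BERT embedding: a sparse term-frequency vector.
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def window_chunker(doc: str, size: int = 50) -> List[str]:
    # Hypothetical chunker: consecutive windows of `size` whitespace-separated words.
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def chunk_candidates(chunk: str, top_n: int = 5) -> List[str]:
    # KeyBERT-style step: rank unigram and bigram candidates by their
    # similarity to the embedding of the chunk they came from.
    tokens = tokenize(chunk)
    candidates = {t for t in tokens if t not in STOPWORDS}
    candidates |= {
        f"{a} {b}"
        for a, b in zip(tokens, tokens[1:])
        if a != b and a not in STOPWORDS and b not in STOPWORDS
    }
    chunk_vec = embed(chunk)
    ranked = sorted(candidates, key=lambda c: (-cosine(embed(c), chunk_vec), c))
    return ranked[:top_n]


def chunked_keywords(
    doc: str,
    num_keywords: int = 5,
    chunker: Optional[Callable[[str], List[str]]] = None,
) -> List[str]:
    # Without a chunker the whole document is treated as a single chunk.
    chunks = chunker(doc) if chunker is not None else [doc]
    # Pool the candidates proposed by every chunk.
    pooled = {c for chunk in chunks for c in chunk_candidates(chunk)}
    # Selection stage (assumption): rank pooled candidates by similarity
    # to the embedding of the entire document.
    doc_vec = embed(doc)
    ranked = sorted(pooled, key=lambda c: (-cosine(embed(c), doc_vec), c))
    return ranked[:num_keywords]


doc = "keyword keyword keyword extraction extraction chunk"
print(chunked_keywords(doc, num_keywords=2, chunker=lambda d: window_chunker(d, size=2)))
# prints ['keyword extraction', 'keyword']
```

In this sketch the chunker is any callable mapping a string to a list of strings, so swapping in a sentence- or token-based splitter needs no other changes, and a batch of documents can be handled by calling `chunked_keywords` once per document.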

## Installation ##
Install from [PyPI](https://pypi.org/project/chunkey-bert/) using pip (preferred method):
```bash
pip install chunkey-bert
```

## Experimental results ##
Limited experimental results and a demonstration of the library on a small number of documents are available at
https://nbviewer.org/github/yaniv-shulman/chunkey-bert/tree/main/src/experiments/.


## Contribution and feedback ##
Contributions and feedback are most welcome. Please see
[CONTRIBUTING.md](https://github.com/yaniv-shulman/chunkey-bert/tree/main/CONTRIBUTING.md) for further details.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/yaniv-shulman/chunkey-keybert",
    "name": "chunkey-bert",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.12,>=3.9",
    "maintainer_email": null,
    "keywords": "machine learning",
    "author": "Yaniv Shulman",
    "author_email": "yaniv@shulman.info",
    "download_url": "https://files.pythonhosted.org/packages/b8/93/a18712c152e5291adcf1f135199e0e818f5314d3ab0df7ee657f32de1671/chunkey_bert-0.2.0.tar.gz",
    "platform": null,
    "description": "![Tests](https://github.com/yaniv-shulman/chunkey-bert/actions/workflows/linting_and_tests.yml/badge.svg?branch=main)\n[![phorm.ai](https://img.shields.io/badge/ask%20phorm.ai-8A2BE2)](https://www.phorm.ai/query?projectId=f7ddaf97-2b90-4515-a364-855258454655)\n[![Pyversions](https://img.shields.io/pypi/pyversions/chunkey-bert.svg?style=flat-square)](https://pypi.python.org/pypi/chunkey-bert)\n\n# ChunkeyBERT #\n## Overview ##\nChunkeyBert is a minimal and easy-to-use keyword extraction technique that leverages BERT embeddings for unsupervised \nkeyphrase extraction from text documents. ChunkeyBert is a modification of the \n[KeyBERT method](https://towardsdatascience.com/keyword-extraction-with-bert-724efca412ea) to handle documents with \narbitrary length with better results. ChunkeyBERT works by chunking the documents and uses KeyBERT to extract candidate\nkeywords/keyphrases from all chunks followed by a similarity based selection stage to produce the final keywords for the\nentire document. ChunkeyBert can use any document chunking method as long as it can be wrapped in a simple function, \nhowever it can also work without a chunker and process the entire document as a single chunk. ChunkeyBert works with any\nconfiguration of KeyBERT and can handle batches of documents. \n\n## Installation ##\nInstall from [PyPI](https://pypi.org/project/rsklpr/) using pip (preferred method):\n```bash\npip install chunkey-bert\n```\n\n## Experimental results ##\nVery limited experimental results and demonstration of the library on a small number of documents is available at \n https://nbviewer.org/github/yaniv-shulman/chunkey-bert/tree/main/src/experiments/.\n\n\n## Contribution and feedback ##\nContributions and feedback are most welcome. Please see\n[CONTRIBUTING.md](https://github.com/yaniv-shulman/chunkey-bert/tree/main/CONTRIBUTING.md) for further details.\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "Modification of the KeyBERT method to extract keywords and keyphrases using chunks. This provides better results, especialy when handling long documents.",
    "version": "0.2.0",
    "project_urls": {
        "Homepage": "https://github.com/yaniv-shulman/chunkey-keybert",
        "Repository": "https://github.com/yaniv-shulman/chunkey-bert"
    },
    "split_keywords": [
        "machine",
        "learning"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e3188ec7ace43589906f70a54f89ecc61bd24164d88415616baa6e7b31521a03",
                "md5": "42f1228ff05562d753ad722fa1e57f5d",
                "sha256": "250b25912548e17c679e39599d069ecac588f19ac37c6ee7b04277e7c2621d31"
            },
            "downloads": -1,
            "filename": "chunkey_bert-0.2.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "42f1228ff05562d753ad722fa1e57f5d",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.12,>=3.9",
            "size": 7156,
            "upload_time": "2024-06-07T04:13:40",
            "upload_time_iso_8601": "2024-06-07T04:13:40.333246Z",
            "url": "https://files.pythonhosted.org/packages/e3/18/8ec7ace43589906f70a54f89ecc61bd24164d88415616baa6e7b31521a03/chunkey_bert-0.2.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b893a18712c152e5291adcf1f135199e0e818f5314d3ab0df7ee657f32de1671",
                "md5": "50d5b8ba35be5008932b58f3e831c3b6",
                "sha256": "2764e83d0ec420ceb18eb0ca5f1c7818afcdb55ce3d32f96439eea2f38a14b9f"
            },
            "downloads": -1,
            "filename": "chunkey_bert-0.2.0.tar.gz",
            "has_sig": false,
            "md5_digest": "50d5b8ba35be5008932b58f3e831c3b6",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.12,>=3.9",
            "size": 7505,
            "upload_time": "2024-06-07T04:13:42",
            "upload_time_iso_8601": "2024-06-07T04:13:42.407924Z",
            "url": "https://files.pythonhosted.org/packages/b8/93/a18712c152e5291adcf1f135199e0e818f5314d3ab0df7ee657f32de1671/chunkey_bert-0.2.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-06-07 04:13:42",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "yaniv-shulman",
    "github_project": "chunkey-keybert",
    "github_not_found": true,
    "lcname": "chunkey-bert"
}
        