`ConTextMining` is a package that generates interpretable topic labels from the keywords of topic models (e.g., `LDA`, `BERTopic`) through few-shot in-context learning.
[![pypi package](https://img.shields.io/badge/pypi_package-v0.0.2-brightgreen)](https://pypi.org/project/ConTextMining/) [![GitHub Source Code](https://img.shields.io/badge/github_source_code-source_code?logo=github&color=green)](https://github.com/cja5553/ConTextMining)
## Requirements
### Required packages
The following packages are required for `ConTextMining`.
- `torch` (to learn how to install, please refer to [pytorch.org/](https://pytorch.org/))
- `transformers`
- `tokenizers`
- `huggingface-hub`
- `flash_attn`
- `accelerate`
To install these packages, you can do the following:
```bash
pip install torch transformers tokenizers huggingface-hub flash_attn accelerate
```
### GPU requirements
You need at least one GPU to use `ConTextMining`.
VRAM requirements depend on factors such as the number of keywords and the number of topics you wish to label.
At least 8GB of VRAM is recommended.
### Hugging Face access token
You will need a Hugging Face access token. To obtain one:
1. Create a [Hugging Face](https://huggingface.co) account if you do not already have one.
2. Create and store a new access token. To learn more, please refer to [huggingface.co/docs/hub/en/security-tokens](https://huggingface.co/docs/hub/en/security-tokens).
3. Note: some pre-trained large language models (LLMs) are gated and may require permission. For more information, please refer to [huggingface.co/docs/hub/en/models-gated](https://huggingface.co/docs/hub/en/models-gated).
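Rather than pasting the token directly into scripts, you may prefer to read it from an environment variable. A minimal sketch (the `HF_TOKEN` variable name and the `load_hf_token` helper are illustrative conventions, not part of `ConTextMining`):

```python
import os

def load_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Read a Hugging Face access token from an environment variable,
    so it never needs to be hard-coded in source files."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Hugging Face access token."
        )
    return token
```

You can then pass `load_hf_token()` wherever an access token is expected.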
## Installation
To install in Python, simply run:
```bash
pip install ConTextMining
```
## Quick Start
Here is a quick example of how to use `ConTextMining` to conveniently generate interpretable topic labels from the keywords of topic models.
```python
from ConTextMining import get_topic_labels
# specify your huggingface access token. To learn how to obtain one, refer to huggingface.co/docs/hub/en/security-tokens
hf_access_token="<your huggingface access token>"
# specify the huggingface model id. Choose between "microsoft/Phi-3-mini-4k-instruct", "meta-llama/Meta-Llama-3.1-8B-Instruct" or "google/gemma-2-2b-it"
model_id="meta-llama/Meta-Llama-3.1-8B-Instruct"
# specify the keywords for the few-shot learning examples
keywords_examples = [
"olympic, year, said, games, team",
"mr, bush, president, white, house",
"said, report, evidence, findings, defense",
"french, union, germany, workers, paris",
"japanese, year, tokyo, matsui, said"
]
# specify the labels corresponding, by index, to the keywords in 'keywords_examples' above
labels_examples = [
"sports",
"politics",
"research",
"france",
"japan"
]
# specify the topic modeling keywords for which you wish to generate coherent topic labels
topic_modeling_keywords ='''Topic 1: [amazing, really, place, phenomenon, pleasant],
Topic 2: [loud, awful, sunday, like, slow],
Topic 3: [spinach, carrots, green, salad, dressing],
Topic 4: [mango, strawberry, vanilla, banana, peanut],
Topic 5: [fish, roll, salmon, fresh, good]'''
print(get_topic_labels(topic_modeling_keywords=topic_modeling_keywords,
                       keywords_examples=keywords_examples,
                       labels_examples=labels_examples,
                       model_id=model_id,
                       access_token=hf_access_token))
```
You will now get interpretable topic labels for all 5 topics!
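Topic models typically return their top keywords as a list per topic, whereas `get_topic_labels` expects a single formatted string. A small helper (the `format_topics` function is illustrative, not part of the package) can join per-topic word lists into the `Topic N: [...]` string format used above:

```python
def format_topics(topics: dict) -> str:
    """Join each topic's top keywords into the
    'Topic N: [w1, w2, ...]' string format, one topic per line."""
    lines = [
        f"Topic {topic_id}: [{', '.join(words)}]"
        for topic_id, words in sorted(topics.items())
    ]
    return ",\n".join(lines)

# e.g., top words per topic as produced by an LDA or BERTopic run
topics = {
    1: ["amazing", "really", "place", "phenomenon", "pleasant"],
    2: ["loud", "awful", "sunday", "like", "slow"],
}
print(format_topics(topics))
```

The resulting string can be passed directly as `topic_modeling_keywords`.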
## Documentation
```python
ConTextMining.get_topic_labels(*, topic_modeling_keywords, keywords_examples, labels_examples, model_id, access_token)
```
- `topic_modeling_keywords` *(str, required)*: keywords from the output of a topic model (keywords representing each cluster) for `ConTextMining` to label.
- `keywords_examples` *(list, required)*: list of strings containing topic modeling keywords that serve as training examples for few-shot learning.
- `labels_examples` *(list, required)*: list of strings containing the labels corresponding, by index, to the keywords in `keywords_examples` above.
- `model_id` *(str, optional)*: Hugging Face model id of choice. Currently a choice between "microsoft/Phi-3-mini-4k-instruct", "meta-llama/Meta-Llama-3.1-8B-Instruct", or "google/gemma-2-2b-it". Defaults to "google/gemma-2-2b-it".
- `access_token` *(str, required)*: Hugging Face access token. To learn how to obtain one, refer to [huggingface.co/docs/hub/en/security-tokens](https://huggingface.co/docs/hub/en/security-tokens). Defaults to `None`.
## Citation
C. Alba, "ConText Mining: Complementing topic models with few-shot in-context learning to generate interpretable topics." Working paper.
## Questions?
Contact me at [alba@wustl.edu](mailto:alba@wustl.edu)