TokenSHAP


- Name: TokenSHAP
- Version: 0.2.2
- Home page: https://github.com/ronigold/TokenSHAP
- Summary: Paper Implementation: "TokenSHAP: Interpreting Large Language Models with Monte Carlo Shapley Value Estimation"
- Upload time: 2024-11-03 13:36:14
- Author: Roni Goldshmidt
- Requires Python: >=3.6
- Keywords: shapley values interpretation nlp ai

# TokenSHAP: Implementing the Paper with Monte Carlo Shapley Value Estimation

TokenSHAP is a Python library that implements the method described in the paper "TokenSHAP: Interpreting Large Language Models with Monte Carlo Shapley Value Estimation" (Goldshmidt & Horovicz, 2024). It interprets large language models (LLMs) by estimating Shapley values for individual tokens, showing how specific parts of the input contribute to the model's output.

![Tokens Architecture](data/TokenSHAP_flow.png)

The library attributes importance to individual tokens within an input prompt using Monte Carlo Shapley value estimation. It adapts concepts from cooperative game theory to natural language, giving insight into how different parts of an input contribute to the model's response.
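
For reference, the cooperative-game quantity being estimated is the classical Shapley value. The notation below is an assumption made for illustration rather than taken from the library: $N$ is the set of $n$ prompt tokens, $S$ is a subset of tokens that excludes token $i$, and $v(S)$ is some score of the model's response when only the tokens in $S$ are kept.

$$
\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n - |S| - 1)!}{n!}\,\bigl(v(S \cup \{i\}) - v(S)\bigr)
$$

The sum ranges over $2^{n-1}$ subsets per token, which is why exact computation becomes impractical for longer prompts and a Monte Carlo estimate is used instead.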

![Tokens Importance](data/plot.JPG)

## About TokenSHAP

The method introduces an efficient way to estimate the importance of tokens based on Shapley values, providing interpretable, quantitative measures of token importance. It addresses the combinatorial complexity of language inputs and demonstrates efficacy across various prompts and LLM architectures. TokenSHAP represents a significant advancement in making AI more transparent and trustworthy, particularly in critical applications such as healthcare diagnostics, legal analysis, and automated decision-making systems.
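
To make the idea concrete, here is a minimal sketch of Monte Carlo Shapley estimation using the standard permutation-sampling estimator. It is not TokenSHAP's own code, and the library's sampling scheme may differ. The names `monte_carlo_shapley` and `toy_value`, and the token weights, are hypothetical and chosen only for illustration. In real use, the value function would score a model response generated from a subset of the prompt's tokens.

```python
import random
from typing import Callable, Dict, FrozenSet, List


def monte_carlo_shapley(
    tokens: List[str],
    value_fn: Callable[[FrozenSet[int]], float],
    num_permutations: int = 200,
    seed: int = 0,
) -> Dict[int, float]:
    """Estimate per-token Shapley values by sampling random token orderings.

    For each sampled ordering, tokens are added one at a time and each token
    is credited with the change in value it causes (its marginal contribution).
    Averaging these contributions over orderings gives the estimate.
    """
    rng = random.Random(seed)
    n = len(tokens)
    totals = [0.0] * n
    for _ in range(num_permutations):
        order = list(range(n))
        rng.shuffle(order)
        included = set()
        previous = value_fn(frozenset(included))
        for index in order:
            included.add(index)
            current = value_fn(frozenset(included))
            totals[index] += current - previous
            previous = current
    return {i: totals[i] / num_permutations for i in range(n)}


# Toy stand-in for a model score (an assumption, not real model output):
# only the tokens "sky" and "blue" carry weight, and the weights add up.
TOKENS = ["Why", "is", "the", "sky", "blue", "?"]
WEIGHTS = {3: 0.6, 4: 0.4}


def toy_value(subset: FrozenSet[int]) -> float:
    return sum(WEIGHTS.get(i, 0.0) for i in subset)


estimates = monte_carlo_shapley(TOKENS, toy_value, num_permutations=50)
for i, token in enumerate(TOKENS):
    print(f"{token!r}: {estimates[i]:.3f}")
```

Because the toy value function is additive, every sampled ordering yields the same marginal contribution for each token, so the estimates match the weights. With a real model the contributions vary between orderings, and more samples reduce the variance of the estimate at the cost of more model calls.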

## Prerequisites

Before installing TokenSHAP, you need to have Ollama deployed and running. Ollama is required for TokenSHAP to interact with large language models.

To install and set up Ollama, please follow the instructions in the [Ollama GitHub repository](https://github.com/ollama/ollama).
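
Before running TokenSHAP, it can help to confirm that the Ollama server is reachable. The sketch below queries Ollama's `/api/tags` endpoint, which lists locally available models, using only the Python standard library. The helper names `tags_url` and `list_ollama_models` are hypothetical, and the base URL is the default Ollama address used in the usage example.

```python
import json
import urllib.request
from typing import List, Optional


def tags_url(base_url: str) -> str:
    """Build the URL of Ollama's model-listing endpoint."""
    return base_url.rstrip("/") + "/api/tags"


def list_ollama_models(
    base_url: str = "http://localhost:11434", timeout: float = 3.0
) -> Optional[List[str]]:
    """Return the names of local models, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(tags_url(base_url), timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # OSError covers connection failures and HTTP errors;
        # ValueError covers a response body that is not valid JSON.
        return None
    return [model.get("name") for model in payload.get("models", [])]
```

Calling `list_ollama_models()` returns `None` when nothing answers at the given address. A list that does not contain the model you plan to analyze suggests the model still needs to be pulled with Ollama.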

## Installation

You can install TokenSHAP directly from PyPI using pip:

```bash
pip install tokenshap
```

Alternatively, to install from source:

```bash
git clone https://github.com/ronigold/TokenSHAP.git
cd TokenSHAP
pip install -r requirements.txt
```

## Usage

TokenSHAP works with models served through Ollama. Here's a quick guide:

```python
# Import TokenSHAP
from token_shap import TokenSHAP

# Initialize with your model & tokenizer
model_name = "llama3"
tokenizer_path = "NousResearch/Hermes-2-Theta-Llama-3-8B"
ollama_api_url = "http://localhost:11434"  # Default Ollama API URL
tshap = TokenSHAP(model_name, tokenizer_path, ollama_api_url)

# Analyze token importance
prompt = "Why is the sky blue?"
results = tshap.analyze(prompt)
```

The results include an estimated Shapley value for each token, indicating that token's contribution to the model's output.
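
Once per-token values are available, a common next step is to rank tokens by the magnitude of their contribution. The sketch below assumes the values have already been extracted into a plain token-to-value dictionary. The exact structure returned by `analyze` is not described here, so that conversion is left out, and `rank_tokens` and the sample numbers are hypothetical illustrations rather than real model output.

```python
from typing import Dict, List, Tuple


def rank_tokens(values: Dict[str, float], top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k tokens ordered by absolute contribution, largest first."""
    ranked = sorted(values.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:top_k]


# Made-up values for illustration only (an assumption, not library output).
example_values = {"Why": 0.05, "is": -0.10, "sky": 0.50, "blue": 0.30}

for token, value in rank_tokens(example_values, top_k=2):
    print(f"{token}: {value:+.2f}")
```

Sorting by absolute value keeps tokens with a strong negative contribution near the top instead of treating them as unimportant.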

For a more detailed example and usage guide, please refer to our [TokenSHAP Examples notebook](notebooks/TokenShap%20Examples.ipynb) in the repository.

## Key Features

- **Interpretability for LLMs:** Delivers a methodical approach to understanding how individual components of input affect LLM outputs.
- **Monte Carlo Shapley Estimation:** Utilizes a Monte Carlo approach to efficiently compute Shapley values for tokens, suitable for extensive texts and large models.
- **Versatile Application:** Applicable across various LLM architectures and prompt types, from factual questions to complex multi-sentence inputs.

## Contributing

We welcome contributions from the community, whether it's adding new features, improving documentation, or reporting bugs. Here's how you can contribute:

1. Fork the project
2. Create your feature branch (`git checkout -b feature/YourAmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/YourAmazingFeature`)
5. Open a pull request

## Support

For support, please email roni.goldshmidt@getnexar.com or miriam.horovicz@ni.com, or open an issue on our GitHub project page.

## License

TokenSHAP is distributed under the MIT License. See the `LICENSE` file for more information.

## Citation

If you use TokenSHAP in your research, please cite our paper:

```
@article{goldshmidt2024tokenshap,
  title={TokenSHAP: Interpreting Large Language Models with Monte Carlo Shapley Value Estimation},
  author={Goldshmidt, Roni and Horovicz, Miriam},
  journal={arXiv preprint arXiv:2407.10114},
  year={2024}
}
```

You can find the full paper on arXiv: [https://arxiv.org/abs/2407.10114](https://arxiv.org/abs/2407.10114)

## Authors

- **Roni Goldshmidt**
- **Miriam Horovicz**

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/ronigold/TokenSHAP",
    "name": "TokenSHAP",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": null,
    "keywords": "shapley values interpretation NLP AI",
    "author": "Roni Goldshmidt",
    "author_email": "roni.goldshmidt@getnexar.com",
    "download_url": "https://files.pythonhosted.org/packages/72/54/b13651f08011da61390b74b529a48f33caba455be969c8199006a8986097/tokenshap-0.2.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Paper Implementation: \"TokenSHAP: Interpreting Large Language Models with Monte Carlo Shapley Value Estimation\"",
    "version": "0.2.2",
    "project_urls": {
        "Bug Reports": "https://github.com/ronigold/TokenSHAP/issues",
        "Homepage": "https://github.com/ronigold/TokenSHAP",
        "Source": "https://github.com/ronigold/TokenSHAP"
    },
    "split_keywords": [
        "shapley",
        "values",
        "interpretation",
        "nlp",
        "ai"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "bce1e15d744ed22335734670bc1555dc40fad8850752a061880ed3025350f371",
                "md5": "803997cd59dc222254630b17f42efca8",
                "sha256": "15b4fc9a37a1fd0b62da8e44d658503abc8d81937afb1599186bc79ac8549e2f"
            },
            "downloads": -1,
            "filename": "TokenSHAP-0.2.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "803997cd59dc222254630b17f42efca8",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 12178,
            "upload_time": "2024-11-03T13:36:13",
            "upload_time_iso_8601": "2024-11-03T13:36:13.506286Z",
            "url": "https://files.pythonhosted.org/packages/bc/e1/e15d744ed22335734670bc1555dc40fad8850752a061880ed3025350f371/TokenSHAP-0.2.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7254b13651f08011da61390b74b529a48f33caba455be969c8199006a8986097",
                "md5": "87f18dc0e5f2cc11d255dd714f8b4bfb",
                "sha256": "1b696d573de0e2d50037791d930926dd82a5bb5274e4b13b2a41a60574c81262"
            },
            "downloads": -1,
            "filename": "tokenshap-0.2.2.tar.gz",
            "has_sig": false,
            "md5_digest": "87f18dc0e5f2cc11d255dd714f8b4bfb",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 10349,
            "upload_time": "2024-11-03T13:36:14",
            "upload_time_iso_8601": "2024-11-03T13:36:14.691141Z",
            "url": "https://files.pythonhosted.org/packages/72/54/b13651f08011da61390b74b529a48f33caba455be969c8199006a8986097/tokenshap-0.2.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-11-03 13:36:14",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "ronigold",
    "github_project": "TokenSHAP",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "tokenshap"
}
        