# [![version](https://img.shields.io/pypi/v/autocontrastive-gen)](https://pypi.org/project/autocontrastive-gen/) ![license](https://img.shields.io/github/license/IBM/auto-contrastive-generation) ![python](https://img.shields.io/badge/python-3.9%20|%203.10-blue)
# Auto-Contrastive Generation
Code to experiment with **multi-exit** text generation, and to reproduce the **Auto-Contrastive Decoding** experiments from [Gera et al. (2023)](#reference).
Using this library you can:
1. Run inference on multi-exit generative language models, either using a *specific model exit layer*, or contrasting between model layers with *Auto-Contrastive Decoding*;
2. Train new multi-exit models, either for language modeling or for a specific task;
3. Try out new methods and algorithms for combining and contrasting the outputs of different model layers.
**Table of contents**
[Quick start](#quick-start)
[Setting the Multi-Exit Configuration](#setting-the-multi-exit-configuration)
[Running language modeling benchmarks](#running-language-modeling-benchmarks)
[Pre-trained model checkpoints](#pre-trained-model-checkpoints)
[Reference](#reference)
[License](#license)
## Quick start
1. Install the library with `pip install autocontrastive-gen`
2. Choose your desired [multi-exit parameters](#setting-the-multi-exit-configuration) - which model exit layer(s) do you want to use, and how?
3. Load a pre-trained multi-exit model and use it however you wish, within your own workflow
For instance, the following code will initialize the multi-exit [GPT2 model we release](#pre-trained-model-checkpoints) to use Auto-Contrastive Decoding:
```python
from autocontrastive_gen.modeling.configuration import MultiExitConfiguration
from autocontrastive_gen.modeling.auto_model import AutoMultiExitModel
# initialize a pre-trained multi-exit model to use auto-contrast between layer 24 and layer 12
multi_exit_config = MultiExitConfiguration(use_original_head=False,
                                           contrast_layer_indices=(24, 12))
model = AutoMultiExitModel.from_pretrained("IBM/gpt2-medium-multiexit", multi_exit_config=multi_exit_config)
```
Then, the initialized model can be used as usual through the `transformers` library. For example, this code will run text generation using the initialized model:
```python
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("IBM/gpt2-medium-multiexit")
prompt = tokenizer("humpty dumpty sat on", return_tensors='pt')
generated_ids = model.generate(**prompt, max_new_tokens=15)
print(tokenizer.batch_decode(generated_ids))
```
Similarly, you can **train the model** with the `transformers` library just as you would any other model.
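For instance, here is a minimal fine-tuning sketch around the model initialized above, using the standard `Trainer` API; the dataset variable and training arguments are hypothetical placeholders, not part of this library:
```python
from transformers import (AutoTokenizer, DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("IBM/gpt2-medium-multiexit")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 tokenizers define no pad token

# `my_tokenized_dataset` stands in for any tokenized language modeling dataset
trainer = Trainer(
    model=model,  # the multi-exit model initialized above
    args=TrainingArguments(output_dir="multiexit-finetune", num_train_epochs=1),
    train_dataset=my_tokenized_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```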
Note that the model behavior in inference and/or training depends on your [choice of parameters](#setting-the-multi-exit-configuration) when initializing the `MultiExitConfiguration`.
## Setting the Multi-Exit Configuration
Model behavior is determined by the `MultiExitConfiguration` used to initialize it. Most of the configuration parameters relate to inference-time text generation behavior, but some are relevant for model training as well. The following parameters can be set (a short example follows the list):
- `lm_head_layer_indices: Tuple[int, ...]`: the indices of model layers which are connected to language modeling exit heads. As this is a basic characteristic of the model, this parameter only needs to be set *once* for initial pre-training of these exit heads. *Otherwise (e.g., when using one of the [released models](#pre-trained-model-checkpoints)), there is no need to specify this parameter as it is read directly from the model's config file*.
- `freeze_parameters: bool`: whether to freeze the language model parameters when training the model. You may wish to set this to `True` when training new exit layers for an existing (single-exit) pre-trained model checkpoint, but the default (`False`) is applicable to most use cases.
- `use_original_head: bool`: whether to use the original language modeling head of the pre-trained checkpoint for text generation. Setting this parameter to `True` *turns off all special model behavior and ignores the additional exit heads*, and thus also renders the parameters below irrelevant.
- `output_layer_index: int`: choose a *single* specific model exit for generation instead of the original (top layer) exit head.
- `contrast_layer_indices: Tuple[Union[int, str], int]`: use an *auto-contrastive generation* setting, contrasting between the two specified exit layers. To perform contrast with the original LM head, pass the string 'original'. For example: `(24, 18)` will perform contrast between the exits at layers 24 and 18, and `('original', 12)` will perform contrast between the original LM head and the head at exit 12.
- `contrast_function: Callable` *(Advanced)*: enables setting a custom contrast function, which receives the next-token prediction logits from the two exit heads specified in `contrast_layer_indices` and returns a modified set of predictions. By default, the `calculate_contrasted_logits` function is used, which performs the contrast calculation described in [Gera et al. (2023)](#reference).
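To make these options concrete, below is a brief sketch of a few alternative configurations. The layer indices are illustrative (they assume the 24-layer `IBM/gpt2-medium-multiexit` checkpoint), and the argument names of the custom contrast function are hypothetical:
```python
import torch

from autocontrastive_gen.modeling.configuration import MultiExitConfiguration

# generate from a single intermediate exit (layer 12) instead of the top layer
single_exit_config = MultiExitConfiguration(use_original_head=False,
                                            output_layer_index=12)

# contrast the original LM head with the exit head at layer 12
contrast_config = MultiExitConfiguration(use_original_head=False,
                                         contrast_layer_indices=('original', 12))

# hypothetical custom contrast: a plain logit difference between the two exits
# (the tensors are the next-token logits of the exits in contrast_layer_indices)
def logit_difference(upper_logits: torch.Tensor,
                     lower_logits: torch.Tensor) -> torch.Tensor:
    return upper_logits - lower_logits

custom_config = MultiExitConfiguration(use_original_head=False,
                                       contrast_layer_indices=(24, 12),
                                       contrast_function=logit_difference)
```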
## Running language modeling benchmarks
For GPT-family models, benchmarks can be run for a given multi-exit model, under different generation settings, with the [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) library.
This is done via `python -m autocontrastive_gen.evaluation.lm_eval_harness.run_lm_eval`, by specifying `--model multi_exit_gpt` and adding the desired [multi-exit configuration settings](#setting-the-multi-exit-configuration) to the `--model_args` runtime argument. Since `--model_args` is itself comma-separated, tuple values such as `contrast_layer_indices` are written with `;` as the separator, as in the example below.
_For example:_
```bash
python -m autocontrastive_gen.evaluation.lm_eval_harness.run_lm_eval \
    --tasks lambada_openai \
    --model multi_exit_gpt \
    --model_args pretrained=IBM/gpt2-medium-multiexit,use_original_head=False,contrast_layer_indices='original;12' \
    --output_path my_output_path
```
_For details on the benchmark tasks available and on additional runtime arguments of the evaluation script, refer to https://github.com/EleutherAI/lm-evaluation-harness#basic-usage._
## Pre-trained model checkpoints
We release the following model checkpoints:
- [**ibm/gpt2-medium-multiexit**](https://huggingface.co/ibm/gpt2-medium-multiexit) - identical to the [GPT-2 Medium](https://huggingface.co/gpt2-medium) pre-trained model checkpoint, but with 12 newly trained linear exit heads.
- [**ibm/gpt-neo-125m-multiexit**](https://huggingface.co/ibm/gpt-neo-125m-multiexit) - identical to the [GPT Neo 125M](https://huggingface.co/EleutherAI/gpt-neo-125m) pre-trained model checkpoint, but with 6 newly trained linear exit heads.
The new heads were trained on the English portion of the [CC-100](https://huggingface.co/datasets/cc100) dataset. For further details, refer to Appendix A of the paper.
## Reference
Ariel Gera, Roni Friedman, Ofir Arviv, Chulaka Gunasekara, Benjamin Sznajder, Noam Slonim and Eyal Shnarch (2023).
[The Benefits of Bad Advice: Autocontrastive Decoding across Model Layers](https://arxiv.org/abs/2305.01628). ACL 2023.
Please cite:
```
@inproceedings{gera2023autocontrastive,
title={The Benefits of Bad Advice: Autocontrastive Decoding across Model Layers},
author={Gera, Ariel and Friedman, Roni and Arviv, Ofir and Gunasekara, Chulaka and Sznajder, Benjamin and Slonim, Noam and Shnarch, Eyal},
booktitle={Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month={july},
address={Toronto, Canada},
year={2023}
}
```
## License
This work is released under the Apache 2.0 license. The full text of the license can be found in [LICENSE](LICENSE).