zebra-qa

Name	zebra-qa
Version	1.0.1
home_page	https://github.com/sapienzanlp/zebra
Summary	ZEBRA: Zero-Shot Example-Based Retrieval Augmentation for Commonsense Question Answering
upload_time	2024-12-26 15:24:18
maintainer	None
docs_url	None
author	Francesco Molfese
requires_python	>=3.10
license	Creative Commons Attribution-NonCommercial-ShareAlike 4.0
keywords	nlp sapienza sapienzanlp deep learning transformer pytorch retriever question answering commonsense large language models
requirements	torch transformers accelerate datasets rich scikit-learn tiktoken pytest bitsandbytes lightning hydra-core hydra_colorlog wandb art pprintpp colorama jsonlines loguru goldenretriever-core setuptools twine flash-attn

            
<div align="center">
  <img src="https://github.com/SapienzaNLP/zebra/blob/master/assets/zebra.png?raw=true" width="100" height="100">
</div>

<div align="center">

# ZEBRA: Zero-Shot Example-Based Retrieval Augmentation for Commonsense Question Answering

[![Conference](https://img.shields.io/badge/EMNLP-2024-4b44ce)](https://2024.emnlp.org/)
[![arXiv](https://img.shields.io/badge/arXiv-paper-b31b1b.svg)](https://arxiv.org/abs/2410.05077)
[![License: CC BY-NC-SA 4.0](https://img.shields.io/badge/License-CC%20BY--NC--SA%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc-sa/4.0/)
[![Hugging Face Collection](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Collection-FCD21D)](https://huggingface.co/collections/sapienzanlp/zebra-66e3ec50c8ce415ea7572d0e)
[![PyTorch](https://img.shields.io/badge/PyTorch-orange?logo=pytorch)](https://pytorch.org/)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/release/python-310/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000)](https://github.com/psf/black)
</div>

<div align="center"> A retrieval augmentation framework for zero-shot commonsense question answering with LLMs. </div>

## 🛠️ Installation

Installation from PyPI

```bash
pip install zebra-qa
```

Installation from source

```bash
git clone https://github.com/sapienzanlp/zebra.git
cd zebra
conda create -n zebra python==3.10
conda activate zebra
pip install -e .
```

## 🚀 Quick Start

ZEBRA is a plug-and-play retrieval augmentation framework for **Commonsense Question Answering**. \
It is composed of three pipeline stages: *example retrieval*, *knowledge generation* and *informed reasoning*.

- Example retrieval: given a question, we retrieve relevant examples of question-knowledge pairs from a large collection.
- Knowledge generation: we prompt an LLM to generate useful explanations for the given input question by leveraging the relationships in the retrieved question-knowledge pairs.
- Informed reasoning: we prompt the same LLM for the question answering task by taking advantage of the previously generated explanations.
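
The three stages above can be sketched in a few lines of plain Python. This is only an illustration under loud assumptions: ZEBRA uses a trained dense retriever and a real LLM, whereas this sketch swaps in a word-overlap (Jaccard) similarity and a stub `llm` callable. All function names, prompt strings and the toy knowledge base are hypothetical and are not part of the `zebra` package.

```python
from typing import Callable


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity; a toy replacement for the dense retriever."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def retrieve_examples(question: str, knowledge_base: list[dict], k: int) -> list[dict]:
    """Stage 1: return the k question-knowledge pairs most similar to the question."""
    ranked = sorted(
        knowledge_base,
        key=lambda ex: jaccard(question, ex["question"]),
        reverse=True,
    )
    return ranked[:k]


def zebra_like_pipeline(
    question: str,
    choices: list[str],
    knowledge_base: list[dict],
    llm: Callable[[str], str],
    k: int = 2,
) -> str:
    """Run retrieval, knowledge generation and informed reasoning in sequence."""
    examples = retrieve_examples(question, knowledge_base, k)
    demos = "\n".join(
        f"Q: {ex['question']}\nKnowledge: {'; '.join(ex['knowledge'])}"
        for ex in examples
    )
    # Stage 2: ask the LLM for explanations, guided by the retrieved examples.
    knowledge = llm(f"{demos}\nQ: {question}\nChoices: {choices}\nKnowledge:")
    # Stage 3: ask the same LLM to answer using the generated explanations.
    return llm(f"Q: {question}\nChoices: {choices}\nKnowledge: {knowledge}\nAnswer:")


# Made-up knowledge base and stub LLM, for illustration only.
toy_kb = [
    {"question": "what do you do when hungry", "knowledge": ["eating relieves hunger"]},
    {"question": "where do fish live", "knowledge": ["fish live in water"]},
]


def stub_llm(prompt: str) -> str:
    return "whales live in the ocean" if prompt.endswith("Knowledge:") else "ocean"


print(zebra_like_pipeline("where do whales live", ["ocean", "desert"], toy_kb, stub_llm, k=1))
```

The point of the sketch is the data flow: the same `llm` callable is used twice, first to produce knowledge and then to answer with that knowledge in the prompt.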

Here is an example of how to use ZEBRA for question answering:

```python
from zebra import Zebra

# Load Zebra with language model, retriever, document index and explanations.
zebra = Zebra(
  model="meta-llama/Meta-Llama-3-8B-Instruct",
  retriever="sapienzanlp/zebra-retriever-e5-base-v2",
  document_index="sapienzanlp/zebra-kb"
)

# Provide a question and answer choices.
questions = [
    "What should you do if you see someone hurt and in need of help?",
    "If your friend is upset, what is the best way to support them?",
    "What should you do if your phone battery is running low in a public place?",
    "What should you do if you are running late for an important meeting?",
]

choices = [
    ["Walk away.", "Call for help.", "Take a photo for social media."],
    ["Listen to them and offer comfort.", "Tell them they are overreacting.", "Ignore them and walk away."],
    ["Borrow a stranger's phone.", "Use public charging station.", "Leave your phone unattended while it charges."],
    ["Rush through traffic.", "Call and inform them you will be late.", "Do not show up at all."],
]

# Generate knowledge and perform question answering.
zebra_output = zebra.pipeline(questions=questions, choices=choices)
```

The output contains, for each question, a list of generated explanations and the predicted answer:

```python
  ZebraOutput(
    explanations=[
      [
        "Walking away would be neglecting the person's need for help and potentially putting them in danger.",
        'Calling for help, such as 911, is the most effective way to get the person the assistance they need.',
        "Taking a photo for social media might spread awareness, but it's not a direct way to help the person in need."
      ],
      [
        'Listening and offering comfort shows empathy and understanding.', 
        "Telling someone they're overreacting can be dismissive and unhelpful.", 
        'Ignoring someone in distress can be hurtful and unkind.'
      ],
      [
        "Borrow a stranger's phone: Unwise, as it's a security risk and may lead to theft or damage.", 
        "Use public charging station: Safe and convenient, as it's a designated charging area.", 
        'Leave your phone unattended while it charges: Not recommended, as it may be stolen or damaged.'
      ],
      [
        'Rush through traffic: This option is risky and may lead to accidents or stress.', 
        'Call and inform them you will be late: This is the most likely option, as it shows respect for the meeting and allows for adjustments.', 
        'Do not show up at all: This is unacceptable, as it shows disrespect for the meeting and may damage relationships.'
      ],
    ],
    answers=[
      "Call for help.",
      "Listen to them and offer comfort.",
      "Use public charging station.",
      "Call and inform them you will be late."
    ],
  )
```

You can also call `zebra.pipeline` with `return_dict=True` to have ZEBRA also return the retrieved examples along with their explanations.

## Retriever Model

We trained our retriever on the CSQA dataset [(Talmor et al., 2019)](https://aclanthology.org/N19-1421/).
The retriever model can be found on 🤗 Hugging Face.

- 🦓 **Zebra Retriever**: [`sapienzanlp/zebra-retriever-e5-base-v2`](https://huggingface.co/sapienzanlp/zebra-retriever-e5-base-v2)

## Data

ZEBRA comes with a knowledge base called ZEBRA-KB containing examples of questions along with their automatically-generated list of explanations. To create the explanations, we prompt Google Gemini-1.5-Flash to generate useful knowledge given a question together with its choices and correct answer. The examples are taken from the training sets of the following question answering benchmarks:

| Dataset | Description | Link |
|---------|-------------|------|
| CSQA    | CommonsenseQA is a dataset for commonsense question answering. | [CSQA](https://www.tau-nlp.org/commonsenseqa) |
| ARC     | AI2 Reasoning Challenge is a dataset for science question answering. | [ARC](https://allenai.org/data/arc) |
| OBQA    | OpenBookQA is a dataset for open book question answering. | [OBQA](https://allenai.org/data/open-book-qa) |
| PIQA    | Physical Interaction QA is a dataset for physical commonsense reasoning. | [PIQA](https://yonatanbisk.com/piqa/) |
| QASC    | Question Answering via Sentence Composition is a dataset for multi-hop question answering. | [QASC](https://allenai.org/data/qasc) |
| CSQA2   | CommonsenseQA 2.0 is a dataset for commonsense question answering. | [CSQA2](https://github.com/allenai/csqa2) |
| WG      | WinoGrande is a large-scale dataset for commonsense reasoning inspired by the Winograd Schema Challenge. | [WG](https://github.com/allenai/winogrande) |


This KB is where the retriever fetches relevant examples for the input question. The KB is organized in two components: the explanations and the document indexes.

The explanations are organized in splits, one for each training set (e.g. `csqa-train-gemini`). Each sample contains an ID (matching the original sample ID in the corresponding training set) and a list of explanations. There is also a dedicated split that contains all the samples of every split. You can access the explanations at the following link:

- **ZEBRA-KB Explanations** [`sapienzanlp/zebra-kb-explanations`](https://huggingface.co/datasets/sapienzanlp/zebra-kb-explanations)

Alternatively, you can also download the explanations on your local machine from the following [Google Drive link](https://drive.google.com/file/d/1eKBB1DaQQx-s5ibiZrrgfZpfDwDMxxWB/view?usp=sharing). For convenience, we provide a dedicated folder to store the downloaded explanations: `data/explanations`.

The document indexes contain the examples along with their embeddings. These indexes are needed to fetch relevant examples for a given input question through the retriever. Once the examples are retrieved, their IDs are matched against those in the corresponding explanations split to create the input for the knowledge generation step, that is, a list of $k$ examples with their associated explanations.
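
The ID matching described above amounts to a dictionary lookup. The following is a minimal sketch, assuming each explanations row is a dict with `id` and `explanations` keys. These field names, the function name and the toy rows are assumptions for illustration and may differ from the actual dataset schema.

```python
def attach_explanations(
    retrieved_ids: list[str],
    explanations_split: list[dict],
    k: int,
) -> list[dict]:
    """Keep the first k retrieved examples that have explanations, in ranked order."""
    by_id = {row["id"]: row["explanations"] for row in explanations_split}
    examples = []
    for example_id in retrieved_ids:
        if example_id in by_id:
            examples.append({"id": example_id, "explanations": by_id[example_id]})
        if len(examples) == k:
            break
    return examples


# Made-up rows standing in for one explanations split.
toy_split = [
    {"id": "a1", "explanations": ["Plates are stored in cupboards."]},
    {"id": "b2", "explanations": ["Fish live in water."]},
    {"id": "c3", "explanations": ["Ice melts when heated."]},
]

# IDs as they might come back from the retriever, best match first.
print(attach_explanations(["c3", "zz", "a1", "b2"], toy_split, k=2))
```

Iterating over the retrieved IDs, rather than over the split, preserves the retriever's ranking in the resulting list.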

Similar to the explanations, the document indexes are organized in splits, one for each training set. You can browse the available splits at the following [HuggingFace Collection link](https://huggingface.co/collections/sapienzanlp/zebra-66e3ec50c8ce415ea7572d0e).

We also provide a document index containing the splits of every training set:

- **ZEBRA-KB Document Index** [`sapienzanlp/zebra-kb`](https://huggingface.co/sapienzanlp/zebra-kb)

## Reproducibility

If you wish to reproduce our results, we provide the output of our [`retriever`](https://huggingface.co/sapienzanlp/zebra-retriever-e5-base-v2) for all the datasets at the following [Google Drive link](https://drive.google.com/file/d/1HFk_1pnIBN-3bDGm5Bx7d34mPpDjVHRz/view?usp=drive_link).

After you have downloaded the zip file, please unzip it and move its contents to the `data/retriever/outputs` folder. You should then see folders such as `data/retriever/outputs/{dataset}`, each containing some .jsonl files. Each .jsonl file contains the top $k=100$ examples fetched by the retriever for each input question of the dataset. The naming convention of the .jsonl files is `{dataset}_{split}.{dataset}_train.jsonl`, where `{dataset}_{split}` specifies the dataset from which the input questions are drawn (e.g. `csqa_dev`), while `{dataset}_train` specifies the document index in ZEBRA-KB from which the examples are drawn (e.g. `csqa_train`).
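
As a small aid for scripting around these files, the naming convention can be built and parsed as follows. The helper names are hypothetical and are not part of the repository.

```python
from pathlib import Path


def retriever_output_name(dataset: str, split: str, index_dataset: str) -> str:
    """Build a file name following `{dataset}_{split}.{dataset}_train.jsonl`."""
    return f"{dataset}_{split}.{index_dataset}_train.jsonl"


def parse_retriever_output_name(file_name: str) -> tuple[str, str]:
    """Return (question split, document index split) for a retriever output file."""
    stem = Path(file_name).name.removesuffix(".jsonl")
    query_split, index_split = stem.split(".", maxsplit=1)
    return query_split, index_split


name = retriever_output_name("csqa", "dev", "csqa")
print(name)  # csqa_dev.csqa_train.jsonl
print(parse_retriever_output_name(name))  # ('csqa_dev', 'csqa_train')
```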

We provide a script to run the entire ZEBRA pipeline offline over a dataset using a specific LLM. You can find the available dataset files for evaluation under the `data/datasets` folder. Once you have placed the retriever's outputs in the dedicated folder, the script expects a single input parameter: the Hugging Face model ID of the model to evaluate (e.g. `meta-llama/Meta-Llama-3-8B-Instruct`).

Example on CSQA:

```bash
bash scripts/evaluation/csqa-dev.sh meta-llama/Meta-Llama-3-8B-Instruct
```

We also provide a script to run ZEBRA over all the datasets.

```bash
bash scripts/evaluation/zebra.sh meta-llama/Meta-Llama-3-8B-Instruct
```

These scripts will call **scripts/evaluation/run_zebra.py** with a predefined set of parameters.
If you wish to run additional experiments by modifying these parameters, you can either modify them directly inside the bash scripts or call the python script with command line arguments.

```bash
python scripts/evaluation/run_zebra.py --help

  Usage: run_zebra.py [ARGUMENTS] [OPTIONS] 

╭─ Arguments ────────────────────────────────────────────────────────────╮
│ *    model_name              TEXT  [default: None] [required]          │
│ *    retriever_output_path   TEXT  [default: None] [required]          │
│ *    data_path               TEXT  [default: None] [required]          │
│ *    explanations_path       TEXT  [default: None] [required]          │
│ *    dataset_tag             TEXT  [default: None] [required]          │
│ *    output_dir              TEXT  [default: None] [required]          │
╰────────────────────────────────────────────────────────────────────────╯
╭─ Options ──────────────────────────────────────────────────────────────╮
│ --fewshot_data_path                        TEXT     [default: None]    │
│ --explanations_split                       TEXT     [default: None]    │
│ --plain                                    BOOL     [default: False]   │
│ --oracle                                   BOOL     [default: False]   │
│ --examples                                 BOOL     [default: False]   │
│ --max_generated_knowledge                  INTEGER  [default: None]    │
│ --add_negative_explanations                BOOL     [default: False]   │
│ --num_kg_examples                          INTEGER  [default: 1]       │
│ --num_qa_examples                          INTEGER  [default: 0]       │
│ --limit_samples                            INTEGER  [default: None]    │
│ --device                                   TEXT     [default: 'cuda']  │
│ --help                                     Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────╯
```

For example:

```bash
python scripts/evaluation/run_zebra.py \
  --model_name meta-llama/Meta-Llama-3-8B-Instruct \
  --data_path data/datasets/csqa/csqa-dev.jsonl \
  --retriever_output_path data/retriever/outputs/csqa/csqa_dev.csqa_train.jsonl \
  --fewshot_data_path data/datasets/csqa/csqa-train.jsonl \
  --explanations_path sapienzanlp/zebra-kb-explanations \
  --explanations_split csqa-train-gemini \
  --dataset_tag csqa \
  --output_dir results/zebra/csqa \
  --plain \
  --oracle \
  --examples \
  --num_qa_examples 0 \
  --num_kg_examples 5
```

## 🦓 Zebra Pipeline

In the following sections, we provide a step-by-step guide on how to prepare the data to test ZEBRA on your dataset, train your own ZEBRA retriever and evaluate the models.

### Data Evaluation Format

To be able to run ZEBRA on your dataset, the data should have the following structure:

```python
{
  "id": str,  # Unique identifier for the question
  "question": dict  # Dictionary of the question
    {
      "stem": str, # The question text
      "choices": list[dict] # The object containing the choices for the question
        [
          {
            "label": str, # A Label for every choice, like "A", "B", "C" etc.
            "text": str # The choice text
          },
          ...
        ]
    }
  "answerKey": str # The correct label among the choices
}
```

All the datasets in the `data/datasets` folder already match this format. For convenience, we provide a script to parse a dataset in the desired format: `scripts/data/parse_dataset.py`.
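
Before running the pipeline on your own data, it can help to check each record against the structure above. The following is a minimal sketch: the `validate_sample` helper and the example question are made up for illustration, and storing one JSON object per line is an assumption based on the `.jsonl` extension of the provided files.

```python
import json


def validate_sample(sample: dict) -> None:
    """Raise ValueError if `sample` does not match the structure above."""
    for key in ("id", "question", "answerKey"):
        if key not in sample:
            raise ValueError(f"missing key: {key}")
    question = sample["question"]
    if not isinstance(question, dict) or not isinstance(question.get("stem"), str):
        raise ValueError("question must be a dict with a string 'stem'")
    labels = []
    for choice in question.get("choices", []):
        if not isinstance(choice.get("label"), str) or not isinstance(choice.get("text"), str):
            raise ValueError("each choice needs a string 'label' and 'text'")
        labels.append(choice["label"])
    if sample["answerKey"] not in labels:
        raise ValueError("answerKey must be one of the choice labels")


# A made-up record serialized as a single JSON line.
line = json.dumps(
    {
        "id": "example-0001",
        "question": {
            "stem": "Where would you put a plate after washing it?",
            "choices": [
                {"label": "A", "text": "cupboard"},
                {"label": "B", "text": "garage"},
            ],
        },
        "answerKey": "A",
    }
)

validate_sample(json.loads(line))
print("record is well-formed")
```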

### Retriever Training

Our retriever model can be found at the link in the [Retriever Model](#retriever-model) section.
We trained our retriever on the CSQA dataset [(Talmor et al., 2019)](https://aclanthology.org/N19-1421/). In particular, we format both the training and validation datasets as follows: 

```python
{
  "question": str  # The input passage
  "positive_ctxs": list[dict] # List of positive passages
    [
      {
        "text": str # The text of the positive passage
      },
      ...
    ]
}
```

Where each *question* and *positive* passage is formatted as:

```string
Q [SEP] C1 [SEP] C2 ... [SEP] Cn 
```
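
For illustration, a record in this format could be assembled as shown below. The helper name and the two toy questions are made up, and which passages count as positives for a given question depends on your own data preparation, which this sketch does not cover.

```python
import json


def format_passage(question: str, choices: list[str], sep: str = "[SEP]") -> str:
    """Join a question and its choices as `Q [SEP] C1 [SEP] C2 ... [SEP] Cn`."""
    return f" {sep} ".join([question, *choices])


# Made-up query and positive passage, used only to show the record layout.
query = format_passage(
    "Where would you put a plate after washing it?",
    ["cupboard", "garage", "bathtub"],
)
positive = format_passage(
    "Where do you keep clean dishes?",
    ["shelf", "oven", "floor"],
)

record = {"question": query, "positive_ctxs": [{"text": positive}]}
print(json.dumps(record))
```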

If you wish to train your own retriever with this dataset, you can run:

```bash
bash scripts/retriever/train.sh
```

The script will call **scripts/retriever/train.py** with a predefined set of parameters.
The data needed to train the retriever can be found in the `data/retriever/datasets` folder.
If you wish to run additional experiments by modifying these parameters, you can either modify them directly inside the bash scripts or call the python script with command line arguments.

```bash
python scripts/retriever/train.py --help

  Usage: train.py [ARGUMENTS] [OPTIONS] 

╭─ Arguments ───────────────────────────────────────────────────────────────────────────────╮
│ *    train_data_path         TEXT  [default: None] [required]                             │
│ *    dev_data_path           TEXT  [default: None] [required]                             │
╰───────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ─────────────────────────────────────────────────────────────────────────────────╮
│ --question_encoder                         TEXT     [default: intfloat/e5-base-v2]        │
│ --passage_encoder                          TEXT     [default: None]                       │
│ --document_index                           INTEGER  [default: None]                       │
│ --device                                   TEXT     [default: cuda]                       │
│ --precision                                TEXT     [default: 16]                         │
│ --question_batch_size                      INTEGER  [default: 64]                         │
│ --passage_batch_size                       INTEGER  [default: 200]                        │
│ --max_question_length                      INTEGER  [default: 256]                        │
│ --max_passage_length                       INTEGER  [default: 256]                        │
│ --max_steps                                INTEGER  [default: 25000]                      │
│ --num_workers                              INTEGER  [default: 4]                          │
│ --max_hard_negatives_to_mine               INTEGER  [default: 0]                          │
│ --wandb_online_mode                        BOOL     [default: False]                      │
│ --wandb_log_model                          BOOL     [default: False]                      │
│ --wandb_project_name                       TEXT     [default: zebra-retriever]            │
│ --wandb_experiment_name                    TEXT     [default: zebra-retriever-e5-base-v2] │
│ --help                                     Show this message and exit.                    │
╰───────────────────────────────────────────────────────────────────────────────────────────╯
```

For example:

```bash
python scripts/retriever/train.py \
  --train_data_path data/retriever/datasets/train.jsonl \
  --dev_data_path data/retriever/datasets/dev.jsonl
```


### 🦓 ZEBRA KB

As previously explained under the [Data](#data) section, the ZEBRA pipeline requires a document index containing examples such that the retriever can fetch the most relevant ones for an input question. The document index can contain:
- examples from the training set of the dataset under evaluation.
- examples from the training set of another dataset.
- examples from multiple training sets of a list of datasets.

You can either access the precomputed document indexes using the link provided in the [Data](#data) section, or you can generate your own document index by running:

```bash
python scripts/retriever/create_index.py --help

  Usage: create_index.py [ARGUMENTS] [OPTIONS] 

╭─ Arguments ─────────────────────────────────────────────────────────────────────────────╮
│ *    data_paths              TEXT  [default: None] [required]                           │
│ *    output_dir              TEXT  [default: None] [required]                           │
╰─────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ───────────────────────────────────────────────────────────────────────────────╮
│ --retriever_path             TEXT     [default: sapienzanlp/zebra-retriever-e5-base-v2] │
│ --batch_size                 INTEGER  [default: 512]                                    │
│ --num_workers                INTEGER  [default: 4]                                      │
│ --max_length                 INTEGER  [default: 512]                                    │
│ --device                     TEXT     [default: cuda]                                   │
│ --precision                  TEXT     [default: fp32]                                   │
│ --help                       Show this message and exit.                                │
╰─────────────────────────────────────────────────────────────────────────────────────────╯
```

For example: 

```bash
python scripts/retriever/create_index.py \
  --retriever_path sapienzanlp/zebra-retriever-e5-base-v2 \
  --data_paths data/datasets/csqa/csqa-train.jsonl \
  --output_dir data/retriever/zebra_kb/csqa_train
```

For convenience, we provide a folder to store the document indexes: `data/retriever/zebra_kb`.

### Retriever Inference

Once you have a retriever and a document index of the ZEBRA KB, you can retrieve the most relevant examples from the KB for a given input question by running:

```bash
python scripts/retriever/retriever_inference.py --help

  Usage: retriever_inference.py [ARGUMENTS] [OPTIONS] 

╭─ Arguments ─────────────────────────────────────────────────────────────────────────────╮
│ *    data_path               TEXT  [default: None] [required]                           │
│ *    output_path             TEXT  [default: None] [required]                           │
│ *    document_index_path     TEXT  [default: None] [required]                           │
╰─────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ───────────────────────────────────────────────────────────────────────────────╮
│ --retriever_path             TEXT     [default: sapienzanlp/zebra-retriever-e5-base-v2] │
│ --batch_size                 INTEGER  [default: 64]                                     │
│ --k                          INTEGER  [default: 100]                                    │
│ --device                     TEXT     [default: cuda]                                   │
│ --help                       Show this message and exit.                                │
╰─────────────────────────────────────────────────────────────────────────────────────────╯
```

For example:

```bash
python scripts/retriever/retriever_inference.py \
  --retriever_path sapienzanlp/zebra-retriever-e5-base-v2 \
  --data_path data/datasets/csqa/csqa-dev.jsonl \
  --document_index_path sapienzanlp/zebra-kb-csqa-train \
  --output_path data/retriever/outputs/csqa/csqa_dev.csqa_train.jsonl
```

For convenience, we provide a folder to store the retriever outputs: `data/retriever/outputs`.

Once you have obtained the retriever's output for a given dataset, you can run the code explained under the [Reproducibility](#reproducibility) section to obtain the output and the scores of the ZEBRA pipeline on that dataset.

## 📊 Performance

We evaluate the performance of ZEBRA on 8 well-established commonsense question answering datasets. The following table shows the results (accuracy) of the models before / after the application of ZEBRA.

| Model                    | CSQA            | ARC-C           | ARC-E           | OBQA            | PIQA            | QASC            | CSQA2           | WG              | AVG             |
| ------------------------ | --------------- | --------------- | --------------- | --------------- | --------------- | --------------- | --------------- | --------------- | --------------- |
| Mistral-7B-Instruct-v0.2 | 68.2 / **73.3** | 72.4 / **75.2** | 85.8 / **87.4** | 68.8 / **75.8** | 76.1 / **80.2** | 66.1 / **68.3** | 58.5 / **67.5** | 55.8 / **60.7** | 68.9 / **73.5** |
| Phi3-small-8k-Instruct   | 77.2 / **80.9** | 90.4 / **91.6** | 96.9 / **97.7** | 90.4 / **91.2** | 86.6 / **88.1** | **83.5** / 81.0 | 68.0 / **74.6** | 79.1 / **81.0** | 84.0 / **85.8** |
| Meta-Llama-3-8b-Instruct | 73.9 / **78.7** | 79.4 / **83.5** | 91.7 / **92.9** | 73.4 / **79.6** | 78.3 / **84.0** | 78.2 / **79.1** | 64.3 / **69.4** | 56.2 / **63.2** | 74.4 / **78.8** |
| Phi3-mini-128k-Instruct  | 73.4 / **74.8** | 85.7 / **88.0** | 95.4 / **96.0** | 82.8 / **87.8** | 80.4 / **84.2** | **74.7** / 73.9 | 59.3 / **64.6** | 67.3 / **72.9** | 77.4 / **80.5** |

You can also download the official paper results at the following [Google Drive Link](https://drive.google.com/file/d/1l7bY-TkqnmVQn5M5ynQfT-0upMcRlMnT/view?usp=drive_link).

## [Baselines](baselines/README.md)

## Cite this work

If you use any part of this work, please consider citing the paper as follows:

```bibtex
@inproceedings{molfese-etal-2024-zebra,
    title = "{ZEBRA}: Zero-Shot Example-Based Retrieval Augmentation for Commonsense Question Answering",
    author = "Molfese, Francesco Maria  and
      Conia, Simone  and
      Orlando, Riccardo  and
      Navigli, Roberto",
    editor = "Al-Onaizan, Yaser  and
      Bansal, Mohit  and
      Chen, Yun-Nung",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-main.1251",
    doi = "10.18653/v1/2024.emnlp-main.1251",
    pages = "22429--22444"
}
```

## 🪪 License

The data and software are licensed under [Creative Commons Attribution-NonCommercial-ShareAlike 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/).

## Acknowledgements
We gratefully acknowledge CREATIVE (CRoss-modal understanding and gEnerATIon of Visual and tExtual content) for supporting this work. Simone Conia gratefully acknowledges the support of Future AI Research ([PNRR MUR project PE0000013-FAIR](https://fondazione-fair.it/en/)), which has fully funded his fellowship at Sapienza University of Rome since October 2023.

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/sapienzanlp/zebra",
    "name": "zebra-qa",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "NLP Sapienza sapienzanlp deep learning transformer pytorch retriever question answering commonsense large language models",
    "author": "Francesco Molfese",
    "author_email": "molfesefrancesco@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/0f/d8/22a5171b6be01cfce126b220966bcd78e6a4cccde54614bb0ff584112a15/zebra_qa-1.0.1.tar.gz",
    "platform": null,
    "description": "\n<div align=\"center\">\n  <img src=\"https://github.com/SapienzaNLP/zebra/blob/master/assets/zebra.png?raw=true\" width=\"100\" height=\"100\">\n</div>\n\n<div align=\"center\">\n\n# ZEBRA: Zero-Shot Example-Based Retrieval Augmentation for Commonsense Question Answering\n\n[![Conference](https://img.shields.io/badge/EMNLP-2024-4b44ce)](https://2024.emnlp.org/)\n[![arXiv](https://img.shields.io/badge/arXiv-paper-b31b1b.svg)](https://arxiv.org/abs/2410.05077)\n[![License: CC BY-NC-SA 4.0](https://img.shields.io/badge/License-CC%20BY--NC--SA%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc-sa/4.0/)\n[![Hugging Face Collection](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Collection-FCD21D)](https://huggingface.co/collections/sapienzanlp/zebra-66e3ec50c8ce415ea7572d0e)\n[![PyTorch](https://img.shields.io/badge/PyTorch-orange?logo=pytorch)](https://pytorch.org/)\n[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/release/python-310/)\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000)](https://github.com/psf/black)\n</div>\n\n<div align=\"center\"> A retrieval augmentation framework for zero-shot commonsense question answering with LLMs. </div>\n\n## \ud83d\udee0\ufe0f Installation\n\nInstallation from PyPi\n\n```bash\npip install zebra-qa\n```\n\nInstallation from source\n\n```bash\ngit clone https://github.com/sapienzanlp/zebra.git\ncd zebra\nconda create -n zebra python==3.10\nconda activate zebra\npip install -e .\n```\n\n## \ud83d\ude80 Quick Start\n\nZEBRA is a plug-and-play retrieval augmentation framework for **Commonsense Question Answering**. 
\\\nIt is composed of three pipeline stages: *example retrieval*, *knowledge generation* and *informed reasoning*.\n\n- Example retrieval: given a question, we retrieve relevant examples of question-knowledge pairs from a large collection\n- Knowledge generation: we prompt an LLM to generate useful explanations for the given input question by leveraging the relationships in the retrieved question-knowledge pairs.\n- Informed reasoning: we prompt the same LLM for the question answering task by taking advantage of the previously generated explanations.\n\nHere is an example of how to use ZEBRA for question answering:\n\n```python\nfrom zebra import Zebra\n\n# Load Zebra with language model, retriever, document index and explanations.\nzebra = Zebra(\n  model=\"meta-llama/Meta-Llama-3-8B-Instruct\",\n  retriever=\"sapienzanlp/zebra-retriever-e5-base-v2\",\n  document_index=\"sapienzanlp/zebra-kb\"\n)\n\n# Provide a question and answer choices.\nquestions = [\n    \"What should you do if you see someone hurt and in need of help?\",\n    \"If your friend is upset, what is the best way to support them?\",\n    \"What should you do if your phone battery is running low in a public place?\",\n    \"What should you do if you are running late for an important meeting?\",\n]\n\nchoices = [\n    [\"Walk away.\", \"Call for help.\", \"Take a photo for social media.\"],\n    [\"Listen to them and offer comfort.\", \"Tell them they are overreacting.\", \"Ignore them and walk away.\"],\n    [\"Borrow a stranger's phone.\", \"Use public charging station.\", \"Leave your phone unattended while it charges.\"],\n    [\"Rush through traffic.\", \"Call and inform them you will be late.\", \"Do not show up at all.\"],\n]\n\n# Generate knowledge and perform question answering.\nzebra_output = zebra.pipeline(questions=questions, choices=choices)\n```\n\nThe output contains, for each question, a list of generated explanations and the predicted answer:\n\n```bash\n  ZebraOutput(\n    
explanations=[\n      [\n        \"Walking away would be neglecting the person's need for help and potentially putting them in danger.\",\n        'Calling for help, such as 911, is the most effective way to get the person the assistance they need.',\n        \"Taking a photo for social media might spread awareness, but it's not a direct way to help the person in need.\"\n      ],\n      [\n        'Listening and offering comfort shows empathy and understanding.', \n        \"Telling someone they're overreacting can be dismissive and unhelpful.\", \n        'Ignoring someone in distress can be hurtful and unkind.'\n      ],\n      [\n        \"Borrow a stranger's phone: Unwise, as it's a security risk and may lead to theft or damage.\", \n        \"Use public charging station: Safe and convenient, as it's a designated charging area.\", \n        'Leave your phone unattended while it charges: Not recommended, as it may be stolen or damaged.'\n      ],\n      [\n        'Rush through traffic: This option is risky and may lead to accidents or stress.', \n        'Call and inform them you will be late: This is the most likely option, as it shows respect for the meeting and allows for adjustments.', \n        'Do not show up at all: This is unacceptable, as it shows disrespect for the meeting and may damage relationships.'\n      ],\n    ],\n    answers=[\n      \"Call for help.\",\n      \"Listen to them and offer comfort.\",\n      \"Use public charging station.\",\n      \"Call and inform them you will be late.\"\n    ],\n  )\n```\n\nYou can also call the `zebra.pipeline` method with the `return_dict` parameter set to `True` to ask ZEBRA to return also the retrieved examples along with their explanations.\n\n## Retriever Model\n\nWe trained our retriever on the CSQA dataset [(Talmor et. 
al 2019)](https://aclanthology.org/N19-1421/).\nThe retriever model can be found on \ud83e\udd17 Hugging Face.\n\n- \ud83e\udd93 **Zebra Retriever**: [`sapienzanlp/zebra-retriever-e5-base-v2`](https://huggingface.co/sapienzanlp/zebra-retriever-e5-base-v2)\n\n## Data\n\nZEBRA comes with a knowledge base called ZEBRA-KB containing examples of questions along with their automatically-generated list of explanations. To create the explanations, we prompt Google Gemini-1.5-Flash to generate useful knowledge given a question together with its choices and correct answer. The examples are taken from the training sets of the following question answering benchmarks:\n\n| Dataset | Description | Link |\n|---------|-------------|------|\n| CSQA    | CommonsenseQA is a dataset for commonsense question answering. | [CSQA](https://www.tau-nlp.org/commonsenseqa) |\n| ARC     | AI2 Reasoning Challenge is a dataset for science question answering. | [ARC](https://allenai.org/data/arc) |\n| OBQA    | OpenBookQA is a dataset for open book question answering. | [OBQA](https://allenai.org/data/open-book-qa) |\n| PIQA    | Physical Interaction QA is a dataset for physical commonsense reasoning. | [PIQA](https://yonatanbisk.com/piqa/) |\n| QASC    | Question Answering via Sentence Composition is a dataset for multi-hop question answering. | [QASC](https://allenai.org/data/qasc) |\n| CSQA2   | CommonsenseQA 2.0 is a dataset for commonsense question answering. | [CSQA2](https://github.com/allenai/csqa2) |\n| WG      | Winograd Schema Challenge is a dataset for commonsense reasoning. | [WG](https://github.com/allenai/winogrande) |\n\n\nThis KB is where the retriever fetches relevant examples for the input question. The KB is organized in two components: the explanations and the document indexes.\n\nThe explanations are organized in splits, one for each training set (e.g. `csqa-train-gemini`). 
Each sample contains an ID (compliant with the original sample ID in the relative training set) and a list of explanations. There is also a dedicated split which contains all the samples of every split. You can access the explanations at the following link:\n\n- **ZEBRA-KB Explanations** [`sapienzanlp/zebra-kb-explanations`](https://huggingface.co/datasets/sapienzanlp/zebra-kb-explanations)\n\nAlternatively, you can also download the explanations on your local machine from the following [Google Drive link](https://drive.google.com/file/d/1eKBB1DaQQx-s5ibiZrrgfZpfDwDMxxWB/view?usp=sharing). For convenience, we provide a dedicated folder to store the downloaded explanations: `data/explanations`.\n\nThe document indexes contain the examples along with their embeddings. These indexes are needed to fetch relevant examples for a given input question through the retriever. Once the examples are retrieved, their IDs will be matched against the ones contained in the relative explanations split to create the desired input for the knowledge generation step, that is, a list of $k$ examples with their associated explanations.\n\nSimilar to the explanations, the document indexes are organized in splits, one for each training set. 
You can browse the available splits at the following [HuggingFace Collection link](https://huggingface.co/collections/sapienzanlp/zebra-66e3ec50c8ce415ea7572d0e).\n\nWe also provide a document index containing the splits of every training set:\n\n- **ZEBRA-KB Document Index** [`sapienzanlp/zebra-kb`](https://huggingface.co/sapienzanlp/zebra-kb)\n\n## Reproducibility\n\nTo help you reproduce our results, we provide the output of our [`retriever`](https://huggingface.co/sapienzanlp/zebra-retriever-e5-base-v2) for all the datasets at the following [Google Drive link](https://drive.google.com/file/d/1HFk_1pnIBN-3bDGm5Bx7d34mPpDjVHRz/view?usp=drive_link).\n\nAfter you have downloaded the zip file, unzip it and move its contents to the `data/retriever/outputs` folder. You should then see folders such as `data/retriever/outputs/{dataset}` with some .jsonl files inside. Each .jsonl file contains the top $k=100$ examples fetched by the retriever for each input question of the dataset. The naming convention of the .jsonl file is: `{dataset}_{split}.{dataset}_train.jsonl`, where `{dataset}_{split}` specifies the dataset from which the input questions are drawn (e.g. `csqa_dev`), while `{dataset}_train` specifies the document index in ZEBRA-KB from which the examples are drawn (e.g. `csqa_train`).\n\nWe provide a script to run the entire ZEBRA pipeline offline over a dataset using a specific LLM. You can find the available dataset files for evaluation under the `data/datasets` folder. Once you have placed the retriever's outputs in the dedicated folder, the script expects only one input parameter: the model to be evaluated, specified by its HuggingFace model ID (e.g. 
`meta-llama/Meta-Llama-3-8B-Instruct`).\n\nExample on CSQA:\n\n```bash\nbash scripts/evaluation/csqa-dev.sh meta-llama/Meta-Llama-3-8B-Instruct\n```\n\nWe also provide a script to run ZEBRA over all the datasets.\n\n```bash\nbash scripts/evaluation/zebra.sh meta-llama/Meta-Llama-3-8B-Instruct\n```\n\nThese scripts will call **scripts/evaluation/run_zebra.py** with a predefined set of parameters.\nIf you wish to run additional experiments by modifying these parameters, you can either modify them directly inside the bash scripts or call the python script with command line arguments.\n\n```bash\npython scripts/evaluation/run_zebra.py --help\n\n  Usage: run_zebra.py [ARGUMENTS] [OPTIONS] \n\n\u256d\u2500 Arguments \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256e\n\u2502 *    model_name              TEXT  [default: None] [required]          \u2502\n\u2502 *    retriever_output_path   TEXT  [default: None] [required]          \u2502\n\u2502 *    data_path               TEXT  [default: None] [required]          \u2502\n\u2502 *    explanations_path       TEXT  [default: None] [required]          \u2502\n\u2502 *    dataset_tag             TEXT  [default: None] [required]          \u2502\n\u2502 *    output_dir              TEXT  [default: None] [required]          
\u2502\n\u2570\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256f\n\u256d\u2500 Options \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256e\n\u2502 --fewshot_data_path                        TEXT     [default: None]    \u2502\n\u2502 --explanations_split                       TEXT     [default: None]    \u2502\n\u2502 --plain                                    BOOL     [default: False]   \u2502\n\u2502 --oracle                                   BOOL     [default: False]   \u2502\n\u2502 --examples                                 BOOL     [default: False]   \u2502\n\u2502 --max_generated_knowledge                  INTEGER  [default: None]    \u2502\n\u2502 --add_negative_explanations                BOOL     [default: False]   \u2502\n\u2502 --num_kg_examples                          INTEGER  [default: 1]       \u2502\n\u2502 --num_qa_examples                          INTEGER  [default: 0]       \u2502\n\u2502 --limit_samples                            INTEGER  [default: None]    \u2502\n\u2502 --device                                   TEXT     [default: 'cuda']  \u2502\n\u2502 --help                                     Show this message and exit. 
\u2502\n\u2570\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256f\n```\n\nFor example:\n\n```bash\npython scripts/evaluation/run_zebra.py \\\n  --model_name meta-llama/Meta-Llama-3-8B-Instruct \\\n  --data_path data/datasets/csqa/csqa-dev.jsonl \\\n  --retriever_output_path data/retriever/outputs/csqa/csqa_dev.csqa_train.jsonl \\\n  --fewshot_data_path data/datasets/csqa/csqa-train.jsonl \\\n  --explanations_path sapienzanlp/zebra-kb-explanations \\\n  --explanations_split csqa-train-gemini\n  --dataset_tag csqa \\ \n  --output_dir results/zebra/csqa \\\n  --plain \\\n  --oracle \\\n  --examples \\\n  --num_qa_examples 0 \\\n  --num_kg_examples 5\n```\n\n## \ud83e\udd93 Zebra Pipeline\n\nIn the following sections, we provide a step-by-step guide on how to prepare the data to test ZEBRA on your dataset, train your own ZEBRA retriever and evaluate the models.\n\n### Data Evaluation Format\n\nTo be able to run ZEBRA on your dataset, the data should have the following structure:\n\n```python\n{\n  \"id\": str,  # Unique identifier for the question\n  \"question\": dict  # Dictionary of the question\n    {\n      \"stem\": str, # The question text\n      \"choices\": list[dict] #\u00a0The object containing the choices for the question\n        [\n          {\n            \"label\": str, # A Label for every choice, like \"A\", \"B\", \"C\" etc.\n            \"text\": str #\u00a0The choice text\n          },\n          ...\n        ]\n    }\n  \"answerKey\": str # The correct label among the choices\n}\n```\n\nAll the datasets in the `data/datasets` folder already match this format. 
For convenience, we provide a script to parse a dataset in the desired format: `scripts/data/parse_dataset.py`.\n\n### Retriever Training\n\nOur retriever model can be found at the link in the [Models](#models) section.\nWe trained our retriever on the CSQA dataset [(Talmor et. al 2019)](https://aclanthology.org/N19-1421/). In particular, we format both the training and validation datasets as follows: \n\n```python\n{\n  \"question\": str  # The input passage\n  \"positive_ctxs\": list[dict] # List of positive passages\n    [\n      {\n        \"text\": str #\u00a0The text of the positive passage\n      },\n      ...\n    ]\n}\n```\n\nWhere each *question* and *positive* passage is formatted as:\n\n```string\nQ [SEP] C1 [SEP] C2 ... [SEP] Cn \n```\n\nIf you wish to train your own retriever with this dataset, you can run:\n\n```bash\nbash scripts/retriever/train.sh\n```\n\nThe script will call **scripts/retriever/train.py** with a predefined set of parameters.\nThe data needed to train the retriever can be found in the `data/retriever/datasets` folder.\nIf you wish to run additional experiments by modifying these parameters, you can either modify them directly inside the bash scripts or call the python script with command line arguments.\n\n```bash\npython scripts/retriever/train.py --help\n\n  Usage: train.py [ARGUMENTS] [OPTIONS] \n\n\u256d\u2500 Arguments \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256e\n\u2502 *    train_data_path         TEXT  [default: None] [required]                             \u2502\n\u2502 *    dev_data_path        
   TEXT  [default: None] [required]                             \u2502\n\u2570\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256f\n\u256d\u2500 Options \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256e\n\u2502 --question_encoder                         TEXT     [default: intfloat/e5-base-v2]        \u2502\n\u2502 --passage_encoder                          TEXT     [default: None]                       \u2502\n\u2502 --document_index                           INTEGER  [default: None]                       \u2502\n\u2502 --device                                   TEXT     [default: cuda]                       \u2502\n\u2502 --precision                                TEXT     [default: 16]                         \u2502\n\u2502 --question_batch_size                      INTEGER  [default: 64]                         \u2502\n\u2502 --passage_batch_size                       INTEGER  [default: 200]                        \u2502\n\u2502 --max_question_length                      INTEGER  [default: 256]                        \u2502\n\u2502 
--max_passage_length                       INTEGER  [default: 256]                        \u2502\n\u2502 --max_steps                                INTEGER  [default: 25000]                      \u2502\n\u2502 --num_workers                              INTEGER  [default: 4]                          \u2502\n\u2502 --max_hard_negatives_to_mine               INTEGER  [default: 0]                          \u2502\n\u2502 --wandb_online_mode                        BOOL     [default: False]                      \u2502\n\u2502 --wandb_log_model                          BOOL     [default: False]                      \u2502\n\u2502 --wandb_project_name                       TEXT     [default: zebra-retriever]            \u2502\n\u2502 --wandb_experiment_name                    TEXT     [default: zebra-retriever-e5-base-v2] \u2502\n\u2502 --help                                     Show this message and exit.                    \u2502\n\u2570\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256f\n```\n\nFor example:\n\n```bash\npython scripts/retriever/train.py \\\n  --train_data_path data/retriever/datasets/train.jsonl \\\n  --dev_data_path data/retriever/datasets/dev.jsonl\n```\n\n\n### \ud83e\udd93 ZEBRA KB\n\nAs previously explained under the [Data](#data) section, the ZEBRA pipeline requires a document index containing examples such that the retriever can fetch the most relevant ones for an input question. 
The document index can contain:\n- examples from the training set of the dataset under evaluation.\n- examples from the training set of another dataset.\n- examples from multiple training sets of a list of datasets.\n\nYou can either access the precomputed document indexes using the link provided in the [Data](#data) section, or you can generate your own document index by running:\n\n```bash\npython scripts/retriever/create_index.py --help\n\n  Usage: create_index.py [ARGUMENTS] [OPTIONS] \n\n\u256d\u2500 Arguments \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256e\n\u2502 *    data_paths              TEXT  [default: None] [required]                           \u2502\n\u2502 *    output_dir              TEXT  [default: None] [required]                           \u2502\n\u2570\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256f\n\u256d\u2500 Options 
\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256e\n\u2502 --retriever_path             TEXT     [default: sapienzanlp/zebra-retriever-e5-base-v2] \u2502\n\u2502 --batch_size                 INTEGER  [default: 512]                                    \u2502\n\u2502 --num_workers                INTEGER  [default: 4]                                      \u2502\n\u2502 --max_length                 INTEGER  [default: 512]                                    \u2502\n\u2502 --device                     TEXT     [default: cuda]                                   \u2502\n\u2502 --precision                  TEXT     [default: fp32]                                   \u2502\n\u2502 --help                       Show this message and exit.                                
\u2502\n\u2570\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256f\n```\n\nFor example: \n\n```bash\npython zebra/retriever/create_index.py \\\n  --retriever_path sapienzanlp/zebra-retriever-e5-base-v2 \\\n  --data_path data/datasets/csqa/csqa-train.jsonl \\\n  --output_dir data/retriever/zebra_kb/csqa_train \\\n```\n\nFor convenience, we provide a folder to store the document indexes: `data/retriever/zebra_kb`.\n\n### Retriever Inference\n\nOnce you have a retriever and a document index of the ZEBRA KB, you can retrieve the most relevant examples from the KB for a given input question by running:\n\n```bash\npython scripts/retriever/retriever_inference.py --help\n\n  Usage: retriever_inference.py [ARGUMENTS] [OPTIONS] \n\n\u256d\u2500 Arguments \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256e\n\u2502 *    data_path               TEXT  [default: None] [required]                           \u2502\n\u2502 *    output_path             TEXT  [default: None] [required]                           \u2502\n\u2502 *    document_index_path     TEXT  [default: None] [required]            
               \u2502\n\u2570\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256f\n\u256d\u2500 Options \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256e\n\u2502 --retriever_path             TEXT     [default: sapienzanlp/zebra-retriever-e5-base-v2] \u2502\n\u2502 --batch_size                 INTEGER  [default: 64]                                     \u2502\n\u2502 --k                          INTEGER  [default: 100]                                    \u2502\n\u2502 --device                     TEXT     [default: cuda]                                   \u2502\n\u2502 --help                       Show this message and exit.                                
\u2502\n\u2570\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u256f\n```\n\nFor example:\n\n```bash\npython zebra/retriever/retriever_inference.py \\\n  --retriever_path sapienzanlp/zebra-retriever-e5-base-v2 \\\n  --data_path data/datasets/csqa/csqa-dev.jsonl \\\n  --document_index_path sapienzanlp/zebra-kb-csqa-train\\\n  --output_path data/retriever/outputs/csqa/csqa_dev.csqa_train.jsonl \\\n```\n\nFor convenience, we provide a folder to store the retriever outputs: `data/retriever/outputs`.\n\nOnce you have obtained the retriever's output for a given dataset, you can run the code explained under the [Reproducibility](#Reproducibility) section to obtain the output and the scores of the ZEBRA pipeline on that dataset.\n\n## \ud83d\udcca Performance\n\nWe evaluate the performance of ZEBRA on 8 well-established commonsense question answering datasets. 
The following table shows the results (accuracy) of the models before / after the application of ZEBRA.\n\n|          Model           |       CSQA      |      ARC-C      |      ARC-E      |       OBQA      |       PIQA      |       QASC      |      CSQA2      |        WG       |       AVG       |  \n| ------------------------ | --------------- | --------------- | --------------- | --------------- | --------------- | --------------- | --------------- | --------------- | --------------- | \n| Mistral-7B-Instruct-v0.2 | 68.2 / **73.3** | 72.4\t/ **75.2** | 85.8\t/ **87.4** | 68.8\t/ **75.8** | 76.1\t/ **80.2** | 66.1\t/ **68.3** | 58.5\t/ **67.5** | 55.8 / **60.7** | 68.9 / **73.5** |\n| Phi3-small-8k-Instruct   | 77.2 / **80.9** | 90.4 / **91.6** | 96.9\t/ **97.7** | 90.4\t/ **91.2** | 86.6\t/ **88.1** | **83.5**\t/ 81.0 | 68.0\t/ **74.6** | 79.1\t/ **81.0** | 84.0 / **85.8** | \n| Meta-Llama-3-8b-Instruct | 73.9 / **78.7** | 79.4 / **83.5** | 91.7\t/ **92.9** | 73.4\t/ **79.6** | 78.3\t/ **84.0** | 78.2\t/ **79.1** | 64.3\t/ **69.4** | 56.2\t/ **63.2** | 74.4 / **78.8** | \n| Phi3-mini-128k-Instruct  | 73.4 / **74.8** | 85.7\t/ **88.0** | 95.4\t/ **96.0** | 82.8\t/ **87.8** | 80.4\t/ **84.2** | **74.7**\t/ 73.9 | 59.3\t/ **64.6** | 67.3\t/ **72.9** | 77.4 / **80.5** | \n\nYou can also download the official paper results at the following [Google Drive Link](https://drive.google.com/file/d/1l7bY-TkqnmVQn5M5ynQfT-0upMcRlMnT/view?usp=drive_link).\n\n## [Baselines](baselines/README.md)\n\n## Cite this work\n\nIf you use any part of this work, please consider citing the paper as follows:\n\n```bibtex\n@inproceedings{molfese-etal-2024-zebra,\n    title = \"{ZEBRA}: Zero-Shot Example-Based Retrieval Augmentation for Commonsense Question Answering\",\n    author = \"Molfese, Francesco Maria  and\n      Conia, Simone  and\n      Orlando, Riccardo  and\n      Navigli, Roberto\",\n    editor = \"Al-Onaizan, Yaser  and\n      Bansal, Mohit  and\n      Chen, Yun-Nung\",\n    
booktitle = \"Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing\",\n    month = nov,\n    year = \"2024\",\n    address = \"Miami, Florida, USA\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https://aclanthology.org/2024.emnlp-main.1251\",\n    doi = \"10.18653/v1/2024.emnlp-main.1251\",\n    pages = \"22429--22444\"\n}\n```\n\n## \ud83e\udeaa License\n\nThe data and software are licensed under [Creative Commons Attribution-NonCommercial-ShareAlike 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/).\n\n## Acknowledgements\nWe gratefully acknowledge CREATIVE (CRoss-modal understanding and gEnerATIon of Visual and tExtual content) for supporting this work. Simone Conia gratefully acknowledges the support of Future AI Research ([PNRR MUR project PE0000013-FAIR](https://fondazione-fair.it/en/)), which has fully funded his fellowship at Sapienza University of Rome since October 2023.\n",
    "bugtrack_url": null,
    "license": "Creative Commons Attribution-NonCommercial-ShareAlike 4.0",
    "summary": "ZEBRA: Zero-Shot Example-Based Retrieval Augmentation for Commonsense Question Answering",
    "version": "1.0.1",
    "project_urls": {
        "Homepage": "https://github.com/sapienzanlp/zebra"
    },
    "split_keywords": [
        "nlp",
        "sapienza",
        "sapienzanlp",
        "deep",
        "learning",
        "transformer",
        "pytorch",
        "retriever",
        "question",
        "answering",
        "commonsense",
        "large",
        "language",
        "models"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "4bcb8476c7e253a021f4990ed82a3debbeb6a6961bf785a047df68baf4a2df11",
                "md5": "04d382c682edaa90830edb5dd1b52cb9",
                "sha256": "330946dc0ac0b362de900ca546fb7e18a78124e7883a3604a3ff778e51bf836a"
            },
            "downloads": -1,
            "filename": "zebra_qa-1.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "04d382c682edaa90830edb5dd1b52cb9",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 45754,
            "upload_time": "2024-12-26T15:24:16",
            "upload_time_iso_8601": "2024-12-26T15:24:16.042384Z",
            "url": "https://files.pythonhosted.org/packages/4b/cb/8476c7e253a021f4990ed82a3debbeb6a6961bf785a047df68baf4a2df11/zebra_qa-1.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0fd822a5171b6be01cfce126b220966bcd78e6a4cccde54614bb0ff584112a15",
                "md5": "80fc7d13e94e6042c32623e1f87619c2",
                "sha256": "678e7a439c39eba81f7f2cc93b100f50f229e7e4a655ea7d181619db0696b8d9"
            },
            "downloads": -1,
            "filename": "zebra_qa-1.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "80fc7d13e94e6042c32623e1f87619c2",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 46366,
            "upload_time": "2024-12-26T15:24:18",
            "upload_time_iso_8601": "2024-12-26T15:24:18.169744Z",
            "url": "https://files.pythonhosted.org/packages/0f/d8/22a5171b6be01cfce126b220966bcd78e6a4cccde54614bb0ff584112a15/zebra_qa-1.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-26 15:24:18",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "sapienzanlp",
    "github_project": "zebra",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [
        {
            "name": "torch",
            "specs": [
                [
                    "==",
                    "2.2.0"
                ]
            ]
        },
        {
            "name": "transformers",
            "specs": [
                [
                    "==",
                    "4.42.4"
                ]
            ]
        },
        {
            "name": "accelerate",
            "specs": [
                [
                    "==",
                    "0.30.1"
                ]
            ]
        },
        {
            "name": "datasets",
            "specs": [
                [
                    "==",
                    "2.16.1"
                ]
            ]
        },
        {
            "name": "rich",
            "specs": [
                [
                    "==",
                    "13.7.1"
                ]
            ]
        },
        {
            "name": "scikit-learn",
            "specs": [
                [
                    "==",
                    "1.4.2"
                ]
            ]
        },
        {
            "name": "tiktoken",
            "specs": [
                [
                    "==",
                    "0.7.0"
                ]
            ]
        },
        {
            "name": "pytest",
            "specs": [
                [
                    "==",
                    "8.2.2"
                ]
            ]
        },
        {
            "name": "bitsandbytes",
            "specs": [
                [
                    "==",
                    "0.43.3"
                ]
            ]
        },
        {
            "name": "lightning",
            "specs": [
                [
                    "==",
                    "2.2.5"
                ]
            ]
        },
        {
            "name": "hydra-core",
            "specs": [
                [
                    "==",
                    "1.3.2"
                ]
            ]
        },
        {
            "name": "hydra_colorlog",
            "specs": [
                [
                    "==",
                    "1.2.0"
                ]
            ]
        },
        {
            "name": "wandb",
            "specs": [
                [
                    "==",
                    "0.16.6"
                ]
            ]
        },
        {
            "name": "art",
            "specs": [
                [
                    "==",
                    "6.1"
                ]
            ]
        },
        {
            "name": "pprintpp",
            "specs": [
                [
                    "==",
                    "0.4.0"
                ]
            ]
        },
        {
            "name": "colorama",
            "specs": [
                [
                    "==",
                    "0.4.6"
                ]
            ]
        },
        {
            "name": "jsonlines",
            "specs": [
                [
                    "==",
                    "4.0.0"
                ]
            ]
        },
        {
            "name": "loguru",
            "specs": [
                [
                    "==",
                    "0.7.2"
                ]
            ]
        },
        {
            "name": "goldenretriever-core",
            "specs": [
                [
                    "==",
                    "0.9.4"
                ]
            ]
        },
        {
            "name": "setuptools",
            "specs": [
                [
                    "==",
                    "75.1.0"
                ]
            ]
        },
        {
            "name": "twine",
            "specs": [
                [
                    "==",
                    "5.1.1"
                ]
            ]
        },
        {
            "name": "flash-attn",
            "specs": [
                [
                    "==",
                    "2.5.9.post1"
                ]
            ]
        }
    ],
    "lcname": "zebra-qa"
}

Francesco Molfese