# The Critique of Critique
This is the official repository for [**The Critique of Critique**](https://arxiv.org/abs/2401.04518).
## Table of contents
- [Introduction](#introduction)
- [Leaderboard](#leaderboard)
- [Quick Start](#quick-start)
- [Citation](#citation)
## Introduction
We introduce **MetaCritique**, a new judge that evaluates human-written or LLM-generated critiques by generating critiques of critiques. It reports three scores:

**Meta-P**: the precision score of MetaCritique, which evaluates the **factuality** of the hypothesis critique.

**Meta-R**: the recall score of MetaCritique, which evaluates the **comprehensiveness** of the hypothesis critique.

**Meta-F1**: the overall rating, computed as the harmonic mean of the precision and recall scores.
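
Concretely, Meta-F1 is the standard harmonic mean of the two scores:

```math
\text{Meta-F1} = \frac{2 \cdot \text{Meta-P} \cdot \text{Meta-R}}{\text{Meta-P} + \text{Meta-R}}
```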
## Leaderboard
We release the benchmarking results of multiple critique models.
| Critique Model | Meta-Precision | Meta-Recall | Meta-F1 |
|----------------|----------------|-------------|---------|
| [AUTO-J](https://github.com/GAIR-NLP/auto-j) | <u>76.43</u> | **70.65** | **71.14** |
| [GPT 3.5](https://openai.com/blog/gpt-3-5-turbo-fine-tuning-and-api-updates) | 80.79 | 64.27 | 68.72 |
| [UltraCM](https://github.com/OpenBMB/UltraFeedback) | 73.64 | 66.77 | 67.79 |
| [Human Critique from Shepherd](https://github.com/facebookresearch/Shepherd) | **83.19** | 60.65 | 64.02 |
| [SelFee](https://github.com/kaistAI/SelFee) | 69.56 | 51.05 | 54.22 |
## Quick Start
#### Installation
```bash
pip install meta-critique
```
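
If you want the latest development version instead, installing directly from the GitHub repository should also work (a hedged alternative, assuming the package is installable from the repository root):

```bash
pip install git+https://github.com/GAIR-NLP/MetaCritique.git
```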
#### Usage
```python
from meta_critique import MetaCritique

api_key = "sk-..."  # your OpenAI API key

inputs = [
    {"question": "<question>", "response": "<response1>", "hypothesis_critique": "<hypothesis_critique>"},
    {"question": "<question>", "response": "<response2>", "hypothesis_critique": "<hypothesis_critique>"},
    # ... one dict per (question, response, hypothesis critique) example
]

meta_critique_instance = MetaCritique(
    model_type="gpt-4",     # judge model used to evaluate the critiques
    batch_size=5,           # number of requests processed per batch
    api_key=api_key,
    api_base=None,          # optional custom API base URL
    seed=None,              # optional seed for reproducibility
    cache_dir="tmp_cache",  # directory for caching intermediate results
)
precision_score, recall_score, f1_score = meta_critique_instance.score(inputs)
```
where
* `question`: The user query for which the model generated the response.
* `response`: The response generated by the model.
* `hypothesis_critique`: The critique to be evaluated, written by either a human or an LLM.
* `reference_answer`: (Optional) The reference answer.
* `reference_critique`: (Optional) The reference critique, given as either:
  * a str: the critique text, or
  * a dict: `{"critique": <reference_critique>, "aius": <optional_aius_from_reference_critique>}`, where `aius` are the atomic information units extracted from the reference critique.

You can find test samples in `eval_examples/test_samples.json`.
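
For example, an entry that supplies both optional reference fields could look like the sketch below (the placeholder values and the two-item AIU list are illustrative, not from the repository):

```python
input_with_references = {
    "question": "<question>",
    "response": "<response>",
    "hypothesis_critique": "<hypothesis_critique>",
    "reference_answer": "<reference_answer>",
    # The reference critique may be a plain string, or a dict that also
    # carries pre-extracted atomic information units (AIUs).
    "reference_critique": {
        "critique": "<reference_critique>",
        "aius": ["<aiu_1>", "<aiu_2>"],
    },
}
```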
## Citation
If you find our work useful or use MetaCritique, please cite our paper:
```bibtex
@article{sun2024metacritique,
  title={The Critique of Critique},
  author={Sun, Shichao and Li, Junlong and Yuan, Weizhe and Yuan, Ruifeng and Li, Wenjie and Liu, Pengfei},
  journal={arXiv preprint arXiv:2401.04518},
  year={2024}
}
```