# The Critique of Critique
This is the official repository for [**The Critique of Critique**](https://arxiv.org/abs/2401.04518).
## Table of contents
- [Introduction](#introduction)
- [Leaderboard](#leaderboard)
- [Quick Start](#quick-start)
- [Citation](#citation)
## Introduction
We introduce **MetaCritique**, a new judge that evaluates human-written or LLM-generated critiques by generating critiques of critiques. It reports three scores:

**Meta-P**: the precision score of MetaCritique, which evaluates the **factuality** of the hypothesis critique.

**Meta-R**: the recall score of MetaCritique, which evaluates the **comprehensiveness** of the hypothesis critique.

**Meta-F1**: the overall rating, computed as the harmonic mean of the precision and recall scores.
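
Concretely, Meta-F1 is the standard harmonic mean of the two scores:

```math
\text{Meta-F1} = \frac{2 \cdot \text{Meta-P} \cdot \text{Meta-R}}{\text{Meta-P} + \text{Meta-R}}
```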
## Leaderboard
We release the benchmarking results of multiple critique models.
| Critique Model | Meta-Precision | Meta-Recall | Meta-F1 |
|----------------|----------------|-------------|---------|
| [AUTO-J](https://github.com/GAIR-NLP/auto-j) | <u>76.43</u> | **70.65** | **71.14** |
| [GPT 3.5](https://openai.com/blog/gpt-3-5-turbo-fine-tuning-and-api-updates) | 80.79 | 64.27 | 68.72 |
| [UltraCM](https://github.com/OpenBMB/UltraFeedback) | 73.64 | 66.77 | 67.79 |
| [Human Critique from Shepherd](https://github.com/facebookresearch/Shepherd) | **83.19** | 60.65 | 64.02 |
| [SelFee](https://github.com/kaistAI/SelFee) | 69.56 | 51.05 | 54.22 |
## Quick Start
#### Installation
```bash
pip install meta-critique
```
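
If you want the latest development version instead, installing directly from the GitHub repository should also work (a hedged alternative, assuming the package is installable from the repository root):

```bash
pip install git+https://github.com/GAIR-NLP/MetaCritique.git
```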
#### Usage
```python
from meta_critique import MetaCritique

api_key = "sk-..."  # your OpenAI API key

inputs = [
    {"question": "<question>", "response": "<response1>", "hypothesis_critique": "<hypothesis_critique>"},
    {"question": "<question>", "response": "<response2>", "hypothesis_critique": "<hypothesis_critique>"},
    # ... one dict per (question, response, hypothesis critique) example
]

meta_critique_instance = MetaCritique(
    model_type="gpt-4",     # judge model used to evaluate the critiques
    batch_size=5,           # number of requests processed per batch
    api_key=api_key,
    api_base=None,          # optional custom API base URL
    seed=None,              # optional seed for reproducibility
    cache_dir="tmp_cache",  # directory for caching intermediate results
)
precision_score, recall_score, f1_score = meta_critique_instance.score(inputs)
```
where
* `question`: The user query for which the model generated the response.
* `response`: The response generated by the model.
* `hypothesis_critique`: The critique to be evaluated, written by either a human or an LLM.
* `reference_answer`: (Optional) The reference answer.
* `reference_critique`: (Optional) The reference critique, given as either:
  * a str: the critique text, or
  * a dict: `{"critique": <reference_critique>, "aius": <optional_aius_from_reference_critique>}`, where `aius` are the atomic information units extracted from the reference critique.

You can find test samples in `eval_examples/test_samples.json`.
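
For example, an entry that supplies both optional reference fields could look like the sketch below (the placeholder values and the two-item AIU list are illustrative, not from the repository):

```python
input_with_references = {
    "question": "<question>",
    "response": "<response>",
    "hypothesis_critique": "<hypothesis_critique>",
    "reference_answer": "<reference_answer>",
    # The reference critique may be a plain string, or a dict that also
    # carries pre-extracted atomic information units (AIUs).
    "reference_critique": {
        "critique": "<reference_critique>",
        "aius": ["<aiu_1>", "<aiu_2>"],
    },
}
```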
## Citation
If you find our work useful or use MetaCritique, please cite our paper:
```bibtex
@article{sun2024metacritique,
  title={The Critique of Critique},
  author={Sun, Shichao and Li, Junlong and Yuan, Weizhe and Yuan, Ruifeng and Li, Wenjie and Liu, Pengfei},
  journal={arXiv preprint arXiv:2401.04518},
  year={2024}
}
```