<div align="center">
<img src="https://repository-images.githubusercontent.com/837144095/8190ad0e-e9ff-4dda-9116-644d62d6b886">
</div>
<p align="center">
<!-- Python -->
<a href="https://www.python.org" alt="Python"><img src="https://badges.aleen42.com/src/python.svg"></a>
<!-- Version -->
<a href="https://pypi.org/project/guardbench/"><img src="https://img.shields.io/pypi/v/guardbench?color=light-green" alt="PyPI version"></a>
<!-- Docs -->
<a href="https://github.com/AmenRa/guardbench/tree/main/docs"><img src="https://img.shields.io/badge/docs-passing-<COLOR>.svg" alt="Documentation Status"></a>
<!-- Black -->
<a href="https://github.com/psf/black" alt="Code style: black"><img src="https://img.shields.io/badge/code%20style-black-000000.svg"></a>
<!-- License -->
<a href="https://interoperable-europe.ec.europa.eu/sites/default/files/custom-page/attachment/2020-03/EUPL-1.2%20EN.txt"><img src="https://img.shields.io/badge/license-EUPL-blue.svg" alt="License: EUPL-1.2"></a>
</p>
# GuardBench
## 🔥 News
- [October 9, 2025] GuardBench now supports four additional datasets: [JBB Behaviors](https://huggingface.co/datasets/JailbreakBench/JBB-Behaviors), [NicheHazardQA](https://huggingface.co/datasets/SoftMINER-Group/NicheHazardQA), [HarmEval](https://huggingface.co/datasets/SoftMINER-Group/HarmEval), and [TechHazardQA](https://huggingface.co/datasets/SoftMINER-Group/TechHazardQA). It also now lets you choose which metrics to display at the end of the evaluation (see the snippet below). Supported metrics: `precision` (Precision), `recall` (Recall), `f1` (F1), `mcc` (Matthews Correlation Coefficient), `auprc` (AUPRC), `sensitivity` (Sensitivity), `specificity` (Specificity), `g_mean` (G-Mean), `fpr` (False Positive Rate), `fnr` (False Negative Rate).
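
The metrics to display can be selected via the `metrics` argument of `benchmark`. A minimal sketch, assuming a placeholder user-defined `moderate` function like the one described in the Usage section below:

```python
from guardbench import benchmark

def moderate(conversations: list[list[dict[str, str]]]) -> list[float]:
    # Placeholder moderator for illustration: flags every conversation as unsafe.
    return [1.0 for _ in conversations]

benchmark(
    moderate=moderate,
    model_name="My Guardrail Model",
    metrics=["precision", "recall", "f1", "mcc", "auprc"],  # metrics to display
)
```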
## ⚡️ Introduction
[`GuardBench`](https://github.com/AmenRa/guardbench) is a Python library for the evaluation of guardrail models, i.e., LLMs fine-tuned to detect unsafe content in human-AI interactions.
[`GuardBench`](https://github.com/AmenRa/guardbench) provides a common interface to 40 evaluation datasets, which are downloaded and converted into a [standardized format](docs/data_format.md) for improved usability.
It also lets you quickly [compare results and export](docs/report.md) `LaTeX` tables for scientific publications.
[`GuardBench`](https://github.com/AmenRa/guardbench)'s benchmarking pipeline can also be leveraged on [custom datasets](docs/custom_dataset.md).
[`GuardBench`](https://github.com/AmenRa/guardbench) was presented at [EMNLP 2024](https://2024.emnlp.org).
The related paper is available [here](https://aclanthology.org/2024.emnlp-main.1022.pdf).
[`GuardBench`](https://github.com/AmenRa/guardbench) has a public [leaderboard](https://huggingface.co/spaces/AmenRa/guardbench-leaderboard) available on HuggingFace.
You can find the list of supported datasets [here](docs/datasets.md).
A few of them require authorization; please read [this](docs/get_datasets.md).
If you use [`GuardBench`](https://github.com/AmenRa/guardbench) to evaluate guardrail models for your scientific publications, please consider [citing our work](#-citation).
## ✨ Features
- [40 datasets](docs/datasets.md) for guardrail model evaluation.
- Automated evaluation pipeline.
- User-friendly.
- [Extendable](docs/custom_dataset.md).
- Reproducible and sharable evaluation.
- Exportable [evaluation reports](docs/report.md).
## 🔌 Requirements
```bash
python>=3.10
```
## 💾 Installation
```bash
pip install guardbench
```
## 💡 Usage
```python
from guardbench import benchmark
def moderate(
    conversations: list[list[dict[str, str]]],  # MANDATORY!
    # additional `kwargs` as needed
) -> list[float]:
    # Do moderation here and return a list of floats
    # (one unsafe probability per conversation).
    ...

benchmark(
    moderate=moderate,  # User-defined moderation function
    model_name="My Guardrail Model",
    batch_size=1,  # Default value
    datasets="all",  # Default value
    metrics=["f1", "recall"],  # Default value
    # Note: you can pass additional `kwargs` for `moderate`
)
```
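
As a more concrete (toy) illustration, the sketch below wires a naive keyword-based moderator into `benchmark`. It assumes each message in a conversation is a dict with `role` and `content` keys, as described in the [data format](docs/data_format.md) documentation; the keyword list is purely illustrative.

```python
from guardbench import benchmark

# Illustrative keyword list; not a real safety taxonomy.
UNSAFE_KEYWORDS = ["bomb", "weapon", "poison"]

def moderate(conversations: list[list[dict[str, str]]]) -> list[float]:
    scores = []
    for conversation in conversations:
        # Assumption: each message is a dict with "role" and "content" keys,
        # per the standardized data format (docs/data_format.md).
        text = " ".join(message["content"] for message in conversation).lower()
        # Naive unsafe probability: 1.0 if any keyword appears, else 0.0.
        scores.append(1.0 if any(k in text for k in UNSAFE_KEYWORDS) else 0.0)
    return scores

benchmark(
    moderate=moderate,
    model_name="Keyword Baseline",
    datasets="all",
    metrics=["f1", "recall"],
)
```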
### 📖 Examples
- Follow our [tutorial](docs/llama_guard.md) on benchmarking [`Llama Guard`](https://arxiv.org/pdf/2312.06674) with [`GuardBench`](https://github.com/AmenRa/guardbench).
- More examples are available in the [`scripts`](scripts/effectiveness) folder.
## 📚 Documentation
Browse the documentation for more details about:
- The [datasets](docs/datasets.md) and how to [obtain them](docs/get_datasets.md).
- The [data format](docs/data_format.md) used by [`GuardBench`](https://github.com/AmenRa/guardbench).
- How to use the [`Report`](docs/report.md) class to compare models and export results as `LaTeX` tables (a sketch follows this list).
- How to leverage [`GuardBench`](https://github.com/AmenRa/guardbench)'s benchmarking pipeline on [custom datasets](docs/custom_dataset.md).
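
For orientation only, here is a minimal sketch of comparing models and exporting a table; the `Report` constructor arguments and the LaTeX export method name are assumptions, so check the [report documentation](docs/report.md) for the actual API.

```python
from guardbench import Report

# Assumption: a Report is built from the names of previously benchmarked models;
# see docs/report.md for the real constructor signature.
report = Report(["My Guardrail Model", "Llama Guard"])

# Assumption: an export helper producing a LaTeX table exists;
# the method name below is illustrative only.
print(report.to_latex())
```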
## 🏆 Leaderboard
You can find [`GuardBench`](https://github.com/AmenRa/guardbench)'s leaderboard [here](https://huggingface.co/spaces/AmenRa/guardbench-leaderboard). If you want to submit your results, please contact us.
<!-- All results can be reproduced using the provided [`scripts`](scripts/effectiveness). -->
## 👨‍💻 Authors
- Elias Bassani (European Commission - Joint Research Centre)
## 🎓 Citation
```bibtex
@inproceedings{guardbench,
title = "{G}uard{B}ench: A Large-Scale Benchmark for Guardrail Models",
author = "Bassani, Elias and
Sanchez, Ignacio",
editor = "Al-Onaizan, Yaser and
Bansal, Mohit and
Chen, Yun-Nung",
booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.emnlp-main.1022",
doi = "10.18653/v1/2024.emnlp-main.1022",
pages = "18393--18409",
}
```
## 🎁 Feature Requests
Would you like to see other features implemented? Please open a [feature request](https://github.com/AmenRa/guardbench/issues/new?assignees=&labels=enhancement&template=feature_request.md&title=%5BFeature+Request%5D+title).
## 📄 License
[GuardBench](https://github.com/AmenRa/guardbench) is provided as open-source software licensed under [EUPL v1.2](https://github.com/AmenRa/guardbench/blob/master/LICENSE).