# GEM-BENCH

This repository provides a comprehensive benchmark for **Generative Engine Marketing (GEM)**, an emerging field that focuses on monetizing generative AI by seamlessly integrating advertisements into Large Language Model (LLM) responses. Our work addresses the core problem of **ad-injected response (AIR) generation** and provides a framework for its **evaluation**.
* **Generative Engine Marketing (GEM):** A new ecosystem where relevant ads are integrated directly into responses from generative AI assistants, such as LLM-based chatbots.
* **Ad-injected Response (AIR) Generation:** The process of creating responses that seamlessly include relevant advertisements while maintaining a high-quality user experience and satisfying advertiser objectives.
* **GEM-BENCH:** The first comprehensive benchmark designed for the generation and evaluation of ad-injected responses.
---
## 📋 Table of Contents
- [Installation](#-installation)
- [Getting Started](#-getting-started)
- [Available Datasets](#available-datasets)
- [Evaluation Methods](#evaluation-methods)
- [Supported Solutions](#supported-solutions)
- [Citation](#-citation)
- [License](#-license)
---
## 🔧 Installation
### Prerequisites
- Python 3.12 or higher
- Conda (recommended for environment management)
### Setup
```bash
# Clone the repository
git clone https://github.com/Generative-Engine-Marketing/GEM-Bench.git
cd GEM-Bench

# Create and activate a conda environment
conda create --name GemBench python=3.12
conda activate GemBench

# Install the project in editable mode
pip install -e .
```
### Environment Configuration
Create a `.env` file in the root directory with the following variables:
```
# Fill in your own API keys and endpoints
OPENAI_API_KEY="<LLM API Key>"
BASE_URL="<LLM Base URL>"
TRANSFORMERS_OFFLINE=1 # Enable offline mode for Hugging Face Transformers
HF_HUB_OFFLINE=1 # Enable offline mode for Hugging Face Hub
# Embedding
EMBEDDING_API_KEY="<Embedding API Key>"
EMBEDDING_BASE_URL="<Embedding Base URL>"
```
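These variables are read at runtime. Below is a minimal sketch of how they can be loaded with `python-dotenv` and passed to OpenAI-compatible clients (both libraries are project dependencies); the actual wiring inside GEM-Bench may differ.

```python
# Illustrative sketch only: how the .env values above are typically
# consumed. GEM-Bench's internal loading code may differ.
import os

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # reads .env from the current working directory

llm_client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url=os.environ["BASE_URL"],
)
embedding_client = OpenAI(
    api_key=os.environ["EMBEDDING_API_KEY"],
    base_url=os.environ["EMBEDDING_BASE_URL"],
)
```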
-----
## 🚀 Getting Started
After setting up your environment and configuration, you can run the main script to reproduce the experiments from our paper.
```bash
python paper.py
```
To customize the evaluation, edit `paper.py` and adjust the `data_sets` entries, the `solutions` dictionary, and the `model_name`/`judge_model` parameters, for example as sketched below.
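A hypothetical configuration illustrating those knobs (the names mirror the parameters above; consult `paper.py` for the actual structure and supported values):

```python
# Hypothetical sketch of the configuration knobs named above;
# the placeholders in angle brackets are not real values.
data_sets = ["MT-Human", "LM-Market", "CA-Prod"]   # datasets to evaluate

solutions = {                                      # solution name -> config
    "Ad-Chat": {"model_name": "doubao-1-5-lite-32k"},
    "Ad-LLM-GIR-R": {"embedding_model": "<embedding model>",
                     "ad_retriever": "<retriever>"},
}

model_name = "doubao-1-5-lite-32k"  # LLM that generates the responses
judge_model = "<judge LLM>"         # LLM acting as the qualitative judge
```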
-----
## Available Datasets
GEM-BENCH includes three curated datasets covering both chatbot and search scenarios; their paths are configured in the `paper.py` script.
* **MT-Human:** Based on the humanities questions from the MT-Bench benchmark, this dataset is suitable for ad injection in a multi-turn chatbot scenario.
* **LM-Market:** Curated from the LMSYS-Chat-1M dataset, it contains real user-LLM conversations focused on marketing-related topics.
* **CA-Prod:** Simulates the AI-overview feature of search engines, built from real commercial advertising data.
-----
## Evaluation Methods
GEM-BENCH provides a multi-faceted metric ontology for evaluating ad-injected responses, covering both **quantitative** and **qualitative** aspects of user satisfaction and engagement. The evaluation logic is located in `evaluation/`.
* **Quantitative Metrics:**
  * **Response Flow & Coherence:** Measure the semantic smoothness and topic consistency of the response (a rough flow proxy is sketched after this list).
  * **Ad Flow & Coherence:** Assess specifically how well the ad sentence integrates with the surrounding text.
  * **Injection Rate & Click-Through Rate (CTR):** Capture the system's ability to deliver ads and the resulting user engagement.
* **Qualitative Metrics:**
  * **User Satisfaction:** Evaluated along dimensions such as **Accuracy**, **Naturalness** (interruptiveness, authenticity), **Personality** (helpfulness, salesmanship), and **Trust** (credibility, bias).
  * **User Engagement:** Measured by **Notice** (awareness, attitude) and **Click** (awareness of sponsored links, likelihood to click).
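As an illustration of the flow metrics, the sketch below approximates response flow as the average cosine similarity between adjacent sentences, using `nltk` and `sentence-transformers` (both project dependencies). The embedding model name is a placeholder, and this is not the benchmark's exact implementation; see `evaluation/` for that.

```python
# Illustrative flow proxy only; GEM-Bench's actual metrics live in
# evaluation/ and may be computed differently.
# Requires: nltk.download("punkt") for the sentence tokenizer.
from nltk.tokenize import sent_tokenize
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

def response_flow(text: str) -> float:
    """Average cosine similarity between adjacent sentences."""
    sentences = sent_tokenize(text)
    if len(sentences) < 2:
        return 1.0  # a single sentence is trivially smooth
    emb = model.encode(sentences, convert_to_tensor=True)
    sims = [util.cos_sim(emb[i], emb[i + 1]).item()
            for i in range(len(sentences) - 1)]
    return sum(sims) / len(sims)
```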
-----
## Supported Solutions
The benchmark provides implementations for several baseline solutions, allowing for flexible experimentation. You can find their configurations and exposed parameters within the `paper.py` file.
* **Ad-Chat:** An existing solution that integrates ads into the system prompt of the LLM.
  * **Parameters:** `model_name` (default: `doubao-1-5-lite-32k`).
* **Ad-LLM:** A multi-agent framework inspired by recent work, implemented in several configurations (a schematic of the pipeline follows this list):
  * **GI-R:** **G**enerate and **I**nject with ad **R**etrieval based on the raw response. This retrieval-augmented generation (RAG) variant skips the final rewriting step.
  * **GIR-R:** **G**enerate, **I**nject, and **R**ewrite with ad **R**etrieval based on the raw response.
  * **GIR-P:** **G**enerate, **I**nject, and **R**ewrite with ad **R**etrieval based on the user **P**rompt.
  * **Parameters:** All Ad-LLM solutions expose `embedding_model` and `ad_retriever` as configurable parameters; the `response_rewriter` and `ad_injector` modules also have internal parameters that can be modified.
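A schematic of the GIR-R stage order, for orientation only; every object and method below is a hypothetical placeholder, not the package's actual API.

```python
# Schematic GIR-R pipeline. All names here are hypothetical placeholders
# that illustrate the stage order, not GEM-Bench's real API.
def gir_r(user_prompt, llm, retriever, ad_pool):
    raw = llm.generate(user_prompt)          # G: generate the raw response
    ads = retriever.retrieve(raw, ad_pool)   # R: retrieve ads matching the raw response
    injected = llm.inject(raw, ads)          # I: inject the best-fitting ad
    # GI-R stops here; GIR-R adds a final rewrite for seamless integration.
    return llm.rewrite(injected)
```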
-----
## 📖 Citation
If you use GEM-BENCH in your research, please cite our paper:
```bibtex
@article{hu2025gembench,
  title={GEM-Bench: A Benchmark for Ad-Injected Response Generation within Generative Engine Marketing},
  author={Hu, Silan and Zhang, Shiqi and Shi, Yimin and Xiao, Xiaokui},
  journal={arXiv preprint arXiv:2509.14221},
  year={2025}
}
```
For more information, visit our website: [https://gem-bench.org](https://gem-bench.org)
-----
## 📄 License
This project is licensed under the Apache-2.0 License; see the [LICENSE](./LICENSE) file for details.