# Retrieval-Augmented Generation (RAG) Evaluation Pack
Get benchmark scores for your own RAG pipeline (i.e., a `QueryEngine`) on a RAG
dataset (i.e., a `LabelledRagDataset`). Specifically, this pack takes as input a
query engine and a `LabelledRagDataset`, which can also be downloaded from
[llama-hub](https://llamahub.ai).
## CLI Usage
You can download llamapacks directly using `llamaindex-cli`, which is installed with the `llama-index` Python package:
```bash
llamaindex-cli download-llamapack RagEvaluatorPack --download-dir ./rag_evaluator_pack
```
You can then inspect the files at `./rag_evaluator_pack` and use them as a template for your own project!
## Code Usage
You can also download the pack to the `./rag_evaluator_pack` directory through
Python code. The sample script below demonstrates how to construct a `RagEvaluatorPack`
using a `LabelledRagDataset` downloaded from `llama-hub` and a simple RAG pipeline
built from its source documents.
```python
from llama_index.core.llama_dataset import download_llama_dataset
from llama_index.core.llama_pack import download_llama_pack
from llama_index.core import VectorStoreIndex
# download a LabelledRagDataset from llama-hub
rag_dataset, documents = download_llama_dataset(
"PaulGrahamEssayDataset", "./paul_graham"
)
# build a basic RAG pipeline off of the source documents
index = VectorStoreIndex.from_documents(documents=documents)
query_engine = index.as_query_engine()
# Time to benchmark/evaluate this RAG pipeline
# Download and install dependencies
RagEvaluatorPack = download_llama_pack(
"RagEvaluatorPack", "./rag_evaluator_pack"
)
# construction requires a query_engine, a rag_dataset, and optionally a judge_llm
rag_evaluator_pack = RagEvaluatorPack(
query_engine=query_engine, rag_dataset=rag_dataset
)
# PERFORM EVALUATION
benchmark_df = rag_evaluator_pack.run() # async arun() also supported
print(benchmark_df)
```
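The comment above notes that an async `arun()` is also supported. Below is a minimal sketch of driving such a coroutine with `asyncio`, using a hypothetical stub class in place of the real pack (the actual `arun()` requires a judge LLM and network access, so only the calling pattern is shown here):

```python
import asyncio


class StubEvaluatorPack:
    """Hypothetical stand-in for RagEvaluatorPack, for illustration only."""

    async def arun(self):
        # The real pack would issue judge-LLM calls concurrently here.
        await asyncio.sleep(0)  # yield control, as real I/O would
        return {"mean_correctness_score": 4.5}


async def main():
    pack = StubEvaluatorPack()
    return await pack.arun()


benchmark = asyncio.run(main())
print(benchmark)
```

In a notebook, where an event loop is already running, you would `await pack.arun()` directly instead of calling `asyncio.run()`.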
Output:
```text
rag                            base_rag
metrics
mean_correctness_score         4.511364
mean_relevancy_score           0.931818
mean_faithfulness_score        1.000000
mean_context_similarity_score  0.945952
```
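The `mean_*` rows above are per-example judge scores averaged across the dataset. A small illustration with the standard library's `statistics.mean` (the per-example numbers below are made up; correctness is judged on a 1–5 scale, while relevancy is a pass/fail encoded as 1.0/0.0):

```python
from statistics import mean

# Hypothetical per-example judge scores for a 4-example dataset.
correctness = [5.0, 4.5, 4.0, 4.5]  # 1-5 scale
relevancy = [1.0, 1.0, 0.0, 1.0]  # pass/fail as 1.0/0.0

benchmark = {
    "mean_correctness_score": mean(correctness),
    "mean_relevancy_score": mean(relevancy),
}
print(benchmark)
# {'mean_correctness_score': 4.5, 'mean_relevancy_score': 0.75}
```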
Note that `rag_evaluator_pack.run()` will also save two files in the same directory
in which the pack was invoked:
```bash
.
├── benchmark.csv (CSV format of the benchmark scores)
└── _evaluations.json (raw evaluation results for all examples & predictions)
```
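Both saved files can be reloaded with the standard library. The sketch below writes plausible stand-ins for `benchmark.csv` and `_evaluations.json` and reads them back with `csv.DictReader` and `json.loads`; the exact schemas of the real files produced by `run()` may differ, so treat the layouts here as assumptions:

```python
import csv
import json
from pathlib import Path

# Illustrative stand-ins only -- the real files are produced by run(),
# and their exact schemas may differ from this sketch.
Path("benchmark.csv").write_text(
    "metrics,base_rag\nmean_correctness_score,4.511364\n"
)
Path("_evaluations.json").write_text(
    json.dumps({"evaluations": [{"score": 4.5}]})
)

# Reload the benchmark scores as a list of row dicts.
with open("benchmark.csv", newline="") as f:
    benchmark_rows = list(csv.DictReader(f))

# Reload the raw per-example evaluation results.
evaluations = json.loads(Path("_evaluations.json").read_text())

print(benchmark_rows[0])
print(len(evaluations["evaluations"]))
```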