# Retrieval-Augmented Generation (RAG) Evaluation Pack
Get benchmark scores for your own RAG pipeline (i.e., a `QueryEngine`) against a RAG
dataset (i.e., a `LabelledRagDataset`). Specifically, this pack takes as input a
query engine and a `LabelledRagDataset`, which can be downloaded from
[llama-hub](https://llamahub.ai).
## CLI Usage
You can download LlamaPacks directly using `llamaindex-cli`, which comes installed with the `llama-index` Python package:
```bash
llamaindex-cli download-llamapack RagEvaluatorPack --download-dir ./rag_evaluator_pack
```
You can then inspect the files at `./rag_evaluator_pack` and use them as a template for your own project!
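Alternatively, the pack is published on PyPI as `llama-index-packs-rag-evaluator`, so you can install it with `pip install llama-index-packs-rag-evaluator` and import it directly. A minimal sketch, assuming the standard LlamaIndex pack namespace applies to this package:

```python
# assumes `pip install llama-index-packs-rag-evaluator` has been run
from llama_index.packs.rag_evaluator import RagEvaluatorPack
```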
## Code Usage
You can also download the pack to the `./rag_evaluator_pack` directory through Python
code. The sample script below demonstrates how to construct a `RagEvaluatorPack`
using a `LabelledRagDataset` downloaded from `llama-hub` and a simple RAG pipeline
built off of its source documents.
```python
from llama_index.core.llama_dataset import download_llama_dataset
from llama_index.core.llama_pack import download_llama_pack
from llama_index.core import VectorStoreIndex

# download a LabelledRagDataset from llama-hub
rag_dataset, documents = download_llama_dataset(
    "PaulGrahamEssayDataset", "./paul_graham"
)

# build a basic RAG pipeline off of the source documents
index = VectorStoreIndex.from_documents(documents=documents)
query_engine = index.as_query_engine()

# time to benchmark/evaluate this RAG pipeline
# download the pack and install its dependencies
RagEvaluatorPack = download_llama_pack(
    "RagEvaluatorPack", "./rag_evaluator_pack"
)

# construction requires a query_engine, a rag_dataset, and optionally a judge_llm
rag_evaluator_pack = RagEvaluatorPack(
    query_engine=query_engine, rag_dataset=rag_dataset
)

# perform the evaluation
benchmark_df = rag_evaluator_pack.run()  # async arun() is also supported
print(benchmark_df)
```
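As noted in the constructor comment, you can also pass an optional `judge_llm` to control which model acts as the evaluation judge. A minimal sketch, assuming the `llama-index-llms-openai` integration is installed and that `judge_llm` accepts any LlamaIndex LLM:

```python
from llama_index.llms.openai import OpenAI

# hypothetical: swap in GPT-4 as the judge
# (judge_llm is the optional constructor argument mentioned above)
rag_evaluator_pack = RagEvaluatorPack(
    query_engine=query_engine,
    rag_dataset=rag_dataset,
    judge_llm=OpenAI(model="gpt-4"),
)
```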
`Output:`
```text
rag                            base_rag
metrics
mean_correctness_score        4.511364
mean_relevancy_score          0.931818
mean_faithfulness_score       1.000000
mean_context_similarity_score 0.945952
```
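For reference, `mean_correctness_score` is on a 1-5 scale assigned by the LLM judge, while the relevancy and faithfulness scores are mean pass rates in [0, 1] and `mean_context_similarity_score` is a semantic-similarity score in [0, 1]. This interpretation follows the standard LlamaIndex evaluators; consult the pack source for the exact judging prompts.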
Note that `rag_evaluator_pack.run()` also saves two files in the directory from
which the pack was invoked:
```bash
.
├── benchmark.csv (CSV format of the benchmark scores)
└── _evaluations.json (raw evaluation results for all examples & predictions)
```
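Since `benchmark.csv` is plain CSV, the scores are easy to reload later. A quick sketch, assuming `pandas` is available:

```python
import pandas as pd

# reload the persisted benchmark scores for later comparison
saved_scores = pd.read_csv("benchmark.csv")
print(saved_scores)
```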