Name | llama-index-packs-evaluator-benchmarker |
Version | 0.2.0 |
home_page | None |
Summary | llama-index packs evaluator_benchmarker integration |
upload_time | 2024-08-22 16:51:01 |
maintainer | nerdai |
docs_url | None |
author | Your Name |
requires_python | <4.0,>=3.8.1 |
license | MIT |
keywords | |
VCS | |
bugtrack_url | |
requirements | No requirements were recorded. |
# Evaluator Benchmarker Pack
A pack for quick computation of benchmark results of your own LLM evaluator
on an Evaluation llama-dataset. Specifically, this pack supports benchmarking
an appropriate evaluator on the following llama-datasets:
- `LabelledEvaluatorDataset` for single-grading evaluations
- `LabelledPairwiseEvaluatorDataset` for pairwise-grading evaluations
These llama-datasets can be downloaded from [llama-hub](https://llamahub.ai).
## CLI Usage
You can download llamapacks directly using `llamaindex-cli`, which is installed with the `llama-index` Python package:
```bash
llamaindex-cli download-llamapack EvaluatorBenchmarkerPack --download-dir ./evaluator_benchmarker_pack
```
You can then inspect the files at `./evaluator_benchmarker_pack` and use them as a template for your own project!
## Code Usage
You can also download the pack from Python code. The sample script below downloads it
to the `./pack` directory and constructs an `EvaluatorBenchmarkerPack`
using a `LabelledPairwiseEvaluatorDataset` downloaded from `llama-hub` and a
`PairwiseComparisonEvaluator` that uses GPT-4 as the LLM. The pack
can also be used on a `LabelledEvaluatorDataset` with a `BaseEvaluator` that performs
single-grading evaluation; in that case, the usage flow remains the same.
```python
from llama_index.core.llama_dataset import download_llama_dataset
from llama_index.core.llama_pack import download_llama_pack
from llama_index.core.evaluation import PairwiseComparisonEvaluator
from llama_index.llms.openai import OpenAI

# download a LabelledPairwiseEvaluatorDataset from llama-hub
# (download_llama_dataset returns a (dataset, documents) tuple)
pairwise_dataset, _ = download_llama_dataset(
    "MiniMtBenchHumanJudgementDataset", "./data"
)

# define your evaluator, passing the LLM directly
# (ServiceContext is no longer available in recent llama-index-core releases)
evaluator = PairwiseComparisonEvaluator(
    llm=OpenAI(temperature=0, model="gpt-4"),
)

# download and install dependencies
EvaluatorBenchmarkerPack = download_llama_pack(
    "EvaluatorBenchmarkerPack", "./pack"
)

# construction requires an evaluator and an eval_dataset
evaluator_benchmarker_pack = EvaluatorBenchmarkerPack(
    evaluator=evaluator,
    eval_dataset=pairwise_dataset,
    show_progress=True,
)

# PERFORM EVALUATION
benchmark_df = evaluator_benchmarker_pack.run()  # async arun() also supported
print(benchmark_df)
```
`Output:`
```text
number_examples 1689
inconclusives 140
ties 379
agreement_rate_with_ties 0.657844
agreement_rate_without_ties 0.828205
```
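The README does not define these metrics, so the sketch below shows one plausible reading of them, not the pack's actual implementation. It assumes a pairwise score convention of `1.0` (first answer wins), `0.0` (second answer wins) and `0.5` (tie), treats a missing evaluator score as inconclusive, and excludes inconclusive examples from both rates. The names `Judgement` and `summarize` are hypothetical. The figures above are consistent with this reading: 1689 - 140 = 1549 conclusive examples and 1549 - 379 = 1170 non-tie examples, which would correspond to 1019/1549 ≈ 0.657844 and 969/1170 ≈ 0.828205.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Judgement:
    """Reference (human) score vs. evaluator score for one example.

    Assumed convention: 1.0 = first answer wins, 0.0 = second answer wins,
    0.5 = tie. `predicted=None` marks an inconclusive evaluation.
    """

    reference: float
    predicted: Optional[float]


def summarize(judgements: List[Judgement]) -> dict:
    """Compute agreement statistics under the assumptions stated above."""
    conclusive = [j for j in judgements if j.predicted is not None]
    # An example counts as a tie if either side scored it 0.5.
    tied = [j for j in conclusive if 0.5 in (j.reference, j.predicted)]
    decisive = [j for j in conclusive if 0.5 not in (j.reference, j.predicted)]

    matches_all = sum(j.reference == j.predicted for j in conclusive)
    matches_decisive = sum(j.reference == j.predicted for j in decisive)

    nan = float("nan")
    return {
        "number_examples": len(judgements),
        "inconclusives": len(judgements) - len(conclusive),
        "ties": len(tied),
        "agreement_rate_with_ties": (
            matches_all / len(conclusive) if conclusive else nan
        ),
        "agreement_rate_without_ties": (
            matches_decisive / len(decisive) if decisive else nan
        ),
    }


sample = [
    Judgement(1.0, 1.0),   # agreement
    Judgement(0.0, 1.0),   # disagreement
    Judgement(0.5, 0.5),   # tie, agreement
    Judgement(1.0, 0.5),   # tie, disagreement
    Judgement(0.0, None),  # inconclusive
    Judgement(0.0, 0.0),   # agreement
]
stats = summarize(sample)
print(stats)
```

Keeping two rates separates how often the evaluator matches the human label overall from how often it matches when both sides picked a clear winner.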
Note that `evaluator_benchmarker_pack.run()` also saves `benchmark_df` to a `benchmark.csv` file in the same directory.
```bash
.
└── benchmark.csv
```
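The sample script mentions that an async `arun()` is also supported. A minimal sketch of driving it from a plain script follows, assuming `arun()` can be awaited with no arguments (any optional parameters it accepts are not described in this README). `run_benchmark_async` and `run_benchmark` are hypothetical helper names used only for illustration.

```python
import asyncio


async def run_benchmark_async(pack):
    """Await the pack's async entry point and return its result.

    `pack` is any object exposing an `arun()` coroutine method, such as
    the `evaluator_benchmarker_pack` constructed in the sample script.
    """
    return await pack.arun()


def run_benchmark(pack):
    """Synchronous wrapper for scripts.

    asyncio.run() starts a fresh event loop, so this must not be called
    from code that is already running inside one (for example a notebook
    cell); there, `await run_benchmark_async(pack)` directly instead.
    """
    return asyncio.run(run_benchmark_async(pack))
```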
Raw data
{
"_id": null,
"home_page": null,
"name": "llama-index-packs-evaluator-benchmarker",
"maintainer": "nerdai",
"docs_url": null,
"requires_python": "<4.0,>=3.8.1",
"maintainer_email": null,
"keywords": null,
"author": "Your Name",
"author_email": "you@example.com",
"download_url": "https://files.pythonhosted.org/packages/f0/a7/e8c559f8c6a9ab31da2549eb032984eb7e8e20afd7a54d6dc7aa22311725/llama_index_packs_evaluator_benchmarker-0.2.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "llama-index packs evaluator_benchmarker integration",
"version": "0.2.0",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "16641abd3dfffe7a51b1f984726467dea02c9bf9a9ec5980d0ed7960747e51c6",
"md5": "299009163b7b11f1d0d7cb75a5e2c1aa",
"sha256": "d2f5209742d3d3c02392461028dcc60aeffdd7ce7db27783d58882138de8cc06"
},
"downloads": -1,
"filename": "llama_index_packs_evaluator_benchmarker-0.2.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "299009163b7b11f1d0d7cb75a5e2c1aa",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.8.1",
"size": 4306,
"upload_time": "2024-08-22T16:50:59",
"upload_time_iso_8601": "2024-08-22T16:50:59.843721Z",
"url": "https://files.pythonhosted.org/packages/16/64/1abd3dfffe7a51b1f984726467dea02c9bf9a9ec5980d0ed7960747e51c6/llama_index_packs_evaluator_benchmarker-0.2.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "f0a7e8c559f8c6a9ab31da2549eb032984eb7e8e20afd7a54d6dc7aa22311725",
"md5": "58c07e9585c384c198aef38087becbf1",
"sha256": "672940b62051256a2cb55155455df2b6da277f4aa44765f2cdba89d5ed095aec"
},
"downloads": -1,
"filename": "llama_index_packs_evaluator_benchmarker-0.2.0.tar.gz",
"has_sig": false,
"md5_digest": "58c07e9585c384c198aef38087becbf1",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.8.1",
"size": 3939,
"upload_time": "2024-08-22T16:51:01",
"upload_time_iso_8601": "2024-08-22T16:51:01.238164Z",
"url": "https://files.pythonhosted.org/packages/f0/a7/e8c559f8c6a9ab31da2549eb032984eb7e8e20afd7a54d6dc7aa22311725/llama_index_packs_evaluator_benchmarker-0.2.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-08-22 16:51:01",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "llama-index-packs-evaluator-benchmarker"
}
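The `digests` entries in the raw data can be used to check a downloaded archive before installing it. A minimal sketch using only the Python standard library; `sha256_matches` is a hypothetical helper name, not part of this package or of any index tooling.

```python
import hashlib
from pathlib import Path


def sha256_matches(path, expected_hex, chunk_size=1 << 20):
    """Return True if the file at `path` has the given SHA-256 hex digest.

    The file is read in chunks so large archives do not need to fit in memory.
    """
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()
```

For the sdist listed above, `expected_hex` would be the `sha256` value from its `digests` entry, and `path` the local copy of `llama_index_packs_evaluator_benchmarker-0.2.0.tar.gz`.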