| Field | Value |
| --- | --- |
| Name | llama-index-packs-evaluator-benchmarker |
| Version | 0.3.0 |
| Summary | llama-index packs evaluator_benchmarker integration |
| Author | Your Name |
| Maintainer | nerdai |
| License | MIT |
| Requires Python | <4.0,>=3.9 |
| Upload time | 2024-11-17 22:42:15 |
| Requirements | No requirements were recorded. |

# Evaluator Benchmarker Pack

A pack for quickly computing benchmark results for your own LLM evaluator
on an evaluation llama-dataset. Specifically, this pack supports benchmarking
an appropriate evaluator on the following llama-datasets:

- `LabelledEvaluatorDataset` for single-grading evaluations
- `LabelledPairwiseEvaluatorDataset` for pairwise-grading evaluations

These llama-datasets can be downloaded from [llama-hub](https://llamahub.ai).
## CLI Usage

You can download llamapacks directly using `llamaindex-cli`, which comes installed with the `llama-index` Python package:

```bash
llamaindex-cli download-llamapack EvaluatorBenchmarkerPack --download-dir ./evaluator_benchmarker_pack
```
You can then inspect the files at `./evaluator_benchmarker_pack` and use them as a template for your own project!
## Code Usage

You can download the pack to the `./evaluator_benchmarker_pack` directory through Python
code as well. The sample script below demonstrates how to construct `EvaluatorBenchmarkerPack`
using a `LabelledPairwiseEvaluatorDataset` downloaded from `llama-hub` and a
`PairwiseComparisonEvaluator` that uses GPT-4 as the LLM. Note, though, that this pack
can also be used on a `LabelledEvaluatorDataset` with a `BaseEvaluator` that performs
single-grading evaluation; in this case, the usage flow remains the same.

```python
from llama_index.core.llama_dataset import download_llama_dataset
from llama_index.core.llama_pack import download_llama_pack
from llama_index.core.evaluation import PairwiseComparisonEvaluator
from llama_index.llms.openai import OpenAI

# download a LabelledPairwiseEvaluatorDataset from llama-hub
pairwise_dataset = download_llama_dataset(
    "MiniMtBenchHumanJudgementDataset", "./data"
)

# define your evaluator, using GPT-4 as the judge LLM
evaluator = PairwiseComparisonEvaluator(
    llm=OpenAI(temperature=0, model="gpt-4")
)

# download and install dependencies
EvaluatorBenchmarkerPack = download_llama_pack(
"EvaluatorBenchmarkerPack", "./pack"
)
# construction requires an evaluator and an eval_dataset
evaluator_benchmarker_pack = EvaluatorBenchmarkerPack(
evaluator=evaluator,
eval_dataset=pairwise_dataset,
show_progress=True,
)
# PERFORM EVALUATION
benchmark_df = evaluator_benchmarker_pack.run() # async arun() also supported
print(benchmark_df)
```
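
The comment on the final call above notes that an async entry point is also supported. A minimal sketch of driving `arun()` with `asyncio`, reusing the `evaluator_benchmarker_pack` constructed above, could look like this:

```python
import asyncio


async def main() -> None:
    # arun() mirrors run(), awaiting the evaluator's async grading calls
    benchmark_df = await evaluator_benchmarker_pack.arun()
    print(benchmark_df)


asyncio.run(main())
```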
`Output:`
```text
number_examples 1689
inconclusives 140
ties 379
agreement_rate_with_ties 0.657844
agreement_rate_without_ties 0.828205
```
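
One way to read these columns (an interpretation on our part, not something this README spells out) is that `inconclusives` are examples the evaluator failed to grade, and the two agreement rates compare the evaluator's verdicts against the human judgements with tied examples either kept in or dropped from the denominator. Under that reading, hypothetical agreement counts consistent with the table above would be:

```python
# Hypothetical interpretation of the benchmark columns; the column
# definitions and agreement counts below are assumptions, not taken
# from the pack's documentation.
number_examples = 1689
inconclusives = 140
ties = 379

judged = number_examples - inconclusives      # 1549 examples with a usable verdict
judged_without_ties = judged - ties           # 1170 examples with a clear winner

# agreement counts chosen so the rates match the sample output above
agreements_with_ties = 1019
agreements_without_ties = 969

print(round(agreements_with_ties / judged, 6))                  # 0.657844
print(round(agreements_without_ties / judged_without_ties, 6))  # 0.828205
```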
Note that `evaluator_benchmarker_pack.run()` also saves `benchmark_df` to a CSV file, `benchmark.csv`, in the same directory:
```bash
.
└── benchmark.csv
```
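
If you want to post-process the persisted results later, the saved file is a plain CSV and can be read back with pandas (a minimal sketch, assuming pandas is installed and that `run()` was executed from the current working directory):

```python
import pandas as pd

# load the benchmark results that run() wrote to disk
benchmark_df = pd.read_csv("benchmark.csv")
print(benchmark_df)
```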