| Field | Value |
| --- | --- |
| Name | lastmile-auto-eval |
| Version | 0.0.10 |
| Summary | An API for using metric models (either provided by default or fine-tuned yourself) to evaluate LLMs. |
| Author | LastMile AI |
| Maintainer | None |
| Home page | None |
| Docs URL | None |
| License | None |
| Requires Python | >=3.10 |
| Keywords | None |
| Upload time | 2024-09-04 22:57:49 |
| Requirements | No requirements were recorded. |
A library for evaluating LLMs with evaluation models, either the default models provided by LastMile or your own fine-tuned ones.
Evaluations are run on dataframes that include any combination of `input`, `ground_truth`, and `output` columns. At least one of these columns must be defined, and all values must be strings.
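For example, an output-only dataframe is a valid input. A minimal sketch (the values here are made up for illustration):

```python
import pandas as pd

# Any combination of `input`, `ground_truth`, and `output` is accepted,
# as long as at least one column is present and every value is a string.
df_outputs_only = pd.DataFrame(
    {"output": ["the sky is blue", "the sky is red"]}
)
```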
## Synchronous Requests
Use `evaluate()` for non-streaming results and `stream_evaluate()` for streaming results.
## Asynchronous Requests
`submit_job()` returns a `job_id` string and is ideal for evaluations that don't require an immediate response. With a `job_id` (e.g. `cm0c4bxwo002kpe01h5j4zj2y`) you can do the following (a polling sketch follows the list):
1. Read job info: `<host_url>/api/auto_eval_job/read?id=<job_id>`

   Ex: https://lastmileai.dev/api/auto_eval_job/read?id=cm0c4bxwo002kpe01h5j4zj2y

2. Retrieve results: `<host_url>/api/auto_eval_job/get_results?id=<job_id>`

   Ex: https://lastmileai.dev/api/auto_eval_job/get_results?id=cm0c4bxwo002kpe01h5j4zj2y
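As a rough illustration, a submitted job could be polled over HTTP as in the sketch below. The `requests` dependency, the bearer-token auth header, and the `status` field are assumptions for illustration, not part of this package's documented API; check the actual payload returned by your deployment.

```python
import time
import requests  # assumed HTTP client; not a dependency of this package

HOST_URL = "https://lastmileai.dev"   # host from the example URLs above
API_KEY = "<your-api-key>"            # placeholder credential

def wait_for_results(job_id: str, poll_interval_s: float = 5.0) -> dict:
    """Poll the read endpoint until the job finishes, then fetch results."""
    headers = {"Authorization": f"Bearer {API_KEY}"}  # assumed auth scheme
    while True:
        info = requests.get(
            f"{HOST_URL}/api/auto_eval_job/read",
            params={"id": job_id},
            headers=headers,
        ).json()
        # The status field and its values are assumptions; inspect the
        # real response from the read endpoint for your account.
        if info.get("status") in ("COMPLETED", "FAILED"):
            break
        time.sleep(poll_interval_s)
    return requests.get(
        f"{HOST_URL}/api/auto_eval_job/get_results",
        params={"id": job_id},
        headers=headers,
    ).json()
```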
## Example Usage
```python
from lastmile_auto_eval import (
    EvaluationMetric,
    EvaluationResult,
    evaluate,
    stream_evaluate,
    submit_job,
)
import pandas as pd
import json
from typing import Any, Generator

queries = ["what color is the sky?", "what color is the sky?"]
correct_answer = "the sky is blue"
incorrect_answer = "the sky is red"
ground_truth_values = [correct_answer, correct_answer]
responses = [correct_answer, incorrect_answer]

df = pd.DataFrame(
    {
        "input": queries,
        "ground_truth": ground_truth_values,
        "output": responses,
    }
)

# Non-streaming
result: EvaluationResult = evaluate(
    dataframe=df,
    metrics=[
        EvaluationMetric.P_FAITHFUL,
        EvaluationMetric.SUMMARIZATION,
    ],
)
print(json.dumps(result, indent=2))

# Response will look something like this:
"""
{
  "p_faithful": [
    0.999255359172821,
    0.00011296303273411468
  ],
  "summarization": [
    0.9995583891868591,
    6.86283819959499e-05
  ]
}
"""

# Response-streaming
result_iterator: Generator[EvaluationResult, Any, Any] = (
    stream_evaluate(
        dataframe=df,
        metrics=[
            EvaluationMetric.P_FAITHFUL,
            EvaluationMetric.SUMMARIZATION,
        ],
    )
)
for result_chunk in result_iterator:
    print(json.dumps(result_chunk, indent=2))

# Bidirectional-streaming
def gen_df_stream(input: list[str], gt: list[str], output: list[str]):
    for i in range(len(input)):
        df_chunk = pd.DataFrame(
            {
                "input": [input[i]],
                "ground_truth": [gt[i]],
                "output": [output[i]],
            }
        )
        yield df_chunk

df_iterator = gen_df_stream(
    input=queries, gt=ground_truth_values, output=responses
)
result_iterator: Generator[EvaluationResult, Any, Any] = (
    stream_evaluate(
        dataframe=df_iterator,
        metrics=[
            EvaluationMetric.P_FAITHFUL,
            EvaluationMetric.SUMMARIZATION,
        ],
    )
)
for result_chunk in result_iterator:
    print(json.dumps(result_chunk, indent=2))

# Async request
job_id = submit_job(
    df,
    metrics=[
        EvaluationMetric.P_FAITHFUL,
        EvaluationMetric.SUMMARIZATION,
    ],
)
print(f"{job_id=}")
```