lastmile-auto-eval


Name: lastmile-auto-eval
Version: 0.0.10
Summary: An API for using metric models (either provided by default or fine-tuned yourself) to evaluate LLMs.
Author: LastMile AI
Requires Python: >=3.10
Upload time: 2024-09-04 22:57:49
Requirements: none recorded
A library for using evaluation models (either LastMile's default models or your own fine-tuned ones) to evaluate LLMs.

Evaluations are run on dataframes containing any combination of `input`, `ground_truth`, and `output` columns. At least one of these columns must be defined, and all values must be strings.
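For example, a dataframe that only records inputs and outputs (no ground truth) is still valid, as in this minimal sketch:

```python
import pandas as pd

# Only `input` and `output` are provided here; `ground_truth` is omitted.
# Any combination of the three columns works, as long as at least one is
# present and every value is a string.
df = pd.DataFrame(
    {
        "input": ["what color is the sky?"],
        "output": ["the sky is blue"],
    }
)
```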

## Synchronous Requests

You can use `evaluate()` for non-streaming results and `stream_evaluate()` for streaming results; both are shown in the Example Usage section below.

## Asynchronous Requests

You can use `submit_job()`, which returns a `job_id` string. This is ideal for evaluations that don't require an immediate response. With a `job_id` (e.g. `cm0c4bxwo002kpe01h5j4zj2y`) you can do the following (a request sketch follows the list):

1. Read job info: `<host_url>/api/auto_eval_job/read?id=<job_id>`

   Ex: https://lastmileai.dev/api/auto_eval_job/read?id=cm0c4bxwo002kpe01h5j4zj2y

2. Retrieve results: `<host_url>/api/auto_eval_job/get_results?id=<job_id>`

   Ex: https://lastmileai.dev/api/auto_eval_job/get_results?id=cm0c4bxwo002kpe01h5j4zj2y
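A minimal sketch of calling these endpoints with `requests`. The authentication header is an assumption, since the auth scheme is not documented in this README; adjust it to whatever your LastMile account requires.

```python
import requests

HOST_URL = "https://lastmileai.dev"
job_id = "cm0c4bxwo002kpe01h5j4zj2y"  # value returned by submit_job()

# Assumption: requests are authenticated with a bearer token; the exact
# scheme is not described here.
headers = {"Authorization": "Bearer <YOUR_API_TOKEN>"}

# 1. Read job info (e.g. status)
info = requests.get(
    f"{HOST_URL}/api/auto_eval_job/read",
    params={"id": job_id},
    headers=headers,
)
print(info.json())

# 2. Retrieve results once the job has completed
results = requests.get(
    f"{HOST_URL}/api/auto_eval_job/get_results",
    params={"id": job_id},
    headers=headers,
)
print(results.json())
```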

## Example Usage

```python
from lastmile_auto_eval import (
    EvaluationMetric,
    EvaluationResult,
    evaluate,
    stream_evaluate,
    submit_job,
)
import pandas as pd
import json
from typing import Any, Generator

queries = ["what color is the sky?", "what color is the sky?"]
correct_answer = "the sky is blue"
incorrect_answer = "the sky is red"
ground_truth_values = [correct_answer, correct_answer]
responses = [correct_answer, incorrect_answer]

df = pd.DataFrame(
    {
        "input": queries,
        "ground_truth": ground_truth_values,
        "output": responses,
    }
)

# Non-streaming
result: EvaluationResult = evaluate(
    dataframe=df,
    metrics=[
        EvaluationMetric.P_FAITHFUL,
        EvaluationMetric.SUMMARIZATION,
    ],
)
print(json.dumps(result, indent=2))

# Response will look something like this:
"""
{
  "p_faithful": [
    0.999255359172821,
    0.00011296303273411468
  ],
  "summarization": [
    0.9995583891868591,
    6.86283819959499e-05
  ]
}
"""

# Response-streaming
result_iterator: Generator[EvaluationResult, Any, Any] = (
    stream_evaluate(
        dataframe=df,
        metrics=[
            EvaluationMetric.P_FAITHFUL,
            EvaluationMetric.SUMMARIZATION,
        ],
    )
)
for result_chunk in result_iterator:
    print(json.dumps(result_chunk, indent=2))

# Bidirectional-streaming
def gen_df_stream(input: list[str], gt: list[str], output: list[str]):
    for i in range(len(input)):
        df_chunk = pd.DataFrame(
            {
                "input": [input[i]],
                "ground_truth": [gt[i]],
                "output": [output[i]],
            }
        )
        yield df_chunk

df_iterator = gen_df_stream(
    input=queries, gt=ground_truth_values, output=responses
)
result_iterator: Generator[EvaluationResult, Any, Any] = (
    stream_evaluate(
        dataframe=df_iterator,
        metrics=[
            EvaluationMetric.P_FAITHFUL,
            EvaluationMetric.SUMMARIZATION,
        ],
    )
)
for result_chunk in result_iterator:
    print(json.dumps(result_chunk, indent=2))

# Async request
job_id = submit_job(
    df,
    metrics=[
        EvaluationMetric.P_FAITHFUL,
        EvaluationMetric.SUMMARIZATION,
    ],
)
print(f"{job_id=}")
```
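The sample output above shows each metric as a list of per-row scores, so a convenient follow-up (a sketch, not part of the library's API) is to attach those scores to the original dataframe, assuming the result behaves like a plain dict as the `json.dumps` call suggests:

```python
# Continues the example above: attach each metric's per-row scores to `df`.
# Assumes `result` is dict-like (metric name -> list of scores), with one
# score per dataframe row, as the sample output suggests.
scored_df = df.copy()
for metric_name, scores in result.items():
    scored_df[metric_name] = scores
print(scored_df)
```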

            
