| Field | Value |
| --- | --- |
| Name | lm-debug-eval-ui |
| Version | 0.0.6 |
| Summary | LLM Application Debug/Eval UI on top of AIConfig |
| home_page | None |
| upload_time | 2024-04-09 18:20:31 |
| maintainer | None |
| docs_url | None |
| author | LastMile AI |
| requires_python | >=3.10 |
| license | None |
| keywords | |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
LLM application debugger / evaluation UI built on top of [AIConfig](https://github.com/lastmile-ai/aiconfig)
## Quickstart
1. `pip3 install lm-debug-eval-ui`
2. Create an `evaluation_module.py` file containing the functions needed to run evaluation (see the Evaluation Module section below)
3. Create an `aiconfig_model_registry.py` parser module that registers the model parser to use for prompt iteration
4. Create an aiconfig file (e.g. `eval_prompt_template.aiconfig.yaml`) containing the prompts to iterate on and to use within the application / eval code (see the example config sketched below)
5. Set the following variables in bash:
```
evaluation_module_path=<path_to_evaluation_module>
parsers_path=<path_to_model_parser_module>
aiconfig_path=<path_to_aiconfig_file>
```
6. Run the eval/debug UI with:
```
aiconfig eval --aiconfig-path=$aiconfig_path --parsers-module-path=$parsers_path --evaluation-module-path=$evaluation_module_path
```
NOTE: By default, the script looks for `./evaluation_module.py` and `./aiconfig_model_registry.py` for the evaluation module and parsers module, so there is no need to pass those arguments if those files already exist.
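For reference, a minimal aiconfig file along the lines of step 4 might look like the sketch below. This is illustrative only: the config name, prompt name, template text, and `ask_llm` model reference are hypothetical placeholders, and the exact schema fields should be checked against the AIConfig documentation.
```
# eval_prompt_template.aiconfig.yaml -- illustrative sketch; names and model are hypothetical
name: eval_prompt_template
schema_version: latest
metadata:
  parameters: {}
  models: {}
prompts:
  - name: extract_fields
    input: |
      Extract the requested fields from the following paper text:
      {{paper_text}}
    metadata:
      model: ask_llm
```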
# UI Overview
The UI is a single page web application with two tabs: "Evaluation" and "Prompt Iteration".
## Evaluation
The evaluation tab is where evaluation sets (think batch runs) are created and analyzed.
### Evaluation Sets
The first table shows the sets that have been created. Create a new evaluation set by clicking the 'Create Evaluation Set' button and stepping through the creation flow:
- Give the set a name to explain what is being evaluated
- Specify the path to the prompt config (aiconfig) to use for the evaluation set. The evaluation code should be hooked up (via the python-aiconfig package) to resolve prompts from the config
**Data Selection**
Select the desired data (i.e. paper paths) from the 'Available Data' list on the left and click the '>' button to move it to the 'Selected Data' section on the right. These will be the source data (papers) used by the evaluation (opgee_cli) runs for the evaluation set. Click 'Next Step' once the desired data is selected.
**Specific Data Configuration**
For each of the data sources selected in the Data Selection step, optionally specify a ground truth file path to use for the evaluation. If no ground truth is specified, the raw evaluation results will be reported without any comparison metrics against the ground truth.
**General Data Configuration**
Specify the general configuration to be used for every evaluation call (for all data sources). Each opgee run for the papers in the evaluation set will use the arguments specified in the general data configuration.
### Evaluation Set Results
Clicking a row in the Evaluation Sets table will load the table of results for that evaluation set. The results show all relevant metrics for each data source in the evaluation set. To start, these metrics are obtained from `eval_matrix.csv` and `eval_matrix_report.txt`.
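If you want to inspect these metrics outside the UI, a quick sketch like the following can be used; it assumes `eval_matrix.csv` is a plain CSV with one row per data source, and the column names used here are purely illustrative.
```
# Illustrative only: the column names ("source", "precision", "recall") are hypothetical.
import csv

with open("eval_matrix.csv", newline="") as f:
    for row in csv.DictReader(f):
        print(row.get("source"), row.get("precision"), row.get("recall"))
```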
### Evaluation Result Details
Clicking a row in the Evaluation Set Results table will load the table of details for the specific result. This will include the raw extracted values for the data source (paper), with cells colored based on TP/FP/TN/FN compared to the ground truth file.
## Prompt Iteration
The prompt iteration tab provides an editor for iterating on the prompt templates used in the application / evaluation. The editor opens the aiconfig file specified in the `aiconfig eval` command.
At the top of the page, select a data source (preprocessed paper) and specify how it will be used when running each prompt. Other configuration settings (such as 'model') can be specified in the global model settings or in the model settings section of each cell (cell-level settings override global ones).
We have created an `ask_llm` model parser and an associated model parser registry file so that running a prompt invokes the `ask_llm` script with the prompt text and the arguments derived from the prompt context data and model settings (a registry sketch follows below).
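As a rough sketch (not the actual registry shipped with this package), such a registry module might look like the following. The `AskLLMModelParser` class and its import path are hypothetical placeholders, and the code assumes the `register_model_parsers()` entry point and the `AIConfigRuntime.register_model_parser` API from python-aiconfig.
```
# aiconfig_model_registry.py -- illustrative sketch only.
from aiconfig import AIConfigRuntime

from ask_llm_parser import AskLLMModelParser  # hypothetical import path


def register_model_parsers() -> None:
    # Register the parser so prompts whose model is "ask_llm" run through it.
    AIConfigRuntime.register_model_parser(AskLLMModelParser(), "ask_llm")
```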
Name a prompt cell and write the prompt in the input, then run it to see the ask_llm output. Iterate on the prompt text until it produces the desired output. The prompt can then be referenced in the application/evaluation code using AIConfig:
```
from aiconfig import AIConfigRuntime

# Initialize AIConfigRuntime from the aiconfig file somewhere in evaluation/application code:
aiconfig = AIConfigRuntime.load(<path_to_aiconfig>)

# ... then, in the prompt template function in the evaluation/application code:
await aiconfig.resolve(<prompt_name>, parameters)
```
# Evaluation Module
The application runs on a Flask server that is integrated with the evaluation module (default `evaluation_module.py`) for interfacing with the evaluation logic (the `opgee` script). Each function in the evaluation module is associated with some aspect of the UI and determines how the corresponding data is retrieved (or created) from the underlying system. The `evaluation_module.py` we have created retrieves data from the existing `result` folder structure and creates evaluation sets via async runs of the `opgee_cli` script.
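To make the shape of such a module concrete, here is a minimal sketch. The function names, signatures, and return shapes below are hypothetical illustrations rather than the interface the UI actually expects, and the data-loading logic is a stand-in for the real `result` folder / `opgee_cli` integration.
```
# evaluation_module.py -- illustrative sketch only; the real interface expected by
# the UI is defined by the package, and these function names are hypothetical.
import asyncio
from pathlib import Path

RESULTS_DIR = Path("result")  # assumed location of existing evaluation results


def list_evaluation_sets() -> list[dict]:
    """Return one summary dict per previously created evaluation set."""
    return [
        {"name": p.name, "path": str(p)}
        for p in RESULTS_DIR.iterdir()
        if p.is_dir()
    ]


async def create_evaluation_set(name: str, data_sources: list[str]) -> dict:
    """Kick off one async opgee_cli run per selected data source."""
    # Stand-in invocation; a real module would build the CLI arguments from the
    # general and per-source configuration chosen in the UI.
    procs = [
        await asyncio.create_subprocess_exec("opgee_cli", src)
        for src in data_sources
    ]
    await asyncio.gather(*(proc.wait() for proc in procs))
    return {"name": name, "sources": data_sources}
```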