# ModelGauge
Goal: Make it easy to automatically and uniformly measure the behavior of many AI Systems.
> [!WARNING]
> This repo is still in **beta** with a planned full release in Fall 2024. Until then we reserve the right to make backward incompatible changes as needed.
ModelGauge is an evolution of [crfm-helm](https://github.com/stanford-crfm/helm/), intended to meet its existing use cases as well as those of the [MLCommons AI Safety](https://mlcommons.org/working-groups/ai-safety/ai-safety/) project.
## Summary
ModelGauge is a library that provides a set of interfaces for Tests and Systems Under Test (SUTs) such that:
* Each Test can be applied to all SUTs with the required underlying capabilities (e.g. does it take text input?)
* Adding new Tests or SUTs can be done without modifications to the core libraries or support from ModelGauge authors.
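The capability-matching idea above can be sketched in a few lines. This is an illustrative sketch only; the class and function names (`Capability`, `applicable`, etc.) are hypothetical and not ModelGauge's actual interfaces, which are covered in the tutorials below.

```python
# Hypothetical sketch: a Test declares the capabilities it requires, a SUT
# declares the capabilities it has, and a Test applies to exactly those SUTs
# that satisfy every requirement. Names are illustrative, not the real API.
from abc import ABC


class Capability:
    """Marker for something a SUT can do (e.g. accept text input)."""


class AcceptsTextPrompt(Capability):
    pass


class SUT(ABC):
    # Capabilities this system advertises.
    capabilities: tuple[type[Capability], ...] = ()


class Test(ABC):
    # Capabilities a SUT must have for this Test to apply.
    requires: tuple[type[Capability], ...] = ()


def applicable(test: Test, sut: SUT) -> bool:
    """A Test can be applied to every SUT with all required capabilities."""
    return all(req in sut.capabilities for req in test.requires)


class TextChatSUT(SUT):
    capabilities = (AcceptsTextPrompt,)


class NoTextSUT(SUT):
    capabilities = ()


class SafetyTest(Test):
    requires = (AcceptsTextPrompt,)
```

Because the match is computed from declared capabilities rather than hard-coded pairings, new Tests and SUTs can be added independently of each other and of the core library.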
Currently ModelGauge is targeted at LLMs and [single-turn prompt-response Tests](docs/prompt_response_tests.md), with Tests scored by automated Annotators (e.g. LlamaGuard). However, we expect to extend the library to cover more Test, SUT, and Annotation types as we move toward full release.
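The single-turn flow can be pictured as: the Test supplies prompts, the SUT answers each one, and an automated Annotator scores the responses. The sketch below uses a trivial keyword check as a stand-in for a model-based annotator like LlamaGuard; all names here are hypothetical, not ModelGauge's real interfaces.

```python
# Illustrative single-turn prompt-response flow. `keyword_annotator` is a toy
# stand-in for an automated safety annotator such as LlamaGuard; the real
# library's types and signatures differ.
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Prompt:
    text: str


@dataclass
class Annotation:
    is_safe: bool


def keyword_annotator(response: str) -> Annotation:
    """Toy annotator: flag responses containing a forbidden marker."""
    return Annotation(is_safe="FORBIDDEN" not in response)


def run_test(
    prompts: Iterable[Prompt],
    sut_respond: Callable[[Prompt], str],
    annotate: Callable[[str], Annotation],
) -> float:
    """Return the fraction of responses the annotator judged safe."""
    annotations = [annotate(sut_respond(p)) for p in prompts]
    return sum(a.is_safe for a in annotations) / len(annotations)
```

The key point is that the scoring step is pluggable: swapping in a different annotator or SUT does not change the test-running loop.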
## Docs
* [Developer Quick Start](docs/dev_quick_start.md)
* [Tutorial for how to create a Test](docs/tutorial_tests.md)
* [Tutorial for how to create a System Under Test (SUT)](docs/tutorial_suts.md)
* How we use [plugins](docs/plugins.md) to connect it all together.
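A minimal sketch of the registry idea behind a plugin system: components register themselves at import time, so the core library never has to be modified to learn about a new SUT. ModelGauge's actual mechanism is described in docs/plugins.md; the `InstanceRegistry`/`SUTS` names below are illustrative assumptions, not the library's API.

```python
# Hypothetical registry sketch: importing a plugin module is enough to make
# its SUT discoverable by unique id. Names are illustrative only.
class InstanceRegistry:
    def __init__(self) -> None:
        self._factories: dict[str, type] = {}

    def register(self, cls: type, uid: str) -> type:
        """Record a factory under a unique id; called from plugin modules."""
        self._factories[uid] = cls
        return cls

    def make_instance(self, uid: str):
        """Construct the component registered under `uid`."""
        return self._factories[uid]()


SUTS = InstanceRegistry()


# In a plugin package, module import side effects do the registration:
class DemoSUT:
    def respond(self, prompt: str) -> str:
        return f"echo: {prompt}"


SUTS.register(DemoSUT, "demo")
```

A runner can then look up `SUTS.make_instance("demo")` without importing the plugin's classes directly, which is what keeps new Tests and SUTs decoupled from the core.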