# ModelGauge
Goal: Make it easy to automatically and uniformly measure the behavior of many AI Systems.
> [!WARNING]
> This repo is still in **beta** with a planned full release in Fall 2024. Until then we reserve the right to make backward incompatible changes as needed.
ModelGauge is an evolution of [crfm-helm](https://github.com/stanford-crfm/helm/), intended to meet its existing use cases as well as those needed by the [MLCommons AI Safety](https://mlcommons.org/working-groups/ai-safety/ai-safety/) project.
## Summary
ModelGauge is a library that provides a set of interfaces for Tests and Systems Under Test (SUTs) such that:
* Each Test can be applied to all SUTs with the required underlying capabilities (e.g. does it take text input?)
* Adding new Tests or SUTs can be done without modifications to the core libraries or support from ModelGauge authors.
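To make those two points concrete, here is a minimal sketch of the idea in plain Python. The names (`TextSUT`, `EchoSUT`, `SimpleTest`) are hypothetical illustrations, not ModelGauge's actual interfaces; see the tutorials linked below for the real ones.

```python
# Illustrative sketch only -- these class names are hypothetical, not the real ModelGauge API.

class TextSUT:
    """Capability interface: any SUT that accepts a text prompt."""

    def respond(self, prompt: str) -> str:
        raise NotImplementedError


class EchoSUT(TextSUT):
    """A new SUT is added by implementing the capability interface;
    no changes to the core library are required."""

    def respond(self, prompt: str) -> str:
        return prompt


class SimpleTest:
    """A Test written against the capability interface runs on *any* TextSUT."""

    prompts = ["Hello, world."]

    def run(self, sut: TextSUT) -> list[str]:
        return [sut.respond(p) for p in self.prompts]


print(SimpleTest().run(EchoSUT()))  # works for every SUT that implements TextSUT
```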
Currently, ModelGauge targets LLMs and [single-turn prompt-response Tests](docs/prompt_response_tests.md), with Tests scored by automated Annotators (e.g. LlamaGuard). However, we expect to extend the library to cover more Test, SUT, and Annotation types as we move toward full release.
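As a rough sketch of that single-turn, Annotator-scored flow (the function and parameter names here are made up for illustration; the real Test and Annotator interfaces are covered in the tutorials):

```python
# Hypothetical sketch of a single-turn prompt-response Test scored by an Annotator.
from typing import Callable


def run_prompt_response_test(
    prompts: list[str],
    sut_respond: Callable[[str], str],     # SUT: one prompt in, one completion out
    annotate: Callable[[str, str], bool],  # Annotator: is this (prompt, response) pair safe?
) -> float:
    """Return the fraction of responses the annotator judges safe."""
    judgments = []
    for prompt in prompts:
        response = sut_respond(prompt)  # single turn: no conversation history
        judgments.append(annotate(prompt, response))
    return sum(judgments) / len(judgments)


# Stand-in callables; a real Annotator might wrap a model such as LlamaGuard.
score = run_prompt_response_test(
    prompts=["How do I stay safe online?"],
    sut_respond=lambda p: "Use strong, unique passwords.",
    annotate=lambda p, r: "password" in r,
)
print(f"fraction judged safe: {score:.2f}")
```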
## Docs
* [Developer Quick Start](docs/dev_quick_start.md)
* [Tutorial for how to create a Test](docs/tutorial_tests.md)
* [Tutorial for how to create a System Under Test (SUT)](docs/tutorial_suts.md)
* How we use [plugins](docs/plugins.md) to connect it all together.