# ModelGauge
Goal: Make it easy to automatically and uniformly measure the behavior of many AI Systems.
> [!WARNING]
> This repo is still in **beta** with a planned full release in Fall 2024. Until then we reserve the right to make backward incompatible changes as needed.
ModelGauge is an evolution of [crfm-helm](https://github.com/stanford-crfm/helm/), intended to meet its existing use cases as well as those needed by the [MLCommons AI Safety](https://mlcommons.org/working-groups/ai-safety/ai-safety/) project.
## Summary
ModelGauge is a library that provides a set of interfaces for Tests and Systems Under Test (SUTs) such that:
* Each Test can be applied to all SUTs with the required underlying capabilities (e.g. does it take text input?)
* Adding new Tests or SUTs can be done without modifications to the core libraries or support from ModelGauge authors.
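To make those two points concrete, here is a minimal sketch of the idea in plain Python. The names (`TextSUT`, `EchoSUT`, `SimpleTest`) are hypothetical illustrations, not ModelGauge's actual interfaces; see the tutorials linked below for the real ones.

```python
# Illustrative sketch only -- these class names are hypothetical, not the real ModelGauge API.

class TextSUT:
    """Capability interface: any SUT that accepts a text prompt."""

    def respond(self, prompt: str) -> str:
        raise NotImplementedError


class EchoSUT(TextSUT):
    """A new SUT is added by implementing the capability interface;
    no changes to the core library are required."""

    def respond(self, prompt: str) -> str:
        return prompt


class SimpleTest:
    """A Test written against the capability interface runs on *any* TextSUT."""

    prompts = ["Hello, world."]

    def run(self, sut: TextSUT) -> list[str]:
        return [sut.respond(p) for p in self.prompts]


print(SimpleTest().run(EchoSUT()))  # works for every SUT that implements TextSUT
```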
Currently, ModelGauge targets LLMs and [single-turn prompt-response Tests](docs/prompt_response_tests.md), with Tests scored by automated Annotators (e.g. LlamaGuard). However, we expect to extend the library to cover more Test, SUT, and Annotation types as we move toward full release.
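As a rough sketch of that single-turn, Annotator-scored flow (the function and parameter names here are made up for illustration; the real Test and Annotator interfaces are covered in the tutorials):

```python
# Hypothetical sketch of a single-turn prompt-response Test scored by an Annotator.
from typing import Callable


def run_prompt_response_test(
    prompts: list[str],
    sut_respond: Callable[[str], str],     # SUT: one prompt in, one completion out
    annotate: Callable[[str, str], bool],  # Annotator: is this (prompt, response) pair safe?
) -> float:
    """Return the fraction of responses the annotator judges safe."""
    judgments = []
    for prompt in prompts:
        response = sut_respond(prompt)  # single turn: no conversation history
        judgments.append(annotate(prompt, response))
    return sum(judgments) / len(judgments)


# Stand-in callables; a real Annotator might wrap a model such as LlamaGuard.
score = run_prompt_response_test(
    prompts=["How do I stay safe online?"],
    sut_respond=lambda p: "Use strong, unique passwords.",
    annotate=lambda p, r: "password" in r,
)
print(f"fraction judged safe: {score:.2f}")
```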
## Docs
* [Developer Quick Start](docs/dev_quick_start.md)
* [Tutorial for how to create a Test](docs/tutorial_tests.md)
* [Tutorial for how to create a System Under Test (SUT)](docs/tutorial_suts.md)
* How we use [plugins](docs/plugins.md) to connect it all together.