# ModelGauge
Goal: Make it easy to automatically and uniformly measure the behavior of many AI Systems.
> [!WARNING]
> This repo is still in **beta** with a planned full release in Fall 2024. Until then we reserve the right to make backward incompatible changes as needed.
ModelGauge is an evolution of [crfm-helm](https://github.com/stanford-crfm/helm/), intended to meet its existing use cases as well as those of the [MLCommons AI Safety](https://mlcommons.org/working-groups/ai-safety/ai-safety/) project.
## Summary
ModelGauge is a library that provides a set of interfaces for Tests and Systems Under Test (SUTs) such that:
* Each Test can be applied to all SUTs with the required underlying capabilities (e.g. does it take text input?)
* Adding new Tests or SUTs can be done without modifications to the core libraries or support from ModelGauge authors.
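The capability-matching idea above can be sketched in a few lines. This is an illustrative sketch only; the class and function names (`Capability`, `applicable`, etc.) are hypothetical and not ModelGauge's actual interfaces, which are covered in the tutorials below.

```python
# Hypothetical sketch: a Test declares the capabilities it requires, a SUT
# declares the capabilities it has, and a Test applies to exactly those SUTs
# that satisfy every requirement. Names are illustrative, not the real API.
from abc import ABC


class Capability:
    """Marker for something a SUT can do (e.g. accept text input)."""


class AcceptsTextPrompt(Capability):
    pass


class SUT(ABC):
    # Capabilities this system advertises.
    capabilities: tuple[type[Capability], ...] = ()


class Test(ABC):
    # Capabilities a SUT must have for this Test to apply.
    requires: tuple[type[Capability], ...] = ()


def applicable(test: Test, sut: SUT) -> bool:
    """A Test can be applied to every SUT with all required capabilities."""
    return all(req in sut.capabilities for req in test.requires)


class TextChatSUT(SUT):
    capabilities = (AcceptsTextPrompt,)


class NoTextSUT(SUT):
    capabilities = ()


class SafetyTest(Test):
    requires = (AcceptsTextPrompt,)
```

Because the match is computed from declared capabilities rather than hard-coded pairings, new Tests and SUTs can be added independently of each other and of the core library.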
Currently ModelGauge is targeted at LLMs and [single-turn prompt-response Tests](docs/prompt_response_tests.md), with Tests scored by automated Annotators (e.g. LlamaGuard). However, we expect to extend the library to cover more Test, SUT, and Annotation types as we move toward full release.
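The single-turn flow can be pictured as: the Test supplies prompts, the SUT answers each one, and an automated Annotator scores the responses. The sketch below uses a trivial keyword check as a stand-in for a model-based annotator like LlamaGuard; all names here are hypothetical, not ModelGauge's real interfaces.

```python
# Illustrative single-turn prompt-response flow. `keyword_annotator` is a toy
# stand-in for an automated safety annotator such as LlamaGuard; the real
# library's types and signatures differ.
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Prompt:
    text: str


@dataclass
class Annotation:
    is_safe: bool


def keyword_annotator(response: str) -> Annotation:
    """Toy annotator: flag responses containing a forbidden marker."""
    return Annotation(is_safe="FORBIDDEN" not in response)


def run_test(
    prompts: Iterable[Prompt],
    sut_respond: Callable[[Prompt], str],
    annotate: Callable[[str], Annotation],
) -> float:
    """Return the fraction of responses the annotator judged safe."""
    annotations = [annotate(sut_respond(p)) for p in prompts]
    return sum(a.is_safe for a in annotations) / len(annotations)
```

The key point is that the scoring step is pluggable: swapping in a different annotator or SUT does not change the test-running loop.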
## Docs
* [Developer Quick Start](docs/dev_quick_start.md)
* [Tutorial for how to create a Test](docs/tutorial_tests.md)
* [Tutorial for how to create a System Under Test (SUT)](docs/tutorial_suts.md)
* How we use [plugins](docs/plugins.md) to connect it all together.
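A minimal sketch of the registry idea behind a plugin system: components register themselves at import time, so the core library never has to be modified to learn about a new SUT. ModelGauge's actual mechanism is described in docs/plugins.md; the `InstanceRegistry`/`SUTS` names below are illustrative assumptions, not the library's API.

```python
# Hypothetical registry sketch: importing a plugin module is enough to make
# its SUT discoverable by unique id. Names are illustrative only.
class InstanceRegistry:
    def __init__(self) -> None:
        self._factories: dict[str, type] = {}

    def register(self, cls: type, uid: str) -> type:
        """Record a factory under a unique id; called from plugin modules."""
        self._factories[uid] = cls
        return cls

    def make_instance(self, uid: str):
        """Construct the component registered under `uid`."""
        return self._factories[uid]()


SUTS = InstanceRegistry()


# In a plugin package, module import side effects do the registration:
class DemoSUT:
    def respond(self, prompt: str) -> str:
        return f"echo: {prompt}"


SUTS.register(DemoSUT, "demo")
```

A runner can then look up `SUTS.make_instance("demo")` without importing the plugin's classes directly, which is what keeps new Tests and SUTs decoupled from the core.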