BYOD


Name: BYOD
Version: 0.3.0
home_page: https://github.com/neelsjain/BYOD
Summary: Bring Your Own Data! Self-Supervised Evaluation for Large Language Models
upload_time: 2023-06-23 01:57:19
author: Neel Jain, Khalid Saifullah, Jonas Geiping
requires_python: >=3.9
license: MIT
keywords: todo
requirements: No requirements were recorded.
Travis-CI: No Travis.
coveralls test coverage: No coveralls.

# Bring Your Own Data! Self-Supervised Evaluation for Large Language Models

This is the official code for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models".
If you have any questions, feel free to email <njain17@umd.edu>.


<img src="images/Teaser.png">

## About
To complement conventional evaluation, we propose a framework for _self-supervised model evaluation_. In this framework, metrics are defined as invariances and sensitivities that can be checked in a self-supervised fashion using interventions based only on the model in question rather than external labels. Self-supervised evaluation pipelines are _dataset-agnostic_, so they can be applied to larger corpora of evaluation data than conventional metrics, or even directly in production systems to monitor day-to-day performance. In this work, we develop this framework, discuss desiderata for such metrics, and provide a number of case studies for self-supervised metrics: knowledge capability, toxicity detection, and sensitivity to long-range context, word order, and tokenization. By developing these new metrics, we hope to provide a more comprehensive and nuanced understanding of the strengths and limitations of LLMs.
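
To make the idea of a label-free sensitivity check concrete, here is a minimal toy sketch. It is not the implementation used in this package: the scoring functions are hypothetical stand-ins for a model output, and `shuffle_words` and `sensitivity` are illustrative names chosen for this example. The only point is that the metric compares a model's score on a text with its score on a perturbed copy of the same text, so no external labels are needed.

```python
import random
import statistics
from functools import partial
from typing import Callable, Sequence


def shuffle_words(text: str, rng: random.Random) -> str:
    """Intervention: randomly permute the words of `text`."""
    words = text.split()
    rng.shuffle(words)
    return " ".join(words)


def sensitivity(
    score: Callable[[str], float],
    corpus: Sequence[str],
    intervene: Callable[[str], str],
) -> float:
    """Mean absolute change in the score under the intervention.

    Each text is compared only against its own perturbed copy.
    """
    return statistics.mean(abs(score(t) - score(intervene(t))) for t in corpus)


def toy_order_aware_score(text: str) -> float:
    """Stand-in for a model output that depends on word positions."""
    return float(sum(i * len(w) for i, w in enumerate(text.split())))


def toy_bag_of_words_score(text: str) -> float:
    """Stand-in for a model output that ignores word order entirely."""
    return float(sum(len(w) for w in text.split()))


corpus = [
    "the quick brown fox jumps over the lazy dog",
    "self supervised evaluation needs no external labels",
]
perturb = partial(shuffle_words, rng=random.Random(0))

order_aware = sensitivity(toy_order_aware_score, corpus, perturb)
order_blind = sensitivity(toy_bag_of_words_score, corpus, perturb)
print(f"order-aware: {order_aware:.2f}, order-blind: {order_blind:.2f}")
```

The order-blind stand-in always yields a sensitivity of zero, because shuffling cannot change a sum of word lengths. With a real language model, the score would typically be something like a log-likelihood, and the corpus could be any text you bring.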

## Installation

You can install the package directly with `pip install byod`, or from source with `pip install git+https://github.com/neelsjain/BYOD/`.

## Dependencies

* transformers==4.28.1
* scipy==1.10.1
* torch==2.0.0
* datasets==2.11.0
* nltk==3.8.1
* apache_beam==2.48.0

Python 3.9 or higher is required (the package metadata declares `requires_python>=3.9`).
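
If you want to check an existing environment against these pins before running anything, a small standard-library script along the following lines may help. This is a sketch, not part of the package: `PINNED` and `check_pins` are names made up for this example, and the comparison is an exact string match, so a local build tag (for example a CUDA suffix on `torch`) would be reported as a mismatch.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the dependency list above.
PINNED = {
    "transformers": "4.28.1",
    "scipy": "1.10.1",
    "torch": "2.0.0",
    "datasets": "2.11.0",
    "nltk": "3.8.1",
    "apache_beam": "2.48.0",
}


def check_pins(pins: dict[str, str]) -> dict[str, str]:
    """Return one status string per package: 'ok', 'missing', or a mismatch note."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if installed == wanted:
            report[name] = "ok"
        else:
            report[name] = f"installed {installed}, pinned {wanted}"
    return report


for name, status in check_pins(PINNED).items():
    print(f"{name}: {status}")
```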

## Usage

See `run_model.sh` for examples of how to evaluate a model. As an example, we provide scripts that run any Hugging Face model against metrics computed on Wikipedia data. These are named `run_[metric].py`.

Note that only Hugging Face models are currently supported.


You can also use the metrics directly, given your own `model`, `tokenizer`, and dataset (`data` below), like so:
```
import BYOD

# `model` and `tokenizer` are a Hugging Face model and its tokenizer;
# `data` is your own dataset.
long_range_sensitivity = BYOD.lrs_metric(model, data, tokenizer)
negation_knowledge = BYOD.negation_metric(model, data, tokenizer)
tokenization_robustness = BYOD.tokenization_metric(model, data, tokenizer)
toxicity_proxy = BYOD.toxicity_metric(model, data, tokenizer)
word_order_sensitivity = BYOD.word_order_metric(model, data, tokenizer)
```
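
If you want all five results in one place, a thin wrapper such as the following could be used. This is only a sketch: `run_all_metrics` and `METRIC_FUNCTIONS` are hypothetical names introduced here, the function names and the `(model, data, tokenizer)` argument order are taken from the snippet above, and nothing is assumed about what each metric returns. The demonstration uses a stand-in object so that it runs without downloading a model; in real use you would pass the imported `BYOD` package instead.

```python
from types import SimpleNamespace
from typing import Any

# Result label -> metric function name, as listed in the snippet above.
METRIC_FUNCTIONS = {
    "long_range_sensitivity": "lrs_metric",
    "negation_knowledge": "negation_metric",
    "tokenization_robustness": "tokenization_metric",
    "toxicity_proxy": "toxicity_metric",
    "word_order_sensitivity": "word_order_metric",
}


def run_all_metrics(byod_module: Any, model: Any, data: Any, tokenizer: Any) -> dict[str, Any]:
    """Call each metric as fn(model, data, tokenizer) and collect the results by label."""
    results = {}
    for label, fn_name in METRIC_FUNCTIONS.items():
        metric_fn = getattr(byod_module, fn_name)
        results[label] = metric_fn(model, data, tokenizer)
    return results


def _make_stub(name: str):
    """Build a placeholder metric that only reports that it was called."""
    def stub(model: Any, data: Any, tokenizer: Any) -> str:
        return f"{name} called"
    return stub


# Stand-in for the real package, used only for this demonstration.
fake_byod = SimpleNamespace(**{name: _make_stub(name) for name in METRIC_FUNCTIONS.values()})

results = run_all_metrics(fake_byod, model=None, data=[], tokenizer=None)
for label, value in results.items():
    print(f"{label}: {value}")
```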


## Suggestions and Pull Requests are welcome!
Everything can be better! If you have suggestions for improving the codebase or the invariance/sensitivity tests, feel free to reach out!

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/neelsjain/BYOD",
    "name": "BYOD",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": "",
    "keywords": "todo",
    "author": "Neel Jain, Khalid Saifullah, Jonas Geiping",
    "author_email": "njain17@umd.edu",
    "download_url": "https://files.pythonhosted.org/packages/14/8e/173e5c1e7ffe71950be6a626f4a2e489dcb1ae5859c66b6a62a75c5e6b9b/BYOD-0.3.0.tar.gz",
    "platform": "any",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models",
    "version": "0.3.0",
    "project_urls": {
        "Homepage": "https://github.com/neelsjain/BYOD"
    },
    "split_keywords": [
        "todo"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "fad9b67797f45b074abdd712c856876369429403f0dbff07d972c6aa133a01ad",
                "md5": "1b8f755fbd0dad333a631effbae9bb99",
                "sha256": "a6b088133439c584633971594bb83211f0a90743546009f6c61bc986d6f51a6e"
            },
            "downloads": -1,
            "filename": "BYOD-0.3.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1b8f755fbd0dad333a631effbae9bb99",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 22101,
            "upload_time": "2023-06-23T01:57:18",
            "upload_time_iso_8601": "2023-06-23T01:57:18.159466Z",
            "url": "https://files.pythonhosted.org/packages/fa/d9/b67797f45b074abdd712c856876369429403f0dbff07d972c6aa133a01ad/BYOD-0.3.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "148e173e5c1e7ffe71950be6a626f4a2e489dcb1ae5859c66b6a62a75c5e6b9b",
                "md5": "ee30b3b347da95febe49cd4bc075459a",
                "sha256": "d1eb6f2970e3cc5042e1a0edfdb596478112f843a62bc320ae297bac12a2a076"
            },
            "downloads": -1,
            "filename": "BYOD-0.3.0.tar.gz",
            "has_sig": false,
            "md5_digest": "ee30b3b347da95febe49cd4bc075459a",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 264068,
            "upload_time": "2023-06-23T01:57:19",
            "upload_time_iso_8601": "2023-06-23T01:57:19.996711Z",
            "url": "https://files.pythonhosted.org/packages/14/8e/173e5c1e7ffe71950be6a626f4a2e489dcb1ae5859c66b6a62a75c5e6b9b/BYOD-0.3.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-06-23 01:57:19",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "neelsjain",
    "github_project": "BYOD",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "requirements": [],
    "lcname": "byod"
}
        