ai2-catwalk


Nameai2-catwalk JSON
Version 0.2.2 PyPI version JSON
download
home_pagehttps://github.com/allenai/catwalk
SummaryA library for evaluating language models.
upload_time2023-01-27 22:08:13
maintainer
docs_urlNone
authorAllen Institute for Artificial Intelligence
requires_python>=3.8.0
licenseApache
keywords
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            # Catwalk

Catwalk shows off models.

Catwalk contains a lot of models, and a lot of tasks. The goal is to be able to run all models on all tasks. In
practice, some combinations are not possible, but many are.

<details>
<summary>Here is the current list of tasks we have implemented.
This list is not showing the `metaicl` and `p3` categories of tasks, because those are
largely variants of the other tasks.
</summary>

```
wikitext
piqa
squad
squadshifts-reddit
squadshifts-amazon
squadshifts-nyt
squadshifts-new-wiki
mrqa::race
mrqa::newsqa
mrqa::triviaqa
mrqa::searchqa
mrqa::hotpotqa
mrqa::naturalquestions
mrqa::bioasq
mrqa::drop
mrqa::relationextraction
mrqa::textbookqa
mrqa::duorc.paraphraserc
squad2
rte
superglue::rte
cola
mnli
mnli_mismatched
mrpc
qnli
qqp
sst
wnli
boolq
cb
copa
multirc
wic
wsc
drop
lambada
lambada_cloze
lambada_mt_en
lambada_mt_fr
lambada_mt_de
lambada_mt_it
lambada_mt_es
prost
mc_taco
pubmedqa
sciq
qa4mre_2011
qa4mre_2012
qa4mre_2013
triviaqa
arc_easy
arc_challenge
logiqa
hellaswag
openbookqa
race
headqa_es
headqa_en
mathqa
webqs
wsc273
winogrande
anli_r1
anli_r2
anli_r3
ethics_cm
ethics_deontology
ethics_justice
ethics_utilitarianism_original
ethics_utilitarianism
ethics_virtue
truthfulqa_gen
mutual
mutual_plus
math_algebra
math_counting_and_prob
math_geometry
math_intermediate_algebra
math_num_theory
math_prealgebra
math_precalc
math_asdiv
arithmetic_2da
arithmetic_2ds
arithmetic_3da
arithmetic_3ds
arithmetic_4da
arithmetic_4ds
arithmetic_5da
arithmetic_5ds
arithmetic_2dm
arithmetic_1dc
anagrams1
anagrams2
cycle_letters
random_insertion
reversed_words
raft::ade_corpus_v2
raft::banking_77
raft::neurips_impact_statement_risks
raft::one_stop_english
raft::overruling
raft::semiconductor_org_types
raft::systematic_review_inclusion
raft::tai_safety_research
raft::terms_of_service
raft::tweet_eval_hate
raft::twitter_complaints
```
</details>

## Installation

<!-- start install -->

**Catwalk** requires Python 3.9 or later.

Unfortunately Catwalk cannot be installed from pypi, because it depends on other packages that are not uploaded to
pypi.

Install from source:
```shell
git clone https://github.com/allenai/catwalk.git
cd catwalk
pip install -e .
```

<!-- end install -->

## Getting started

Let's run GPT2 on PIQA:
```shell
python -m catwalk --model rc::gpt2 --task piqa
```

This will load up GPT2 and use it to perform the PIQA task with the "ranked classification" approach.

You can specify multiple tasks at once:
```shell
python -m catwalk --model rc::gpt2 --task piqa arc_easy
```

It'll print you a nice table with all tasks and the metrics for each task:
```text
arc_challenge   acc     0.22440272569656372
arc_easy        acc     0.3998316526412964
piqa    acc     0.6256800889968872
```

## Training / Finetuning

Catwalk can train models. It can train models on a single task, or on multiple tasks at once.
To train, use this command line:
```shell
python -m catwalk.train --model rc::gpt2 --task piqa
```

You can train on multiple tasks at the same time, if you want to create a multi-task model:
```shell
python -m catwalk.train --model rc::gpt2 --task piqa arc_easy
```

Note that not all models support training. If you want to train one and can't, create an issue and tag @dirkgr in
it. 

## Tango integration

Catwalk uses [Tango](https://github.com/allenai/tango) for caching and executing evaluations. The command line
interface internally constructs a Tango step graph and executes it. You can point the command line to a Tango
workspace to cache results:

```shell
python -m catwalk --model rc::gpt2 --task piqa arc_easy -w ./my-workspace/
```

The second time you run one of those tasks, it will be fast:
```shell
time python -m catwalk --model rc::gpt2 --task piqa -w ./my-workspace/
```

```text
arc_easy	acc	0.39941078424453735
piqa	acc	0.626224160194397

________________________________________________________
Executed in    9.82 secs    fish           external
   usr time    6.51 secs  208.00 micros    6.51 secs
   sys time    1.25 secs  807.00 micros    1.25 secs
```

Tango workspaces also save partial results, so if you interrupt an evaluation half-way through, your progress is
saved.

## Team

<!-- start team -->

**ai2-catwalk** is developed and maintained by the AllenNLP team, backed by [the Allen Institute for Artificial Intelligence (AI2)](https://allenai.org/).
AI2 is a non-profit institute with the mission to contribute to humanity through high-impact AI research and engineering.
To learn more about who specifically contributed to this codebase, see [our contributors](https://github.com/allenai/catwalk/graphs/contributors) page.

<!-- end team -->

## License

<!-- start license -->

**ai2-catwalk** is licensed under [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0).
A full copy of the license can be found [on GitHub](https://github.com/allenai/catwalk/blob/main/LICENSE).

<!-- end license -->

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/allenai/catwalk",
    "name": "ai2-catwalk",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8.0",
    "maintainer_email": "",
    "keywords": "",
    "author": "Allen Institute for Artificial Intelligence",
    "author_email": "contact@allenai.org",
    "download_url": "https://files.pythonhosted.org/packages/c3/05/5ad15b776b88693058f1a890e58f03ab42d68c9c6590c13e72ac4330a2f1/ai2-catwalk-0.2.2.tar.gz",
    "platform": null,
    "description": "# Catwalk\n\nCatwalk shows off models.\n\nCatwalk contains a lot of models, and a lot of tasks. The goal is to be able to run all models on all tasks. In\npractice, some combinations are not possible, but many are.\n\n<details>\n<summary>Here is the current list of tasks we have implemented.\nThis list is not showing the `metaicl` and `p3` categories of tasks, because those are\nlargely variants of the other tasks.\n</summary>\n\n```\nwikitext\npiqa\nsquad\nsquadshifts-reddit\nsquadshifts-amazon\nsquadshifts-nyt\nsquadshifts-new-wiki\nmrqa::race\nmrqa::newsqa\nmrqa::triviaqa\nmrqa::searchqa\nmrqa::hotpotqa\nmrqa::naturalquestions\nmrqa::bioasq\nmrqa::drop\nmrqa::relationextraction\nmrqa::textbookqa\nmrqa::duorc.paraphraserc\nsquad2\nrte\nsuperglue::rte\ncola\nmnli\nmnli_mismatched\nmrpc\nqnli\nqqp\nsst\nwnli\nboolq\ncb\ncopa\nmultirc\nwic\nwsc\ndrop\nlambada\nlambada_cloze\nlambada_mt_en\nlambada_mt_fr\nlambada_mt_de\nlambada_mt_it\nlambada_mt_es\nprost\nmc_taco\npubmedqa\nsciq\nqa4mre_2011\nqa4mre_2012\nqa4mre_2013\ntriviaqa\narc_easy\narc_challenge\nlogiqa\nhellaswag\nopenbookqa\nrace\nheadqa_es\nheadqa_en\nmathqa\nwebqs\nwsc273\nwinogrande\nanli_r1\nanli_r2\nanli_r3\nethics_cm\nethics_deontology\nethics_justice\nethics_utilitarianism_original\nethics_utilitarianism\nethics_virtue\ntruthfulqa_gen\nmutual\nmutual_plus\nmath_algebra\nmath_counting_and_prob\nmath_geometry\nmath_intermediate_algebra\nmath_num_theory\nmath_prealgebra\nmath_precalc\nmath_asdiv\narithmetic_2da\narithmetic_2ds\narithmetic_3da\narithmetic_3ds\narithmetic_4da\narithmetic_4ds\narithmetic_5da\narithmetic_5ds\narithmetic_2dm\narithmetic_1dc\nanagrams1\nanagrams2\ncycle_letters\nrandom_insertion\nreversed_words\nraft::ade_corpus_v2\nraft::banking_77\nraft::neurips_impact_statement_risks\nraft::one_stop_english\nraft::overruling\nraft::semiconductor_org_types\nraft::systematic_review_inclusion\nraft::tai_safety_research\nraft::terms_of_service\nraft::tweet_eval_hate\nraft::twitter_complaints\n```\n</details>\n\n## Installation\n\n<!-- start install -->\n\n**Catwalk** requires Python 3.9 or later.\n\nUnfortunately Catwalk cannot be installed from pypi, because it depends on other packages that are not uploaded to\npypi.\n\nInstall from source:\n```shell\ngit clone https://github.com/allenai/catwalk.git\ncd catwalk\npip install -e .\n```\n\n<!-- end install -->\n\n## Getting started\n\nLet's run GPT2 on PIQA:\n```shell\npython -m catwalk --model rc::gpt2 --task piqa\n```\n\nThis will load up GPT2 and use it to perform the PIQA task with the \"ranked classification\" approach.\n\nYou can specify multiple tasks at once:\n```shell\npython -m catwalk --model rc::gpt2 --task piqa arc_easy\n```\n\nIt'll print you a nice table with all tasks and the metrics for each task:\n```text\narc_challenge   acc     0.22440272569656372\narc_easy        acc     0.3998316526412964\npiqa    acc     0.6256800889968872\n```\n\n## Training / Finetuning\n\nCatwalk can train models. It can train models on a single task, or on multiple tasks at once.\nTo train, use this command line:\n```shell\npython -m catwalk.train --model rc::gpt2 --task piqa\n```\n\nYou can train on multiple tasks at the same time, if you want to create a multi-task model:\n```shell\npython -m catwalk.train --model rc::gpt2 --task piqa arc_easy\n```\n\nNote that not all models support training. If you want to train one and can't, create an issue and tag @dirkgr in\nit. \n\n## Tango integration\n\nCatwalk uses [Tango](https://github.com/allenai/tango) for caching and executing evaluations. The command line\ninterface internally constructs a Tango step graph and executes it. You can point the command line to a Tango\nworkspace to cache results:\n\n```shell\npython -m catwalk --model rc::gpt2 --task piqa arc_easy -w ./my-workspace/\n```\n\nThe second time you run one of those tasks, it will be fast:\n```shell\ntime python -m catwalk --model rc::gpt2 --task piqa -w ./my-workspace/\n```\n\n```text\narc_easy\tacc\t0.39941078424453735\npiqa\tacc\t0.626224160194397\n\n________________________________________________________\nExecuted in    9.82 secs    fish           external\n   usr time    6.51 secs  208.00 micros    6.51 secs\n   sys time    1.25 secs  807.00 micros    1.25 secs\n```\n\nTango workspaces also save partial results, so if you interrupt an evaluation half-way through, your progress is\nsaved.\n\n## Team\n\n<!-- start team -->\n\n**ai2-catwalk** is developed and maintained by the AllenNLP team, backed by [the Allen Institute for Artificial Intelligence (AI2)](https://allenai.org/).\nAI2 is a non-profit institute with the mission to contribute to humanity through high-impact AI research and engineering.\nTo learn more about who specifically contributed to this codebase, see [our contributors](https://github.com/allenai/catwalk/graphs/contributors) page.\n\n<!-- end team -->\n\n## License\n\n<!-- start license -->\n\n**ai2-catwalk** is licensed under [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0).\nA full copy of the license can be found [on GitHub](https://github.com/allenai/catwalk/blob/main/LICENSE).\n\n<!-- end license -->\n",
    "bugtrack_url": null,
    "license": "Apache",
    "summary": "A library for evaluating language models.",
    "version": "0.2.2",
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0f022a26917258dc67de7f59d1657f4484accb77d06d401bae2b7a243b91c717",
                "md5": "6d508cceaa9f0e2b59210ca0098b1bb6",
                "sha256": "92fff9b89cb0bcc6eda841ad1185a2aacf7cdf1a4cda83b4002dfeedbf04373f"
            },
            "downloads": -1,
            "filename": "ai2_catwalk-0.2.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "6d508cceaa9f0e2b59210ca0098b1bb6",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8.0",
            "size": 790220,
            "upload_time": "2023-01-27T22:08:11",
            "upload_time_iso_8601": "2023-01-27T22:08:11.350208Z",
            "url": "https://files.pythonhosted.org/packages/0f/02/2a26917258dc67de7f59d1657f4484accb77d06d401bae2b7a243b91c717/ai2_catwalk-0.2.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c3055ad15b776b88693058f1a890e58f03ab42d68c9c6590c13e72ac4330a2f1",
                "md5": "515799e9af5260d28b79fcfe7045ef59",
                "sha256": "20f58606f2d68bf8c1007e51857dc88755590f64b2df41c2c3c3e9ce39baba92"
            },
            "downloads": -1,
            "filename": "ai2-catwalk-0.2.2.tar.gz",
            "has_sig": false,
            "md5_digest": "515799e9af5260d28b79fcfe7045ef59",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8.0",
            "size": 458293,
            "upload_time": "2023-01-27T22:08:13",
            "upload_time_iso_8601": "2023-01-27T22:08:13.932863Z",
            "url": "https://files.pythonhosted.org/packages/c3/05/5ad15b776b88693058f1a890e58f03ab42d68c9c6590c13e72ac4330a2f1/ai2-catwalk-0.2.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-01-27 22:08:13",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "allenai",
    "github_project": "catwalk",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "ai2-catwalk"
}
        
Elapsed time: 0.04516s