# Catwalk
Catwalk shows off models.
Catwalk contains a lot of models, and a lot of tasks. The goal is to be able to run all models on all tasks. In
practice, some combinations are not possible, but many are.
<details>
<summary>Here is the current list of tasks we have implemented.
The list omits the `metaicl` and `p3` categories of tasks, because those are
largely variants of the other tasks.
</summary>
```
wikitext
piqa
squad
squadshifts-reddit
squadshifts-amazon
squadshifts-nyt
squadshifts-new-wiki
mrqa::race
mrqa::newsqa
mrqa::triviaqa
mrqa::searchqa
mrqa::hotpotqa
mrqa::naturalquestions
mrqa::bioasq
mrqa::drop
mrqa::relationextraction
mrqa::textbookqa
mrqa::duorc.paraphraserc
squad2
rte
superglue::rte
cola
mnli
mnli_mismatched
mrpc
qnli
qqp
sst
wnli
boolq
cb
copa
multirc
wic
wsc
drop
lambada
lambada_cloze
lambada_mt_en
lambada_mt_fr
lambada_mt_de
lambada_mt_it
lambada_mt_es
prost
mc_taco
pubmedqa
sciq
qa4mre_2011
qa4mre_2012
qa4mre_2013
triviaqa
arc_easy
arc_challenge
logiqa
hellaswag
openbookqa
race
headqa_es
headqa_en
mathqa
webqs
wsc273
winogrande
anli_r1
anli_r2
anli_r3
ethics_cm
ethics_deontology
ethics_justice
ethics_utilitarianism_original
ethics_utilitarianism
ethics_virtue
truthfulqa_gen
mutual
mutual_plus
math_algebra
math_counting_and_prob
math_geometry
math_intermediate_algebra
math_num_theory
math_prealgebra
math_precalc
math_asdiv
arithmetic_2da
arithmetic_2ds
arithmetic_3da
arithmetic_3ds
arithmetic_4da
arithmetic_4ds
arithmetic_5da
arithmetic_5ds
arithmetic_2dm
arithmetic_1dc
anagrams1
anagrams2
cycle_letters
random_insertion
reversed_words
raft::ade_corpus_v2
raft::banking_77
raft::neurips_impact_statement_risks
raft::one_stop_english
raft::overruling
raft::semiconductor_org_types
raft::systematic_review_inclusion
raft::tai_safety_research
raft::terms_of_service
raft::tweet_eval_hate
raft::twitter_complaints
```
</details>
## Installation
<!-- start install -->
**Catwalk** requires Python 3.9 or later.
Unfortunately, Catwalk cannot be installed from PyPI, because it depends on other packages that are not uploaded to
PyPI.
Install from source:
```shell
git clone https://github.com/allenai/catwalk.git
cd catwalk
pip install -e .
```
<!-- end install -->
## Getting started
Let's run GPT-2 on PIQA:
```shell
python -m catwalk --model rc::gpt2 --task piqa
```
This loads GPT-2 and uses it to perform the PIQA task with the "ranked classification" approach.
You can specify multiple tasks at once:
```shell
python -m catwalk --model rc::gpt2 --task piqa arc_easy
```
It prints a table with the metrics for each task:
```text
arc_challenge acc 0.22440272569656372
arc_easy acc 0.3998316526412964
piqa acc 0.6256800889968872
```
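If you want to post-process these results, the table is easy to parse. The following is a minimal sketch, not part of Catwalk itself: `parse_metrics` is a hypothetical helper, and it assumes each result row has exactly three whitespace-separated fields (task, metric, value), as in the output above.

```python
from typing import Dict


def parse_metrics(table: str) -> Dict[str, Dict[str, float]]:
    """Parse rows of the form '<task> <metric> <value>' into a nested dict.

    Hypothetical helper: assumes the three-column layout shown in the README.
    """
    results: Dict[str, Dict[str, float]] = {}
    for line in table.splitlines():
        fields = line.split()  # handles both tabs and spaces
        if len(fields) != 3:
            continue  # skip blank lines and anything that is not a metric row
        task, metric, raw_value = fields
        try:
            value = float(raw_value)
        except ValueError:
            continue  # skip rows whose last column is not a number
        results.setdefault(task, {})[metric] = value
    return results


# The sample output from the README, copied verbatim.
sample = """\
arc_challenge acc 0.22440272569656372
arc_easy acc 0.3998316526412964
piqa acc 0.6256800889968872
"""

metrics = parse_metrics(sample)
# Pick the task with the highest accuracy.
best_task = max(metrics, key=lambda task: metrics[task]["acc"])
print(best_task, metrics[best_task]["acc"])
```

Splitting on arbitrary whitespace keeps the sketch working whether the columns are separated by tabs or by spaces.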
## Training / Finetuning
Catwalk can also train models, either on a single task or on multiple tasks at once.
To train, use this command line:
```shell
python -m catwalk.train --model rc::gpt2 --task piqa
```
You can train on multiple tasks at the same time, if you want to create a multi-task model:
```shell
python -m catwalk.train --model rc::gpt2 --task piqa arc_easy
```
Note that not all models support training. If you want to train one and can't, create an issue and tag @dirkgr in
it.
## Tango integration
Catwalk uses [Tango](https://github.com/allenai/tango) for caching and executing evaluations. The command line
interface internally constructs a Tango step graph and executes it. You can point the command line to a Tango
workspace to cache results:
```shell
python -m catwalk --model rc::gpt2 --task piqa arc_easy -w ./my-workspace/
```
The second time you run one of those tasks, it will be fast:
```shell
time python -m catwalk --model rc::gpt2 --task piqa -w ./my-workspace/
```
```text
arc_easy acc 0.39941078424453735
piqa acc 0.626224160194397
________________________________________________________
Executed in 9.82 secs fish external
usr time 6.51 secs 208.00 micros 6.51 secs
sys time 1.25 secs 807.00 micros 1.25 secs
```
Tango workspaces also save partial results, so if you interrupt an evaluation halfway through, your progress is
saved.
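To see the effect of the workspace cache from a script, you can run the same command twice and compare wall-clock times. This is a rough sketch under a few assumptions: it only uses the `--model`, `--task`, and `-w` options shown above, the helper names `build_command`, `timed_run`, and `compare_runs` are made up for illustration, and Catwalk must be installed in the current Python environment for `compare_runs` to work.

```python
import subprocess
import sys
import time
from typing import List, Optional, Sequence


def build_command(
    model: str, tasks: Sequence[str], workspace: Optional[str] = None
) -> List[str]:
    """Build the argv list for `python -m catwalk` (hypothetical helper)."""
    cmd = [sys.executable, "-m", "catwalk", "--model", model, "--task", *tasks]
    if workspace is not None:
        cmd += ["-w", workspace]  # point the CLI at a Tango workspace
    return cmd


def timed_run(cmd: List[str]) -> float:
    """Run the command, raise if it fails, and return elapsed seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def compare_runs(model: str, tasks: Sequence[str], workspace: str) -> None:
    """Run the same evaluation twice; the second run should hit the cache."""
    cmd = build_command(model, tasks, workspace)
    first = timed_run(cmd)
    second = timed_run(cmd)
    print(f"first run: {first:.1f}s, second run: {second:.1f}s")
```

Calling `compare_runs("rc::gpt2", ["piqa", "arc_easy"], "./my-workspace/")` mirrors the shell commands above. Nothing is executed until you call it, so the block is safe to import.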
## Team
<!-- start team -->
**ai2-catwalk** is developed and maintained by the AllenNLP team, backed by [the Allen Institute for Artificial Intelligence (AI2)](https://allenai.org/).
AI2 is a non-profit institute with the mission to contribute to humanity through high-impact AI research and engineering.
To learn more about who specifically contributed to this codebase, see [our contributors](https://github.com/allenai/catwalk/graphs/contributors) page.
<!-- end team -->
## License
<!-- start license -->
**ai2-catwalk** is licensed under [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0).
A full copy of the license can be found [on GitHub](https://github.com/allenai/catwalk/blob/main/LICENSE).
<!-- end license -->