# QWOP Gym
A Gym environment for Bennett Foddy's game called _QWOP_.
![banner-3](./doc/banner-3.gif)
[Give it a try](https://www.foddy.net/Athletics.html) and see why it's such a
good candidate for Reinforcement Learning :)
You should also check out this [video](https://www.youtube.com/watch?v=2qNKjRwcx74) for a demo.
### Features
* A call to `.step()` advances exactly N game frames (configurable)
* Option to disable WebGL rendering for improved performance
* Satisfies the Markov property \*
* State extraction for a slim observation of 60 bytes
* Real-time visualization of various game stats (optional)
* Additional in-game controls for easier debugging
\* given the state includes the steps since last hard reset, see [♻️ Resetting](./doc/env.md#resetting)
## Getting started
1. Install [Python](https://www.python.org/downloads/) 3.10 or higher
1. Install a chrome-based web browser (Google Chrome, Brave, Chromium, etc.)
1. Download [chromedriver](https://googlechromelabs.github.io/chrome-for-testing/) 116.0 or higher
1. Install the `qwop-gym` package and patch QWOP.min.js from your terminal:
```bash
pip install qwop-gym
# Fetch & patch QWOP source code
curl -sL https://www.foddy.net/QWOP.min.js | qwop-gym patch
```
Create an instance in your code:
```python
import gymnasium as gym
import qwop_gym  # registers the QWOP-v1 env

env = gym.make("QWOP-v1", browser="/browser/path", driver="/driver/path")
```
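Once created, the env follows the standard Gymnasium API. Continuing from the snippet above, a minimal random-action rollout might look like this (illustrative only; see [doc/env.md](./doc/env.md) for the actual observation and reward details):
```python
obs, info = env.reset()
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # random action, purely for illustration
    obs, reward, terminated, truncated, info = env.step(action)
env.close()
```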
## The `qwop-gym` tool
The `qwop-gym` executable is a handy command-line tool which makes it easy to
play, record and replay episodes, train agents and more.
First, perform the initial setup:
```bash
qwop-gym bootstrap
```
Play the game (use Q, W, O, P keys):
```bash
qwop-gym play
```
Explore the other available commands:
```bash
$ qwop-gym -h
usage: qwop-gym [options] <action>
options:
-h, --help show this help message and exit
-c FILE config file, defaults to config/<action>.yml
action:
play play QWOP, optionally recording actions
replay replay recorded game actions
train_bc train using Behavioral Cloning (BC)
train_gail train using Generative Adversarial Imitation Learning (GAIL)
train_airl train using Adversarial Inverse Reinforcement Learning (AIRL)
train_ppo train using Proximal Policy Optimization (PPO)
train_dqn train using Deep Q Network (DQN)
train_qrdqn train using Quantile Regression DQN (QRDQN)
spectate watch a trained model play QWOP, optionally recording actions
benchmark evaluate the actions/s achievable with this env
bootstrap perform initial setup
patch apply patch to original QWOP.min.js code
help print this help message
examples:
qwop-gym play
qwop-gym -c config/record.yml play
```
For example, to train a PPO agent, edit [`config/ppo.yml`](./config/ppo.yml) and run:
```bash
qwop-gym train_ppo
```
> [!WARNING]
> Although no rendering occurs during training, the browser window must remain
> open, as the game is actually running at very high speed behind the scenes.
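If you prefer to drive training directly from Python instead of the CLI, a minimal sketch using the open-source [stable-baselines3](https://github.com/DLR-RM/stable-baselines3) PPO implementation might look like the following (the hyperparameters and output path are illustrative, not the project's defaults from `config/ppo.yml`):
```python
import gymnasium as gym
import qwop_gym  # registers the QWOP-v1 env
from stable_baselines3 import PPO

env = gym.make("QWOP-v1", browser="/browser/path", driver="/driver/path")
model = PPO("MlpPolicy", env, verbose=1)  # illustrative hyperparameters
model.learn(total_timesteps=100_000)
model.save("data/ppo_qwop")  # hypothetical output path
env.close()
```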
Visualize tensorboard graphs:
```bash
tensorboard --logdir data/
```
Configure `model_file` in [`config/spectate.yml`](./config/spectate.yml) and watch your trained agent play the game:
```bash
qwop-gym spectate
```
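Conceptually, spectating amounts to loading the saved model and letting it pick actions. A rough sketch (assuming a stable-baselines3 PPO model and a hypothetical `data/ppo_qwop` file; not necessarily what `spectate` does internally):
```python
import gymnasium as gym
import qwop_gym  # registers the QWOP-v1 env
from stable_baselines3 import PPO

env = gym.make("QWOP-v1", browser="/browser/path", driver="/driver/path")
model = PPO.load("data/ppo_qwop")  # hypothetical model file

obs, info = env.reset()
terminated = truncated = False
while not (terminated or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
env.close()
```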
### Imitation
> [!NOTE]
> Imitation learning is powered by the
> [`imitation`](https://github.com/HumanCompatibleAI/imitation) library, which
> depends on the deprecated `gym` library, making it incompatible with
> QwopEnv. This can be resolved once `imitation` introduces support for
> `gymnasium`. As a workaround, you can check out the `qwop-gym` project
> locally and use the `gym-compat` branch instead.
```bash
# In this branch, QwopEnv works with the deprecated `gym` library
git checkout gym-compat
# Note that python-3.10 is required, see notes in requirements.txt
pip install -r requirements.txt
# Patch the game again as this branch works with different paths
curl -sL https://www.foddy.net/QWOP.min.js | python -m src.game.patcher
```
For imitation learning, first record some of your own games:
```bash
python qwop-gym.py play -c config/record.yml
```
Train an imitator via [Behavioral Cloning](https://imitation.readthedocs.io/en/latest/tutorials/1_train_bc.html):
```bash
python qwop-gym.py train_bc
```
### W&B sweeps
If you are a fan of [W&B](https://docs.wandb.ai/guides/sweeps), you can
use the provided configs in `config/wandb/` and create your own sweeps.
`wandb` is a rather bulky dependency and is not installed by default. Install
it with `pip install wandb` before proceeding with the examples below.
```bash
# create a new W&B sweep
wandb sweep config/wandb/qrdqn.yml
# start a new W&B agent
wandb agent <username>/qwop/<sweep>
```
You can check out my public W&B QWOP project
[here](https://wandb.ai/s-manolloff/qwop-gym).
There you can find pre-trained model artifacts (zip files) of some
well-performing agents, as well as see how they compare to each other. This
[YouTube video](https://www.youtube.com/watch?v=2qNKjRwcx74) showcases some of
them.
![banner](./doc/banner.gif)
## Developer documentation
Info about the Gym env can be found [here](./doc/env.md)
Details about the QWOP game can be found [here](./doc/game.md)
## Similar projects
* https://github.com/Wesleyliao/QWOP-RL
* https://github.com/drakesvoboda/RL-QWOP
* https://github.com/juanto121/qwop-ai
* https://github.com/ShawnHymel/qwop-ai
In comparison, qwop-gym offers several key features:
* the env is _performant_ - perfect for on-policy algorithms, as observations
can be collected at great speed (more than 2000 observations/sec on an Apple
M2 CPU - orders of magnitude faster than the other QWOP RL envs).
* the env satisfies the _Markov property_ - there are no race conditions and
randomness can be removed if desired, so recorded episodes are 100% replayable
* the env has a _simple reward model_ and, compared to other QWOP envs, it is
less biased, e.g. there is no special logic for things like _knee bending_,
_low torso height_, _vertical movement_, etc.
* the env allows all 15 possible key combinations, whereas other QWOP envs
usually allow only the 8 "useful" ones.
* great results (fast, human-like running) achieved by RL agents trained
entirely through self-play, without pre-recorded expert demonstrations
* qwop-gym already contains scripts for training with 6 different algorithms
and adding more to the list is simple - this makes it suitable for exploring
and/or benchmarking a variety of RL algorithms.
* qwop-gym uses reliable open-source implementations of RL algorithms in
contrast to many other projects using "roll-your-own" implementations.
* QWOP's original JS source code is barely modified: 99% of the extra
functionality is implemented as a plugin, bundled separately, and only a "diff"
of QWOP.min.js is published here (out of respect for Bennett Foddy's kind
request to refrain from publishing the QWOP source code, as it is _not_
open-source).
## Caveats
The list below highlights some areas in which the project could use
improvement:
* the OS may heavily throttle the web browser's rendering as soon as the
window is put in the background (at least on macOS). Ideally, the browser
should run in headless mode, but I couldn't find a headless browser that
supports WebGL.
* `gym` has been deprecated since October 2022, but the `imitation` library
still does not officially support `gymnasium`. Once that is addressed, the
special `gym-compat` branch will no longer be required for imitation learning.
* `wandb` uses a monkey-patch for collecting tensorboard logs which does not
work well with GAIL/AIRL/BC (and possibly other algos from `imitation`). As a
result, graphs in wandb have weird names. This is mostly an issue with the
`wandb` and/or `imitation` libraries; however, there could be a way to work
around it here.
* the Firefox browser and geckodriver are not supported as an alternative
browser/driver pair, but adding support for them should be fairly easy
## Contributing
Here is a simple guide to follow if you want to contribute to this project:
1. Find an existing issue to work on, or submit a new issue which you're also
going to fix. Make sure to leave a comment saying that you're working on a fix
for the issue you picked.
1. Branch out from latest `main`.
1. Make sure you have formatted your code with the [black](https://github.com/psf/black)
formatter.
1. Commit and push your changes in your branch.
1. Submit a PR.