# QWOP Gym

A Gym environment for Bennett Foddy's game called _QWOP_.

![banner-3](./doc/banner-3.gif)

[Give it a try](https://www.foddy.net/Athletics.html) and see why it's such a
good candidate for Reinforcement Learning :)

You should also check this [video](https://www.youtube.com/watch?v=2qNKjRwcx74) for a demo.

### Features

* A call to `.step()` advances exactly N game frames (configurable)
* Option to disable WebGL rendering for improved performance
* Satisfies the Markov property \*
* State extraction for a slim observation of 60 bytes
* Real-time visualization of various game stats (optional)
* Additional in-game controls for easier debugging

\* given the state includes the steps since last hard reset, see [♻️ Resetting](./doc/env.md#resetting)

## Getting started

1. Install [Python](https://www.python.org/downloads/) 3.10 or higher
1. Install a Chromium-based web browser (Google Chrome, Brave, Chromium, etc.)
1. Download [chromedriver](https://googlechromelabs.github.io/chrome-for-testing/) 116.0 or higher
1. Install the `qwop-gym` package and patch QWOP.min.js from your terminal:

```bash
pip install qwop-gym

# Fetch & patch QWOP source code
curl -sL https://www.foddy.net/QWOP.min.js | qwop-gym patch
```

Create an instance in your code:

```python
import gymnasium as gym
import qwop_gym  # importing the package registers the QWOP-v1 env

env = gym.make("QWOP-v1", browser="/browser/path", driver="/driver/path")
```
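
From here the env follows the standard Gymnasium API. As a minimal sketch, a
single episode with a random policy (purely for illustration) looks like this:

```python
import gymnasium as gym
import qwop_gym  # registers QWOP-v1

env = gym.make("QWOP-v1", browser="/browser/path", driver="/driver/path")

obs, info = env.reset()
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # random actions, for illustration only
    obs, reward, terminated, truncated, info = env.step(action)
env.close()
```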

## The `qwop-gym` tool

The `qwop-gym` executable is a handy command-line tool which makes it easy to
play, record and replay episodes, train agents and more.

First, perform the initial setup:

```bash
qwop-gym bootstrap
```

Play the game (use Q, W, O, P keys):

```bash
qwop-gym play
```

Explore the other available commands:

```bash
$ qwop-gym -h
usage: qwop-gym [options] <action>

options:
  -h, --help  show this help message and exit
  -c FILE     config file, defaults to config/<action>.yml

action:
  play              play QWOP, optionally recording actions
  replay            replay recorded game actions
  train_bc          train using Behavioral Cloning (BC)
  train_gail        train using Generative Adversarial Imitation Learning (GAIL)
  train_airl        train using Adversarial Inverse Reinforcement Learning (AIRL)
  train_ppo         train using Proximal Policy Optimization (PPO)
  train_dqn         train using Deep Q Network (DQN)
  train_qrdqn       train using Quantile Regression DQN (QRDQN)
  spectate          watch a trained model play QWOP, optionally recording actions
  benchmark         evaluate the actions/s achievable with this env
  bootstrap         perform initial setup
  patch             apply patch to original QWOP.min.js code
  help              print this help message

examples:
  qwop-gym play
  qwop-gym -c config/record.yml play
```

For example, to train a PPO agent, edit [`config/ppo.yml`](./config/ppo.yml) and run:

```bash
qwop-gym train_ppo
```

> [!WARNING]
> Although no rendering occurs during training, the browser window must remain
> open, as the game is still running (at very high speed) behind the scenes.
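
For orientation, here is a minimal sketch of what such a training run boils
down to, assuming the stable-baselines3 implementation of PPO (the real
hyperparameters live in `config/ppo.yml`; the save path below is hypothetical):

```python
import gymnasium as gym
import qwop_gym  # registers QWOP-v1
from stable_baselines3 import PPO

env = gym.make("QWOP-v1", browser="/browser/path", driver="/driver/path")

# Log under data/ so the run shows up in tensorboard (see below)
model = PPO("MlpPolicy", env, verbose=1, tensorboard_log="data/")
model.learn(total_timesteps=100_000)
model.save("data/ppo_qwop")  # hypothetical path -- point model_file at it later
env.close()
```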

Visualize the TensorBoard graphs:

```bash
tensorboard --logdir data/
```

Configure `model_file` in [`config/spectate.yml`](./config/spectate.yml) and watch your trained agent play the game:

```bash
qwop-gym spectate
```
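
Spectating essentially loads the trained model and steps the env with its
predictions. A minimal sketch, again assuming a stable-baselines3 PPO model
and a hypothetical model path:

```python
import gymnasium as gym
import qwop_gym  # registers QWOP-v1
from stable_baselines3 import PPO

env = gym.make("QWOP-v1", browser="/browser/path", driver="/driver/path")
model = PPO.load("data/ppo_qwop")  # hypothetical path -- use your model_file

obs, info = env.reset()
terminated = truncated = False
while not (terminated or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
env.close()
```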

### Imitation

> [!NOTE]
> Imitation learning is powered by the
> [`imitation`](https://github.com/HumanCompatibleAI/imitation) library, which
> depends on the deprecated `gym` library and is therefore incompatible with
> QwopEnv. This can be resolved as soon as `imitation` introduces support for
> `gymnasium`. As a workaround, you can check out the `qwop-gym` project
> locally and use the `gym-compat` branch instead.

```bash
# In this branch, QwopEnv works with the deprecated `gym` library
git checkout gym-compat

# Note that python-3.10 is required, see notes in requirements.txt
pip install -r requirements.txt

# Patch the game again as this branch works with different paths
curl -sL https://www.foddy.net/QWOP.min.js | python -m src.game.patcher
```

For imitation learning, first record some of your own games:

```bash
python qwop-gym.py play -c config/record.yml 
```

Train an imitator via [Behavioral Cloning](https://imitation.readthedocs.io/en/latest/tutorials/1_train_bc.html):

```bash
python qwop-gym.py train_bc
```
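
Under the hood, the BC step boils down to something like the sketch below,
using the `imitation` library's `bc.BC` trainer. Note that `env` and
`transitions` are assumed to already exist -- converting the recorded games
into `imitation`'s `Transitions` format is project-specific and omitted here:

```python
import numpy as np
from imitation.algorithms import bc

# Assumed inputs: `env` is a QwopEnv created on the gym-compat branch,
# `transitions` holds the recorded demonstrations in imitation's
# Transitions format.
bc_trainer = bc.BC(
    observation_space=env.observation_space,
    action_space=env.action_space,
    demonstrations=transitions,
    rng=np.random.default_rng(0),
)
bc_trainer.train(n_epochs=10)
```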

### W&B sweeps

If you are a fan of [W&B](https://docs.wandb.ai/guides/sweeps), you can 
use the provided configs in `config/wandb/` and create your own sweeps.

`wandb` is a rather bulky dependency and is not installed by default. Install
it with `pip install wandb` before proceeding with the examples below.

```bash
# create a new W&B sweep
wandb sweep config/wandb/qrdqn.yml

# start a new W&B agent
wandb agent <username>/qwop/<sweep>
``` 
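
Inside each run, the training entrypoint reads the hyperparameters that the
sweep controller sampled from `wandb.config`. A minimal, hypothetical sketch
of the pattern (the actual parameter names are whatever the sweep config
defines):

```python
import wandb

# Started by `wandb agent`; the sweep controller injects the sampled
# hyperparameters into wandb.config for each run.
run = wandb.init(project="qwop")
learning_rate = wandb.config["learning_rate"]  # hypothetical sweep parameter

# ... train a model with `learning_rate` here, calling wandb.log(...)
# with your metrics so the sweep can compare runs ...

run.finish()
```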

You can check out my W&B public QWOP project
[here](https://wandb.ai/s-manolloff/qwop-gym).
There you can find pre-trained model artifacts (zip files) of some
well-performing agents, as well as see how they compare to each other. This
[YouTube video](https://www.youtube.com/watch?v=2qNKjRwcx74) showcases some of
them.

![banner](./doc/banner.gif)

## Developer documentation

Info about the Gym env can be found [here](./doc/env.md)

Details about the QWOP game can be found [here](./doc/game.md)

## Similar projects

* https://github.com/Wesleyliao/QWOP-RL
* https://github.com/drakesvoboda/RL-QWOP
* https://github.com/juanto121/qwop-ai
* https://github.com/ShawnHymel/qwop-ai

In comparison, qwop-gym offers several key features:
* the env is _performant_ - perfect for on-policy algorithms, as observations
can be collected at high speed (more than 2000 observations/sec on an Apple
M2 CPU - orders of magnitude faster than the other QWOP RL envs).
* the env satisfies the _Markov property_ - there are no race conditions, and
randomness can be removed if desired, so recorded episodes are 100% replayable.
* the env has a _simple reward model_ and, compared to other QWOP envs, it is
less biased, e.g. there is no special logic for things like _knee bending_,
_low torso height_, _vertical movement_, etc.
* the env allows all 15 possible key combinations, while other QWOP envs
usually allow only the 8 "useful" ones.
* great results (fast, human-like running) are achieved by RL agents trained
entirely through self-play, without pre-recorded expert demonstrations.
* qwop-gym already contains scripts for training with 6 different algorithms
and adding more to the list is simple - this makes it suitable for exploring
and/or benchmarking a variety of RL algorithms.
* qwop-gym uses reliable open-source implementations of RL algorithms in
contrast to many other projects using "roll-your-own" implementations.
* QWOP's original JS source code is barely modified: 99% of all extra
functionality is designed as a plugin, bundled separately and only a "diff"
of QWOP.min.js is published here (in respect to Benett Foddy's kind request
to refrain from publishing the QWOP source code as part of is _not_
open-source).

## Caveats

The list below highlights some areas in which the project could use
improvement:

* the OS may severely throttle the web browser's rendering as soon as the
window is put in the background (on macOS at least). Ideally, the browser
would run in headless mode, but I couldn't find a headless browser that
supports WebGL.
* `gym` has been deprecated since October 2022, but the `imitation` library
still does not officially support `gymnasium`. As soon as that is addressed,
it will no longer be necessary to use the special `gym-compat` branch here for
imitation learning.
* `wandb` uses a monkey-patch for collecting tensorboard logs which does not
work well with GAIL/AIRL/BC (and possibly other algos from `imitation`). As a
result, graphs in wandb have weird names. This is mostly an issue with the
`wandb` and/or `imitation` libraries; however, there may be a way to work
around it here.
* the Firefox browser and geckodriver are not supported as an alternative
browser/driver pair, but adding support for them should be fairly easy.

## Contributing

Here is a simple guide to follow if you want to contribute to this project:

1. Find an existing issue to work on, or submit a new issue which you're also
going to fix. Make sure to leave a comment noting that you're working on a fix
for the issue you picked.
1. Branch out from latest `main`.
1. Make sure you have formatted your code with the [black](https://github.com/psf/black)
formatter.
1. Commit and push your changes in your branch.
1. Submit a PR.

            
