gymnasium-2048


Name: gymnasium-2048
Version: 0.0.1
Summary: A reinforcement learning environment for the 2048 game based on Gymnasium
Author: Quentin Deschamps <quentindeschamps18@gmail.com>
Upload time: 2024-01-27 10:09:51
Requires Python: >=3.10
License: MIT
Keywords: 2048, reinforcement learning, game, RL, AI, gymnasium, pygame
Repository: https://github.com/Quentin18/gymnasium-2048
# Gymnasium 2048

[![CI](https://github.com/Quentin18/gymnasium-2048/actions/workflows/build.yml/badge.svg)](https://github.com/Quentin18/gymnasium-2048/actions/workflows/build.yml)
[![codestyle](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

Gymnasium environment for the [Game 2048](https://en.wikipedia.org/wiki/2048_(video_game)) and game-playing agents using
temporal difference learning of n-tuple networks.

https://github.com/Quentin18/gymnasium-2048/assets/58831477/c630a605-d1da-412a-a284-75f5c28bab46

<table>
    <tbody>
        <tr>
            <td>Action Space</td>
            <td><code>spaces.Discrete(4)</code></td>
        </tr>
        <tr>
            <td>Observation Space</td>
            <td><code>spaces.Box(low=0, high=1, shape=(4, 4, 16), dtype=np.uint8)</code></td>
        </tr>
        <tr>
            <td>Import</td>
            <td><code>gymnasium.make("gymnasium_2048:gymnasium_2048/TwentyFortyEight-v0")</code></td>
        </tr>
    </tbody>
</table>

## Installation

To install `gymnasium-2048` with pip, execute:

```bash
pip install gymnasium_2048
```

From source:

```bash
git clone https://github.com/Quentin18/gymnasium-2048
cd gymnasium-2048/
pip install -e .
```

## Environment

### Action Space

The action is an integer representing the direction to slide the tiles:

| Action | Direction |
|--------|-----------|
| 0      | UP        |
| 1      | RIGHT     |
| 2      | DOWN      |
| 3      | LEFT      |
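
For convenience in user code, the action integers can be given symbolic names, for example (an illustrative helper, not part of the package):

```python
from enum import IntEnum


class Action(IntEnum):
    """Slide directions, matching the action encoding of the environment."""

    UP = 0
    RIGHT = 1
    DOWN = 2
    LEFT = 3
```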

### Observation Space

The observation is a 3D `ndarray` encoding the board state as 16 binary 4x4 channels. Channel 0 marks the empty
cells, and channel `i` (for `i >= 1`) contains a 1 at every cell holding the `2**i` tile and a 0 elsewhere. The
channels therefore represent the positions of empty cells, 2-tiles, 4-tiles, ..., up to 32768-tiles, respectively.

![Observation](./figures/observation.png)

This representation is commonly used as input to deep convolutional neural networks (DCNNs).
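
For example, the one-hot observation can be decoded back into a board of tile values. The helper below is a minimal
sketch assuming the channel ordering described above (channel 0 for empty cells, channel `i` for `2**i` tiles); it is
illustrative and not part of the package:

```python
import numpy as np


def decode_observation(obs: np.ndarray) -> np.ndarray:
    """Convert a (4, 4, 16) one-hot observation into a 4x4 board of tile values."""
    exponents = obs.argmax(axis=-1)  # active channel index for each cell
    return np.where(exponents > 0, 2 ** exponents, 0)  # channel 0 means empty
```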

### Rewards

At each step, for each tile merge, the player gains a reward
equal to the value of the new tile.
The total reward, corresponding to the game score, is the
sum of rewards obtained throughout the game.
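
For example, sliding the row `[2, 2, 4, 4]` to the left produces `[4, 8, 0, 0]` and yields a step reward of
4 + 8 = 12. The snippet below reimplements a single row merge to illustrate the scoring rule; it is a standalone
sketch, not the environment's internal code:

```python
def slide_left(row):
    """Merge one row to the left, 2048-style, and return (new_row, reward)."""
    tiles = [tile for tile in row if tile != 0]
    merged, reward, i = [], 0, 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(2 * tiles[i])  # the two tiles fuse into one
            reward += 2 * tiles[i]       # reward = value of the new tile
            i += 2
        else:
            merged.append(tiles[i])
            i += 1
    return merged + [0] * (len(row) - len(merged)), reward


print(slide_left([2, 2, 4, 4]))  # ([4, 8, 0, 0], 12)
```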

### Starting State

The game starts with two randomly generated tiles. Each new tile is a 2 with probability 0.9 and a 4 with
probability 0.1.
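
In other words, each spawned tile can be modeled as the draw below (illustrative only):

```python
import numpy as np

rng = np.random.default_rng()
tile = rng.choice([2, 4], p=[0.9, 0.1])  # 2 with probability 0.9, 4 with probability 0.1
```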

### Episode End

The episode ends if there are no legal moves, i.e., all squares are occupied and there are no two adjacent tiles sharing
the same value.
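
This condition translates directly into a check like the following (a standalone sketch operating on a board of tile
values, not the environment's code):

```python
import numpy as np


def has_legal_move(board: np.ndarray) -> bool:
    """True if a slide can still change the board: an empty cell exists
    or two equal tiles are adjacent horizontally or vertically."""
    if (board == 0).any():
        return True
    if (board[:, :-1] == board[:, 1:]).any():  # equal horizontal neighbours
        return True
    return bool((board[:-1, :] == board[1:, :]).any())  # equal vertical neighbours
```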

### Arguments

- `size`: the size of the game board. The default value is 4.
- `max_pow`: the maximum power of 2 allowed. The default value is 16.

```python
import gymnasium as gym

gym.make("gymnasium_2048:gymnasium_2048/TwentyFortyEight-v0", size=4, max_pow=16)
```
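
A minimal interaction loop using the standard Gymnasium API (random actions, for illustration only):

```python
import gymnasium as gym

env = gym.make("gymnasium_2048:gymnasium_2048/TwentyFortyEight-v0")
obs, info = env.reset(seed=42)
total_reward = 0.0

terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # 0=UP, 1=RIGHT, 2=DOWN, 3=LEFT
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward

print(f"Game score: {total_reward}")
env.close()
```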

## Usage

To use the training and evaluation scripts, install the `training` dependencies:

```bash
pip install .[training]
```

### Play Manually

To play the game manually with the arrow keys of your keyboard, execute:

```bash
python -m scripts.play
```

To see all available arguments, use the `-h` flag:

```bash
python -m scripts.play -h
```

### Train an Agent

To train an agent using temporal difference learning of n-tuple networks, execute:

```bash
python -m scripts.train \
  --algo tdl \
  -n 100000 \
  --eval-freq 5000 \
  --eval-episode 1000 \
  --save-freq 5000 \
  --seed 42 \
  -o models/tdl
```

To see all available arguments, use the `-h` flag:

```bash
python -m scripts.train -h
```
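
For background, an n-tuple network scores a board by summing the entries of small lookup tables indexed by the tile
exponents at fixed groups of cells, and temporal difference learning nudges those entries toward observed returns
(see Szubert and Jaśkowski in the references). The sketch below is a schematic illustration of this idea; the tuple
layout, table sizes, and update rule are illustrative and not the repository's implementation:

```python
import numpy as np

MAX_POW = 16  # cells are stored as exponents: 0 = empty, 1 = tile 2, 2 = tile 4, ...

# Illustrative tuples: the four rows and four columns of the board.
TUPLES = [[(r, c) for c in range(4)] for r in range(4)] + \
         [[(r, c) for r in range(4)] for c in range(4)]

# One lookup table of weights per tuple, indexed by the exponents at its cells.
tables = [np.zeros(MAX_POW ** len(cells)) for cells in TUPLES]


def tuple_index(board, cells):
    """Flatten the exponents at the given cells into a single table index."""
    index = 0
    for r, c in cells:
        index = index * MAX_POW + int(board[r, c])
    return index


def value(board):
    """Board value = sum of the active weight in every lookup table."""
    return sum(t[tuple_index(board, cells)] for t, cells in zip(tables, TUPLES))


def td_update(board, reward, next_board, alpha=0.0025):
    """One TD(0) step: move the active weights toward reward + V(next_board)."""
    delta = reward + value(next_board) - value(board)
    for t, cells in zip(tables, TUPLES):
        t[tuple_index(board, cells)] += alpha * delta
```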

### Plot Training Metrics

To plot training metrics from logs, execute:

```bash
python -m scripts.plot \
  -i train.log \
  -t "Temporal Difference Learning" \
  -o figures/training_tdl.png
```

To see all available arguments, use the `-h` flag:

```bash
python -m scripts.plot -h
```

Here are the training metrics of trained policies over episodes:

| TDL small                                      | TDL                                |
|------------------------------------------------|------------------------------------|
| ![TDL small](./figures/training_tdl_small.png) | ![TDL](./figures/training_tdl.png) |

### Enjoy a Trained Agent

To see a trained agent in action, execute:

```bash
python -m scripts.enjoy \
  --algo tdl \
  -i models/tdl/best_n_tuple_network_policy.zip \
  -n 1 \
  --seed 42
```

To see all available arguments, use the `-h` flag:

```bash
python -m scripts.enjoy -h
```

### Evaluate a Trained Agent

To evaluate the performance of a trained agent, execute:

```bash
python -m scripts.evaluate \
  --algo tdl \
  -i models/tdl/best_n_tuple_network_policy.zip \
  -n 1000 \
  --seed 42 \
  -t "Temporal Difference Learning" \
  -o figures/stats_tdl.png
```

To see all available arguments, use the `-h` flag:

```bash
python -m scripts.evaluate -h
```

Here are the performance statistics of the trained policies:

| TDL small                                   | TDL                             |
|---------------------------------------------|---------------------------------|
| ![TDL small](./figures/stats_tdl_small.png) | ![TDL](./figures/stats_tdl.png) |

<details>
<summary>Random policy performances</summary>

![Random policy](./figures/stats_random_policy.png)

</details>

## Tests

To run tests, execute:

```bash
pytest
```

## Citing

To cite the repository in publications:

```bibtex
@misc{gymnasium-2048,
  author = {Quentin Deschamps},
  title = {Gymnasium 2048},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/Quentin18/gymnasium-2048}},
}
```

## References

- [Szubert and Jaśkowski: Temporal Difference Learning of N-Tuple Networks
  for the Game 2048](https://www.cs.put.poznan.pl/wjaskowski/pub/papers/Szubert2014_2048.pdf)
- [Guei and Wu: On Reinforcement Learning for the Game of 2048](https://arxiv.org/pdf/2212.11087.pdf)

## Author

[Quentin Deschamps](mailto:quentindeschamps18@gmail.com)

            
