# Gymnasium 2048
[![CI](https://github.com/Quentin18/gymnasium-2048/actions/workflows/build.yml/badge.svg)](https://github.com/Quentin18/gymnasium-2048/actions/workflows/build.yml)
[![codestyle](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
A Gymnasium environment for the [game 2048](https://en.wikipedia.org/wiki/2048_(video_game)), together with game-playing agents trained by
temporal difference learning of n-tuple networks.
https://github.com/Quentin18/gymnasium-2048/assets/58831477/c630a605-d1da-412a-a284-75f5c28bab46
<table>
<tbody>
<tr>
<td>Action Space</td>
<td><code>spaces.Discrete(4)</code></td>
</tr>
<tr>
<td>Observation Space</td>
<td><code>spaces.Box(low=0, high=1, shape=(4, 4, 16), dtype=np.uint8)</code></td>
</tr>
<tr>
<td>Import</td>
<td><code>gymnasium.make("gymnasium_2048:gymnasium_2048/TwentyFortyEight-v0")</code></td>
</tr>
</tbody>
</table>
## Installation
To install `gymnasium-2048` with pip, execute:
```bash
pip install gymnasium_2048
```
From source:
```bash
git clone https://github.com/Quentin18/gymnasium-2048
cd gymnasium-2048/
pip install -e .
```
## Environment
### Action Space
The action is an integer representing the direction to slide the tiles:
| Action | Direction |
|--------|-----------|
| 0      | UP        |
| 1      | RIGHT     |
| 2      | DOWN      |
| 3      | LEFT      |
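As a quick illustration (assuming the standard Gymnasium API), a move is played by passing one of these integers to `env.step`:

```python
import gymnasium as gym

env = gym.make("gymnasium_2048:gymnasium_2048/TwentyFortyEight-v0")
obs, info = env.reset(seed=42)

# Slide the tiles up (action 0), then to the right (action 1).
obs, reward, terminated, truncated, info = env.step(0)
obs, reward, terminated, truncated, info = env.step(1)
env.close()
```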
### Observation Space
The observation is a 3D `ndarray` encoding the board state. The board is encoded into 16 channels, where each channel is a 4x4
binary image. The i-th channel marks the cells containing the i-th tile value with 1 and all other cells with 0.
In order, the channels represent the positions of empty cells, 2-tiles, 4-tiles, ..., and 32768-tiles.
![Observation](./figures/observation.png)
This representation is commonly used as input to deep convolutional neural networks (DCNNs).
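As a rough sketch of this encoding (not the environment's internal code; `encode_board` is a hypothetical helper), a board of tile exponents maps to the one-hot channels like so:

```python
import numpy as np

def encode_board(board: np.ndarray, max_pow: int = 16) -> np.ndarray:
    """One-hot encode a board of tile exponents (0 = empty, 1 = 2-tile, 2 = 4-tile, ...)
    into a (size, size, max_pow) binary array, as described above."""
    size = board.shape[0]
    obs = np.zeros((size, size, max_pow), dtype=np.uint8)
    for i in range(size):
        for j in range(size):
            obs[i, j, board[i, j]] = 1
    return obs

# Example: an otherwise empty board with a 2-tile at (0, 0) and a 4-tile at (3, 3).
board = np.zeros((4, 4), dtype=np.uint8)
board[0, 0] = 1  # exponent 1 -> tile value 2
board[3, 3] = 2  # exponent 2 -> tile value 4
print(encode_board(board).shape)  # (4, 4, 16)
```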
### Rewards
At each step, the player gains a reward equal to the value of each tile created by a merge.
The total reward, which corresponds to the game score, is the sum of rewards obtained throughout the game.
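For instance, merging two 4-tiles yields a reward of 8, and sliding the row `[2, 2, 4, 4]` left produces `[4, 8]` for a reward of 4 + 8 = 12. With the standard Gymnasium API, the undiscounted return of an episode is therefore the final game score; a minimal random-play sketch:

```python
import gymnasium as gym

env = gym.make("gymnasium_2048:gymnasium_2048/TwentyFortyEight-v0")
obs, info = env.reset(seed=42)

score = 0.0
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # random move: 0=UP, 1=RIGHT, 2=DOWN, 3=LEFT
    obs, reward, terminated, truncated, info = env.step(action)
    score += reward  # sum of merge rewards = game score

print(f"final score: {score}")
env.close()
```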
### Starting State
The game starts with two randomly generated tiles. Each new tile is a 2-tile with probability 0.9 and a 4-tile with
probability 0.1.
### Episode End
The episode ends when there are no legal moves left, i.e., all squares are occupied and no two adjacent tiles share
the same value.
### Arguments
- `size`: the size of the game board. The default value is 4.
- `max_pow`: the maximum power of 2 allowed. The default value is 16.
```python
import gymnasium as gym
gym.make("gymnasium_2048:gymnasium_2048/TwentyFortyEight-v0", size=4, max_pow=16)
```
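Assuming the encoding described above, the observation shape should follow these arguments as `(size, size, max_pow)`; a quick sanity check:

```python
import gymnasium as gym

env = gym.make("gymnasium_2048:gymnasium_2048/TwentyFortyEight-v0", size=4, max_pow=16)
print(env.observation_space.shape)  # expected: (4, 4, 16), matching the table above
```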
## Usage
To use the training and evaluation scripts, install the `training` dependencies:
```bash
pip install .[training]
```
### Play Manually
To play the game manually with your keyboard's arrow keys, execute:
```bash
python -m scripts.play
```
See the arguments with the help command:
```bash
python -m scripts.play -h
```
### Train an Agent
To train an agent using temporal difference learning of n-tuple networks, execute:
```bash
python -m scripts.train \
--algo tdl \
-n 100000 \
--eval-freq 5000 \
--eval-episode 1000 \
--save-freq 5000 \
--seed 42 \
-o models/tdl
```
See the arguments with the help command:
```bash
python -m scripts.train -h
```
### Plot Training Metrics
To plot training metrics from logs, execute:
```bash
python -m scripts.plot \
-i train.log \
-t "Temporal Difference Learning" \
-o figures/training_tdl.png
```
See the arguments with the help command:
```bash
python -m scripts.plot -h
```
Here are the training metrics of trained policies over episodes:
| TDL small | TDL |
|------------------------------------------------|------------------------------------|
| ![TDL small](./figures/training_tdl_small.png) | ![TDL](./figures/training_tdl.png) |
### Enjoy a Trained Agent
To see a trained agent in action, execute:
```bash
python -m scripts.enjoy \
--algo tdl \
-i models/tdl/best_n_tuple_network_policy.zip \
-n 1 \
--seed 42
```
See the arguments with the help command:
```bash
python -m scripts.enjoy -h
```
### Evaluate a Trained Agent
To evaluate the performance of a trained agent, execute:
```bash
python -m scripts.evaluate \
--algo tdl \
-i models/tdl/best_n_tuple_network_policy.zip \
-n 1000 \
--seed 42 \
-t "Temporal Difference Learning" \
-o figures/stats_tdl.png
```
See the arguments with the help command:
```bash
python -m scripts.evaluate -h
```
Here is the performance of the trained policies:
| TDL small | TDL |
|---------------------------------------------|---------------------------------|
| ![TDL small](./figures/stats_tdl_small.png) | ![TDL](./figures/stats_tdl.png) |
<details>
<summary>Random policy performances</summary>
![Random policy](./figures/stats_random_policy.png)
</details>
## Tests
To run tests, execute:
```bash
pytest
```
## Citing
To cite the repository in publications:
```bibtex
@misc{gymnasium-2048,
author = {Quentin Deschamps},
title = {Gymnasium 2048},
year = {2023},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/Quentin18/gymnasium-2048}},
}
```
## References
- [Szubert and Jaśkowski: Temporal Difference Learning of N-Tuple Networks
for the Game 2048](https://www.cs.put.poznan.pl/wjaskowski/pub/papers/Szubert2014_2048.pdf)
- [Guei and Wu: On Reinforcement Learning for the Game of 2048](https://arxiv.org/pdf/2212.11087.pdf)
## Author
[Quentin Deschamps](mailto:quentindeschamps18@gmail.com)