# POPGym: Partially Observable Process Gym
![tests](https://github.com/smorad/popgym/actions/workflows/python-app.yml/badge.svg)
[![codecov](https://codecov.io/gh/smorad/popgym/branch/master/graph/badge.svg?token=I47IDFZXSV)](https://codecov.io/gh/smorad/popgym)
POPGym is designed to benchmark memory in deep reinforcement learning. It contains a set of [environments](#popgym-environments) and a collection of [memory model baselines](#popgym-baselines). The full paper is available on [OpenReview](https://openreview.net/forum?id=chDrutUTs0K).
Please see the [documentation](https://popgym.readthedocs.io/en/latest/) for advanced installation instructions and examples. The [environment quickstart](https://popgym.readthedocs.io/en/latest/environment_quickstart.html) will get you up and running in a few minutes.
## Quickstart Install
```bash
# Install base environments, only requires numpy and gymnasium
pip install popgym
# Also include navigation environments, which require mazelib
# NOTE: navigation envs require python <3.12 due to mazelib not supporting 3.12
pip install "popgym[navigation]"
# Install memory baselines w/ RLlib
pip install "popgym[baselines]"
```
## Quickstart Usage
```python
import popgym
from popgym.wrappers import PreviousAction, Antialias, Flatten, DiscreteAction
env = popgym.envs.position_only_cartpole.PositionOnlyCartPoleEasy()
print(env.reset(seed=0))
# Append the previous action to the observation, flatten the observation and
# action spaces, then map the multidiscrete action space to a single discrete
# action (e.g. for Q-learning)
wrapped = DiscreteAction(Flatten(PreviousAction(env)))
print(wrapped.reset(seed=0))
```
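The last wrapper in the chain above maps a multidiscrete action space to a single discrete one. The sketch below illustrates that idea in plain Python using mixed-radix encoding. The helper names `flatten_action` and `unflatten_action` are hypothetical, and this is not necessarily how POPGym's `DiscreteAction` wrapper is implemented.

```python
from math import prod


def flatten_action(action, nvec):
    """Encode a multidiscrete action (one index per dimension) as one integer.

    `nvec` holds the number of choices in each dimension.
    """
    index = 0
    for a, n in zip(action, nvec):
        if not 0 <= a < n:
            raise ValueError(f"action component {a} out of range [0, {n})")
        index = index * n + a
    return index


def unflatten_action(index, nvec):
    """Decode a single integer back into a multidiscrete action."""
    if not 0 <= index < prod(nvec):
        raise ValueError(f"index {index} out of range [0, {prod(nvec)})")
    action = []
    for n in reversed(nvec):
        index, a = divmod(index, n)
        action.append(a)
    return tuple(reversed(action))


nvec = (3, 4)  # two action dimensions with 3 and 4 choices
print(prod(nvec))                    # size of the equivalent discrete space: 12
print(flatten_action((2, 1), nvec))  # 9
print(unflatten_action(9, nvec))     # (2, 1)
```

A single discrete index is convenient for value-based methods such as Q-learning, which typically output one value per action.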
## POPGym Environments
POPGym contains Partially Observable Markov Decision Process (POMDP) environments following the [Gymnasium](https://github.com/Farama-Foundation/Gymnasium) interface. POPGym environments have minimal dependencies and are fast enough to solve on a laptop CPU in less than a day. We provide the following environments:
| Environment | Tags | Temporal Ordering | Colab FPS | Macbook Air (2020) FPS |
|---------------------------------------------------------------------------------------------------------|-------------------|-------------------|-------------------|---------------------------|
| [Battleship](https://popgym.readthedocs.io/en/latest/autoapi/popgym/envs/battleship/index.html) |Game |None | 117,158 | 235,402 |
| [Concentration](https://popgym.readthedocs.io/en/latest/autoapi/popgym/envs/concentration/index.html) |Game |Weak | 47,515 | 157,217 |
| [Higher Lower](https://popgym.readthedocs.io/en/latest/autoapi/popgym/envs/higher_lower/index.html) |Game, Noisy |None | 24,312 | 76,903 |
| [Labyrinth Escape](https://popgym.readthedocs.io/en/latest/autoapi/popgym/envs/labyrinth_escape/index.html) |Navigation |Strong | 1,399 | 41,122 |
| [Labyrinth Explore](https://popgym.readthedocs.io/en/latest/autoapi/popgym/envs/labyrinth_explore/index.html) |Navigation |Strong | 1,374 | 30,611 |
| [Minesweeper](https://popgym.readthedocs.io/en/latest/autoapi/popgym/envs/minesweeper/index.html) |Game |None | 8,434 | 32,003 |
| [Multiarmed Bandit](https://popgym.readthedocs.io/en/latest/autoapi/popgym/envs/multiarmed_bandit/index.html) |Noisy |None | 48,751 | 469,325 |
| [Autoencode](https://popgym.readthedocs.io/en/latest/autoapi/popgym/envs/autoencode/index.html) |Diagnostic |Strong | 121,756 | 251,997 |
| [Count Recall](https://popgym.readthedocs.io/en/latest/autoapi/popgym/envs/count_recall/index.html) |Diagnostic, Noisy |None | 16,799 | 50,311 |
| [Repeat First](https://popgym.readthedocs.io/en/latest/autoapi/popgym/envs/repeat_first/index.html) |Diagnostic |None | 23,895 | 155,201 |
| [Repeat Previous](https://popgym.readthedocs.io/en/latest/autoapi/popgym/envs/repeat_previous/index.html) |Diagnostic |Strong | 50,349 | 136,392 |
| [Position Only Cartpole](https://popgym.readthedocs.io/en/latest/autoapi/popgym/envs/position_only_cartpole/index.html) |Control |Strong | 73,622 | 218,446 |
| [Velocity Only Cartpole](https://popgym.readthedocs.io/en/latest/autoapi/popgym/envs/velocity_only_cartpole/index.html) |Control |Strong | 69,476 | 214,352 |
| [Noisy Position Only Cartpole](https://popgym.readthedocs.io/en/latest/autoapi/popgym/envs/noisy_position_only_cartpole/index.html) |Control, Noisy |Strong | 6,269 | 66,891 |
| [Position Only Pendulum](https://popgym.readthedocs.io/en/latest/autoapi/popgym/envs/position_only_pendulum/index.html) |Control |Strong | 8,168 | 26,358 |
| [Noisy Position Only Pendulum](https://popgym.readthedocs.io/en/latest/autoapi/popgym/envs/noisy_position_only_pendulum/index.html) |Control, Noisy |Strong | 6,808 | 20,090 |
Feel free to rerun this benchmark using [this colab notebook](https://colab.research.google.com/drive/1_ew-Piq5d9R_NkmP1lSzFX1fbK-swuAN?usp=sharing).
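For a rough local measurement, frames per second can be estimated by timing a fixed number of steps. The following is a minimal sketch, not the benchmark script behind the table above. `ToyEnv` and `measure_fps` are hypothetical names, and `ToyEnv` only stands in for an environment with the Gymnasium `reset`/`step` signature.

```python
import time


class ToyEnv:
    """Hypothetical stand-in that mimics the Gymnasium reset/step signature."""

    def __init__(self, episode_length=10):
        self.episode_length = episode_length
        self.t = 0

    def reset(self, seed=None):
        self.t = 0
        return 0, {}  # (observation, info)

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.episode_length
        # (observation, reward, terminated, truncated, info)
        return self.t, 0.0, terminated, False, {}


def measure_fps(env, policy, num_steps=10_000):
    """Return environment steps per second under the given policy."""
    obs, info = env.reset()
    start = time.perf_counter()
    for _ in range(num_steps):
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        if terminated or truncated:
            obs, info = env.reset()
    elapsed = time.perf_counter() - start
    return num_steps / elapsed


fps = measure_fps(ToyEnv(), policy=lambda obs: 0)
print(f"{fps:,.0f} steps per second")
```

To time a real environment, pass a POPGym environment instance in place of `ToyEnv()` together with a random policy such as `lambda obs: env.action_space.sample()`.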
## POPGym Baselines
POPGym Baselines implements recurrent and memory models efficiently. It is built on top of [`rllib`](https://github.com/ray-project/ray) using its custom model API. We provide the following baselines:
1. [MLP](popgym/baselines/ray_models/ray_mlp.py)
2. [Positional MLP](popgym/baselines/ray_models/ray_mlp.py)
3. [Framestacking](popgym/baselines/ray_models/ray_framestack.py) [(Paper)](https://arxiv.org/abs/1312.5602)
4. [Temporal Convolution Networks](popgym/baselines/ray_models/ray_frameconv.py) [(Paper)](https://arxiv.org/pdf/1803.01271.pdf)
5. [Elman Networks](https://github.com/smorad/popgym/blob/master/popgym/baselines/ray_models/ray_elman.py) [(Paper)](http://faculty.otterbein.edu/dstucki/COMP4230/FindingStructureInTime.pdf)
6. [Long Short-Term Memory](popgym/baselines/ray_models/ray_lstm.py) [(Paper)](http://www.bioinf.jku.at/publications/older/2604.pdf)
7. [Gated Recurrent Units](popgym/baselines/ray_models/ray_gru.py) [(Paper)](https://arxiv.org/abs/1412.3555)
8. [Independently Recurrent Neural Networks](popgym/baselines/ray_models/ray_indrnn.py) [(Paper)](https://openaccess.thecvf.com/content_cvpr_2018/papers_backup/Li_Independently_Recurrent_Neural_CVPR_2018_paper.pdf)
9. [Fast Autoregressive Transformers](popgym/baselines/ray_models/ray_linear_attention.py) [(Paper)](https://proceedings.mlr.press/v119/katharopoulos20a.html)
10. [Fast Weight Programmers](popgym/baselines/ray_models/ray_fwp.py) [(Paper)](https://proceedings.mlr.press/v139/schlag21a.html)
11. [Legendre Memory Units](popgym/baselines/ray_models/ray_lmu.py) [(Paper)](https://proceedings.neurips.cc/paper/2019/hash/952285b9b7e7a1be5aa7849f32ffff05-Abstract.html)
12. [Diagonal State Space Models](popgym/baselines/ray_models/ray_s4d.py) [(Paper)](https://arxiv.org/abs/2206.11893)
13. [Differentiable Neural Computers](popgym/baselines/ray_models/ray_diffnc.py) [(Paper)](http://clgiles.ist.psu.edu/IST597/materials/slides/papers-memory/2016-graves.pdf)
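
Framestacking is perhaps the simplest form of memory in the list above: the policy receives the most recent few observations instead of only the latest one. The sketch below shows the idea in plain Python. The `FrameStack` class and its `pad` argument are hypothetical and are not taken from `ray_framestack.py`.

```python
from collections import deque


class FrameStack:
    """Keep the most recent `num_frames` observations as a fixed-size window."""

    def __init__(self, num_frames, pad):
        self.num_frames = num_frames
        self.pad = pad
        self.frames = deque([pad] * num_frames, maxlen=num_frames)

    def reset(self):
        """Clear the window at the start of an episode."""
        self.frames = deque([self.pad] * self.num_frames, maxlen=self.num_frames)

    def push(self, obs):
        """Add the newest observation and return the stacked window."""
        self.frames.append(obs)  # the oldest frame is dropped automatically
        return tuple(self.frames)


stack = FrameStack(num_frames=3, pad=0.0)
print(stack.push(1.0))  # (0.0, 0.0, 1.0)
print(stack.push(2.0))  # (0.0, 1.0, 2.0)
print(stack.push(3.0))  # (1.0, 2.0, 3.0)
print(stack.push(4.0))  # (2.0, 3.0, 4.0)
```

The window length bounds how far back such a model can remember, which is one reason recurrent and other memory models are benchmarked alongside it.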
## Leaderboard
The leaderboard is available at [paperswithcode](https://paperswithcode.com/dataset/popgym).
## Contributing
Follow the code style and ensure the tests pass:
```bash
pip install pre-commit
pre-commit install
pytest popgym/tests
```
## Citing
```
@inproceedings{
morad2023popgym,
title={{POPG}ym: Benchmarking Partially Observable Reinforcement Learning},
author={Steven Morad and Ryan Kortvelesy and Matteo Bettini and Stephan Liwicki and Amanda Prorok},
booktitle={The Eleventh International Conference on Learning Representations},
year={2023},
url={https://openreview.net/forum?id=chDrutUTs0K}
}
```
Raw data
{
"_id": null,
"home_page": null,
"name": "popgym",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "gym, gymnasium, pomdp, partially observable, reinforcement learning, rl",
"author": "Steven Morad",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/d7/00/5622284465ca906e457dcdf5aaec4b7d814d44bf339e4e0c90a700afe14d/popgym-1.0.6.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "A collection of partially-observable procedural gym environments",
"version": "1.0.6",
"project_urls": null,
"split_keywords": [
"gym",
" gymnasium",
" pomdp",
" partially observable",
" reinforcement learning",
" rl"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "47b041fce7d8938e2a6514c55bb016390fddcf15466743675cb4f42f38031f6a",
"md5": "a7c198cd2d0d9058b6dac9cafdd461f1",
"sha256": "cb9f98f7424aefb78e88249f8ebb429fbc8d37de3d9810a5fa9d6b1ec9245c35"
},
"downloads": -1,
"filename": "popgym-1.0.6-py3-none-any.whl",
"has_sig": false,
"md5_digest": "a7c198cd2d0d9058b6dac9cafdd461f1",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 104295,
"upload_time": "2024-04-03T09:10:51",
"upload_time_iso_8601": "2024-04-03T09:10:51.198847Z",
"url": "https://files.pythonhosted.org/packages/47/b0/41fce7d8938e2a6514c55bb016390fddcf15466743675cb4f42f38031f6a/popgym-1.0.6-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "d7005622284465ca906e457dcdf5aaec4b7d814d44bf339e4e0c90a700afe14d",
"md5": "3c28c763efcbb506768514f5948402f8",
"sha256": "87aeb14c3294780160c2b62572ebcc76b8f33b3a82c72fc4ce2aec0801f7ce2c"
},
"downloads": -1,
"filename": "popgym-1.0.6.tar.gz",
"has_sig": false,
"md5_digest": "3c28c763efcbb506768514f5948402f8",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 72287,
"upload_time": "2024-04-03T09:10:53",
"upload_time_iso_8601": "2024-04-03T09:10:53.087784Z",
"url": "https://files.pythonhosted.org/packages/d7/00/5622284465ca906e457dcdf5aaec4b7d814d44bf339e4e0c90a700afe14d/popgym-1.0.6.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-04-03 09:10:53",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "popgym"
}