HASARD

Name: HASARD
Version: 0.2.0
Home page: https://github.com/TTomilin/HASARD
Summary: Egocentric 3D Safe Reinforcement Learning Benchmark
Upload time: 2025-08-26 21:39:27
Author: Tristan Tomilin
Requires Python: >=3.8
License: MIT
Keywords: safe rl, reinforcement learning, vizdoom, benchmark, safety
# HASARD: A Benchmark for Harnessing Safe Reinforcement Learning with Doom

**HASARD** (**Ha**rnessing **Sa**fe **R**einforcement Learning with **D**oom) is a benchmark for Safe Reinforcement 
Learning in complex, egocentric-perception 3D environments derived from the classic DOOM video game. It features six 
diverse scenarios, each spanning three levels of difficulty.

## 🔗 Useful Links
- 🌐 [Project Page](https://sites.google.com/view/hasard-bench/)
- 🎥 [Short Presentation](https://www.youtube.com/watch?v=A-uKxVVKfvo)
- 🎮 [Demo Video](https://www.youtube.com/watch?v=A-uKxVVKfvo)




| Scenario                | Level 1                                                                                                  | Level 2                                                                                              | Level 3                                                                                              |
|-------------------------|----------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------|
| **Armament Burden**     | <img src="assets/images/armament_burden/level_1.png" alt="Level 1" style="width:400px; height:auto;"/>   | <img src="assets/images/armament_burden/level_2.png" alt="Level 2" style="width:400px; height:auto;"/>    | <img src="assets/images/armament_burden/level_3.png" alt="Level 3" style="width:400px; height:auto;"/>    |
| **Detonator’s Dilemma** | <img src="assets/images/detonators_dilemma/level_1.png" alt="Level 1" style="width:400px; height:auto;"/> | <img src="assets/images/detonators_dilemma/level_2.png" alt="Level 2" style="width:400px; height:auto;"/> | <img src="assets/images/detonators_dilemma/level_3.png" alt="Level 3" style="width:400px; height:auto;"/> |
| **Volcanic Venture**    | <img src="assets/images/volcanic_venture/level_1.png" alt="Level 1" style="width:400px; height:auto;"/>  | <img src="assets/images/volcanic_venture/level_2.png" alt="Level 2" style="width:400px; height:auto;"/>   | <img src="assets/images/volcanic_venture/level_3.png" alt="Level 3" style="width:400px; height:auto;"/>   |
| **Precipice Plunge**    | <img src="assets/images/precipice_plunge/level_1.png" alt="Level 1" style="width:400px; height:auto;"/>  | <img src="assets/images/precipice_plunge/level_2.png" alt="Level 2" style="width:400px; height:auto;"/>   | <img src="assets/images/precipice_plunge/level_3.png" alt="Level 3" style="width:400px; height:auto;"/>   |
| **Collateral Damage**   | <img src="assets/images/collateral_damage/level_1.png" alt="Level 1" style="width:400px; height:auto;"/> | <img src="assets/images/collateral_damage/level_2.png" alt="Level 2" style="width:400px; height:auto;"/>  | <img src="assets/images/collateral_damage/level_3.png" alt="Level 3" style="width:400px; height:auto;"/>  |
| **Remedy Rush**         | <img src="assets/images/remedy_rush/level_1.png" alt="Level 1" style="width:400px; height:auto;"/>       | <img src="assets/images/remedy_rush/level_2.png" alt="Level 2" style="width:400px; height:auto;"/>        | <img src="assets/images/remedy_rush/level_3.png" alt="Level 3" style="width:400px; height:auto;"/>        |


### Key Features
- **Egocentric Perception**: Agents learn solely from first-person pixel observations under partial observability.
- **Beyond Simple Navigation**: Whereas prior benchmarks merely require the agent to reach goal locations on flat surfaces while avoiding obstacles, HASARD necessitates comprehending complex environment dynamics, anticipating the movement of entities, and grasping spatial relationships. 
- **Dynamic Environments**: HASARD features random spawns, unpredictably moving units, and terrain that is constantly moving or periodically changing.
- **Difficulty Levels**: Higher levels go beyond parameter adjustments, introducing entirely new elements and mechanics.
- **Reward-Cost Trade-offs**: Rewards and costs are closely intertwined: tightening the cost budget necessitates sacrificing reward.
- **Safety Constraints**: Each scenario features a hard constraint setting, where any error results in immediate in-game penalties.
- **Focus on Safety**: Achieving high rewards is straightforward, but doing so while staying within the safety budget demands learning complex and nuanced behaviors. 
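To make the reward-cost trade-off concrete, here is a small illustrative sketch (plain Python, not part of the HASARD API) of how a safety budget separates safe from unsafe episodes: per-step costs are summed and compared against the budget, so a tighter budget rules out higher-reward but costlier behavior.

```python
# Illustrative only: how a safety budget couples reward and cost.
def episode_summary(rewards, costs, budget):
    """Aggregate per-step rewards/costs and check the safety budget."""
    total_reward = sum(rewards)
    total_cost = sum(costs)
    return {
        "reward": total_reward,
        "cost": total_cost,
        "within_budget": total_cost <= budget,
        "overshoot": max(0.0, total_cost - budget),
    }

# A greedy policy collects more reward but overshoots a tight budget,
# while a cautious one sacrifices reward to stay within it.
greedy = episode_summary(rewards=[5, 5, 5], costs=[10, 20, 30], budget=50)
cautious = episode_summary(rewards=[3, 3, 3], costs=[5, 10, 15], budget=50)
```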


### Policy Visualization
HASARD can overlay a heatmap of the agent's most frequently visited locations, providing further insight into its policy and behavior within the environment.
These examples show how an agent navigates Volcanic Venture, Remedy Rush, and Armament Burden during the course of training:
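Such a heatmap can be built generically from logged agent positions. The sketch below uses NumPy with synthetic (x, y) visits as a stand-in for a real trajectory; it illustrates the principle rather than HASARD's built-in overlay.

```python
import numpy as np

rng = np.random.default_rng(0)
positions = rng.uniform(0, 100, size=(5000, 2))  # synthetic (x, y) visits

# Bin the visits into a 2D grid; denser cells mean more frequently
# visited locations, which is what the overlay renders.
heatmap, xedges, yedges = np.histogram2d(
    positions[:, 0], positions[:, 1],
    bins=20, range=[[0, 100], [0, 100]],
)
heatmap /= heatmap.sum()  # normalize into a visitation frequency map
```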


### Augmented Observations
HASARD supports augmented observation modes for further visual analysis. By utilizing privileged game state 
information, it can generate simplified observation representations, such as segmenting the objects in the scene or 
rendering only the depth of the surroundings.
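As a rough illustration of the depth-only view, the sketch below normalizes a synthetic depth buffer into a single-channel image. The array is a stand-in, not actual engine output, and the processing is generic NumPy rather than HASARD's renderer.

```python
import numpy as np

# Synthetic stand-in for a depth buffer (not actual engine output).
depth = np.random.default_rng(1).uniform(0.5, 50.0, size=(120, 160))

# Normalize to [0, 255] so the depth view can be rendered as an image
# or fed to a CNN as a single-channel observation.
normalized = (depth - depth.min()) / (depth.max() - depth.min())
depth_image = (normalized * 255).astype(np.uint8)
```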


## Installation
HASARD supports modular installation, so you can install only the dependencies you need:

```bash
# Core dependencies only (environments and basic functionality)
pip install HASARD

# With sample-factory support for training RL agents
pip install HASARD[sample-factory]

# With results analysis and plotting tools
pip install HASARD[results]

# Full installation with all optional dependencies
pip install HASARD[sample-factory,results]
```

To install from source:
```bash
git clone https://github.com/TTomilin/HASARD
cd HASARD
pip install .  # or pip install .[sample-factory,results] for extras
```

## Getting Started
To get started with HASARD, here's a minimal example of running a task environment.
This script can also be found in [`run_env.py`](hasard/examples/run_env.py):

```python
import hasard

env = hasard.make('RemedyRushLevel1-v0')
env.reset()
terminated = truncated = False
steps = total_cost = total_reward = 0
while not (terminated or truncated):
    action = env.action_space.sample()
    state, reward, terminated, truncated, info = env.step(action)
    env.render()
    steps += 1
    total_cost += info['cost']
    total_reward += reward
print(f"Episode finished in {steps} steps. Reward: {total_reward:.2f}. Cost: {total_cost:.2f}")
env.close()
```
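When evaluating a policy, per-episode totals from loops like the one above are typically aggregated across many rollouts. The helper below is illustrative and not part of the `hasard` package; the budget value is a placeholder.

```python
from statistics import mean

def aggregate(episodes, budget):
    """episodes: list of (total_reward, total_cost) tuples, one per rollout."""
    rewards = [r for r, _ in episodes]
    costs = [c for _, c in episodes]
    return {
        "mean_reward": mean(rewards),
        "mean_cost": mean(costs),
        # Fraction of episodes that exceeded the safety budget.
        "violation_rate": sum(c > budget for c in costs) / len(costs),
    }

stats = aggregate([(12.0, 4.0), (8.0, 6.0), (10.0, 5.0)], budget=5.0)
```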

## Training
For highly parallelized training of Safe RL agents on HASARD environments, and to reproduce the results from the paper, 
refer to [`sample_factory`](sample_factory/) for detailed usage instructions and examples.

# Acknowledgements
HASARD environments are built on top of the [ViZDoom](https://github.com/mwydmuch/ViZDoom) platform.  
Our Safe RL baseline methods are implemented in [Sample-Factory](https://github.com/alex-petrenko/sample-factory).  
Our experiments were managed using [WandB](https://wandb.ai).

# Citation
If you use our work in your research, please cite it as follows:
```
@inproceedings{tomilin2025hasard,
    title={HASARD: A Benchmark for Vision-Based Safe Reinforcement Learning in Embodied Agents},
    author={Tomilin, T. and Fang, M. and Pechenizkiy, M.},
    booktitle={The Thirteenth International Conference on Learning Representations},
    year={2025}
}
```

            
