# <img src="media/HYDRA_icon_minimal.png" width="20"> HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning
<div align="center">
<img src="media/Frame.png">
<p></p>
</div>
<div align="center">
<a href="https://github.com/ControlNet/HYDRA/issues">
<img src="https://img.shields.io/github/issues/ControlNet/HYDRA?style=flat-square">
</a>
<a href="https://github.com/ControlNet/HYDRA/network/members">
<img src="https://img.shields.io/github/forks/ControlNet/HYDRA?style=flat-square">
</a>
<a href="https://github.com/ControlNet/HYDRA/stargazers">
<img src="https://img.shields.io/github/stars/ControlNet/HYDRA?style=flat-square">
</a>
<a href="https://github.com/ControlNet/HYDRA/blob/master/LICENSE">
<img src="https://img.shields.io/github/license/ControlNet/HYDRA?style=flat-square">
</a>
<a href="https://arxiv.org/abs/2403.12884">
<img src="https://img.shields.io/badge/arXiv-2403.12884-b31b1b.svg?style=flat-square">
</a>
</div>
<div align="center">
<a href="https://pypi.org/project/hydra-vl4ai/">
<img src="https://img.shields.io/pypi/v/hydra-vl4ai?style=flat-square">
</a>
<a href="https://pypi.org/project/hydra-vl4ai/">
<img src="https://img.shields.io/pypi/dm/hydra-vl4ai?style=flat-square">
</a>
<a href="https://www.python.org/"><img src="https://img.shields.io/pypi/pyversions/hydra-vl4ai?style=flat-square"></a>
</div>
**This is the code for the paper [HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning](https://link.springer.com/chapter/10.1007/978-3-031-72661-3_8), accepted at ECCV 2024 \[[Project Page](https://hydra-vl4ai.github.io)\]. We have released the code that <span style="color: #FF6347; font-weight: bold;">uses Reinforcement Learning (DQN) to fine-tune the LLM</span>🔥🔥🔥**
## Release
- [2025/02/11] 🤖 HYDRA with RL is released.
- [2024/08/05] 🚀 [PyPI package](https://pypi.org/project/hydra-vl4ai/) is released.
- [2024/07/29] 🔥 **HYDRA** is open-sourced on GitHub.
## TODOs
We are aware that `gpt-3.5-turbo-0613` has been deprecated and that `gpt-3.5` is being replaced by `gpt-4o-mini`; we will release another version of HYDRA accordingly.
>As of July 2024, `gpt-4o-mini` should be used in place of `gpt-3.5-turbo`, as it is cheaper, more capable, multimodal, and just as fast ([OpenAI API page](https://platform.openai.com/docs/models/gpt-3-5-turbo)).

We also note that OpenAI has updated its embedding models, as described in this [announcement](https://openai.com/index/new-embedding-models-and-api-updates/). Because of the uncertainty introduced by these updates, we suggest training a new version of the RL controller yourself and updating the RL models.
- [x] GPT-4o-mini replacement.
- [x] LLaMA3.1 (ollama) replacement.
- [x] Gradio demo.
- [x] GPT-4o version.
- [x] HYDRA with RL (DQN).
- [ ] HYDRA with DeepSeek R1.
https://github.com/user-attachments/assets/39a897ab-d457-49d2-8527-0d6fe3a3b922
## Installation
### Requirements
- Python >= 3.10
- conda
Please follow the instructions below to install the required packages and set up the environment.
### 1. Clone this repository.
```Bash
git clone https://github.com/ControlNet/HYDRA
```
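The remaining steps assume commands are run from the repository root, so change into the cloned directory first:
```Bash
cd HYDRA
```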
### 2. Set up the conda environment and install dependencies.
Option 1: Using [pixi](https://prefix.dev/) (recommended):
```Bash
pixi install
pixi shell
```
Option 2: Building from source:
```Bash
bash -i build_env.sh
```
If you encounter errors, consider going through the `build_env.sh` file and installing the packages manually.
### 3. Configure the environments
Edit the `.env` file, or set the variables in your shell, to configure the environment.
```
OPENAI_API_KEY=your-api-key # if you want to use OpenAI LLMs
OLLAMA_HOST=http://ollama.server:11434 # if you want to use your Ollama server for Llama or DeepSeek
# do not change this TORCH_HOME variable
TORCH_HOME=./pretrained_models
```
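Equivalently, you can export the same variables in your shell before launching any of the commands below (a minimal sketch; the values shown are placeholders to replace with your own):
```Bash
export OPENAI_API_KEY=your-api-key             # only needed for OpenAI LLMs
export OLLAMA_HOST=http://ollama.server:11434  # only needed when using an Ollama server
export TORCH_HOME=./pretrained_models          # keep this value unchanged
```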
### 4. Download the pretrained models
Run the script below to download the pretrained models into the `./pretrained_models` directory.
```Bash
python -m hydra_vl4ai.download_model --base_config <BASE-CONFIG-PATH> --model_config <MODEL-CONFIG-PATH>
```
For example,
```Bash
python -m hydra_vl4ai.download_model --base_config ./config/okvqa.yaml --model_config ./config/model_config_1gpu.yaml
```
## Inference
A worker is required to run inference. Start it with:
```Bash
python -m hydra_vl4ai.executor --base_config <BASE-CONFIG-PATH> --model_config <MODEL-CONFIG-PATH>
```
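For example, reusing the OK-VQA configuration files from the model download step:
```Bash
python -m hydra_vl4ai.executor --base_config ./config/okvqa.yaml --model_config ./config/model_config_1gpu.yaml
```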
### Inference with a single image and prompt
```Bash
python demo_cli.py \
--image <IMAGE_PATH> \
--prompt <PROMPT> \
  --base_config <BASE-CONFIG-PATH> \
  --model_config <MODEL-CONFIG-PATH>
```
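For example, with the OK-VQA configuration (the image path and prompt below are placeholders; keep the executor from the previous step running in a separate terminal):
```Bash
python demo_cli.py \
  --image ./example.jpg \
  --prompt "What is in the image?" \
  --base_config ./config/okvqa.yaml \
  --model_config ./config/model_config_1gpu.yaml
```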
### Inference with Gradio GUI
```Bash
python demo_gradio.py \
  --base_config <BASE-CONFIG-PATH> \
  --model_config <MODEL-CONFIG-PATH>
```
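For example, reusing the OK-VQA configuration from above; Gradio prints a local URL to open in your browser:
```Bash
python demo_gradio.py \
  --base_config ./config/okvqa.yaml \
  --model_config ./config/model_config_1gpu.yaml
```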
---
### Inference on a dataset
```Bash
python main.py \
--data_root <YOUR-DATA-ROOT> \
  --base_config <BASE-CONFIG-PATH> \
  --model_config <MODEL-CONFIG-PATH>
```
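For example, assuming a local COCO 2014 image root as used in the training example below (adjust the path to your own data root):
```Bash
python main.py \
  --data_root ../coco2014 \
  --base_config ./config/okvqa.yaml \
  --model_config ./config/model_config_1gpu.yaml
```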
The inference results are then saved in the `./result` directory for evaluation.
## Evaluation
```Bash
python evaluate.py <RESULT_JSON_PATH> <DATASET_NAME>
```
For example,
```Bash
python evaluate.py result/result_okvqa.jsonl okvqa
```
## Training the Controller with RL (DQN)
```Bash
python train.py \
  --data_root <YOUR-DATA-ROOT> \
  --base_config <BASE-CONFIG-PATH> \
  --model_config <MODEL-CONFIG-PATH> \
  --dqn_config <DQN-CONFIG-PATH>
```
For example,
```Bash
python train.py \
--data_root ../coco2014 \
  --base_config ./config/okvqa.yaml \
--model_config ./config/model_config_1gpu.yaml \
--dqn_config ./config/dqn_debug.yaml
```
## Citation
```bibtex
@inproceedings{ke2024hydra,
title={HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning},
author={Ke, Fucai and Cai, Zhixi and Jahangard, Simindokht and Wang, Weiqing and Haghighi, Pari Delir and Rezatofighi, Hamid},
booktitle={European Conference on Computer Vision},
year={2024},
organization={Springer},
doi={10.1007/978-3-031-72661-3_8},
isbn={978-3-031-72661-3},
pages={132--149},
}
```
## Acknowledgements
Some code and prompts are based on [cvlab-columbia/viper](https://github.com/cvlab-columbia/viper).