# <img src="media/HYDRA_icon_minimal.png" width="20"> HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning
<div align="center">
<img src="media/Frame.png">
<p></p>
</div>
<div align="center">
<a href="https://github.com/ControlNet/HYDRA/issues">
<img src="https://img.shields.io/github/issues/ControlNet/HYDRA?style=flat-square">
</a>
<a href="https://github.com/ControlNet/HYDRA/network/members">
<img src="https://img.shields.io/github/forks/ControlNet/HYDRA?style=flat-square">
</a>
<a href="https://github.com/ControlNet/HYDRA/stargazers">
<img src="https://img.shields.io/github/stars/ControlNet/HYDRA?style=flat-square">
</a>
<a href="https://github.com/ControlNet/HYDRA/blob/master/LICENSE">
<img src="https://img.shields.io/github/license/ControlNet/HYDRA?style=flat-square">
</a>
<a href="https://arxiv.org/abs/2403.12884">
<img src="https://img.shields.io/badge/arXiv-2403.12884-b31b1b.svg?style=flat-square">
</a>
</div>
**This is the code for the paper [HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning](https://arxiv.org/abs/2403.12884), accepted at ECCV 2024 \[[Project Page](https://hydra-vl4ai.github.io)\].**
## Release
- [2024/07/29] 🔥 **HYDRA** is open-sourced on GitHub.
## TODOs
`gpt-3.5-turbo-0613` has been deprecated by OpenAI, and `gpt-3.5` is being replaced by `gpt-4o-mini`, so we will release an updated version of HYDRA.
> As of July 2024, `gpt-4o-mini` should be used in place of `gpt-3.5-turbo`, as it is cheaper, more capable, multimodal, and just as fast ([OpenAI API page](https://platform.openai.com/docs/models/gpt-3-5-turbo)).
We also note that OpenAI has updated its embedding models, as described in this [announcement](https://openai.com/index/new-embedding-models-and-api-updates/). Given the uncertainty around these updates, we suggest training a new version of the RL controller yourself and updating the RL models accordingly.
- [x] GPT-4o-mini replacement.
- [x] LLaMA3.1 (Ollama) replacement.
- [ ] Gradio demo.
- [ ] GPT-4o version.
- [ ] HYDRA with RL.
## Installation
### Requirements
- Python >= 3.10
- conda
Please follow the instructions below to install the required packages and set up the environment.
### 1. Clone this repository.
```Bash
git clone https://github.com/ControlNet/HYDRA
```
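Then change into the repository directory so the remaining commands run from the project root:
```Bash
cd HYDRA
```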
### 2. Set up the conda environment and install dependencies.
```Bash
bash -i build_env.sh
```
If you encounter errors, consider going through the `build_env.sh` file and installing the packages manually.
### 3. Configure the environment
Configure the environment variables by editing the `.env` file or setting them in your shell.
```
OPENAI_API_KEY=your-api-key
OLLAMA_HOST=http://ollama.server:11434
# do not change this TORCH_HOME variable
TORCH_HOME=./pretrained_models
```
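Alternatively, the same variables can be set in the shell before launching any command; a minimal sketch (replace the API key and Ollama host with your own values):
```Bash
# Export the variables for the current shell session.
export OPENAI_API_KEY=your-api-key
export OLLAMA_HOST=http://ollama.server:11434
# Keep TORCH_HOME pointing at ./pretrained_models, as noted above.
export TORCH_HOME=./pretrained_models
```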
### 4. Download the pretrained models
Run the script to download the pretrained models to the `./pretrained_models` directory.
```Bash
python -m hydra_vl4ai.download_models --base_config <EXP-CONFIG-PATH> --model_config <MODEL-CONFIG-PATH>
```
For example,
```Bash
python -m hydra_vl4ai.download_models --base_config ./config/okvqa.yaml --model_config ./config/model_config_1gpu.yaml
```
## Inference
A worker process must be running before inference can be performed. Start one with:
```Bash
python -m hydra_vl4ai.executor --base_config <EXP-CONFIG-PATH> --model_config <MODEL-CONFIG-PATH>
```
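For example, reusing the OK-VQA configs from the download step above:
```Bash
python -m hydra_vl4ai.executor --base_config ./config/okvqa.yaml --model_config ./config/model_config_1gpu.yaml
```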
### Inference with a single image and prompt
```Bash
python demo_cli.py \
    --image <IMAGE_PATH> \
    --prompt <PROMPT> \
    --base_config <EXP-CONFIG-PATH> \
    --model_config <MODEL-CONFIG-PATH>
```
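For example, a hypothetical run on a local image (the image path and prompt below are illustrative, not files shipped with the repository):
```Bash
python demo_cli.py \
    --image ./street.jpg \
    --prompt "How many cars are in the image?" \
    --base_config ./config/okvqa.yaml \
    --model_config ./config/model_config_1gpu.yaml
```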
### Inference with Gradio GUI
TODO.
### Inference on a dataset
```Bash
python main.py \
    --data_root <YOUR-DATA-ROOT> \
    --base_config <EXP-CONFIG-PATH> \
    --model_config <MODEL-CONFIG-PATH>
```
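For example, assuming the OK-VQA dataset has been prepared under a local `./data` directory (the path is illustrative):
```Bash
python main.py \
    --data_root ./data \
    --base_config ./config/okvqa.yaml \
    --model_config ./config/model_config_1gpu.yaml
```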
The inference results are saved to the `./result` directory for evaluation.
## Evaluation
```Bash
python evaluate.py <RESULT_JSON_PATH> <DATASET_NAME>
```
For example,
```Bash
python evaluate.py result/result_okvqa.jsonl okvqa
```
## Citation
```bibtex
@inproceedings{ke2024hydra,
title={HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning},
author={Fucai Ke and Zhixi Cai and Simindokht Jahangard and Weiqing Wang and Pari Delir Haghighi and Hamid Rezatofighi},
booktitle={European Conference on Computer Vision},
year={2024},
organization={Springer}
}
```
## Acknowledgements
Some code and prompts are based on [cvlab-columbia/viper](https://github.com/cvlab-columbia/viper).