hydra-vl4ai


- Name: hydra-vl4ai
- Version: 0.0.6
- Home page: https://hydra-vl4ai.github.io/
- Summary: Official implementation for HYDRA.
- Upload time: 2025-09-06 13:52:04
- Maintainer: None
- Docs URL: None
- Author: ControlNet
- Requires Python: >=3.10
- License: None
- Keywords: deep learning, pytorch, ai
- Requirements: torch, fastapi, uvicorn, starlette, openai, websockets, tensorneko_util, requests, pillow, word2number, ollama, bitsandbytes, numpy, gradio, pydantic, python-dotenv, transformers, opencv-python, rich

# <img src="media/HYDRA_icon_minimal.png" width="20"> HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning

<div align="center">
    <img src="media/Frame.png">
    <p></p>
</div>


<div align="center">
    <a href="https://github.com/ControlNet/HYDRA/issues">
        <img src="https://img.shields.io/github/issues/ControlNet/HYDRA?style=flat-square">
    </a>
    <a href="https://github.com/ControlNet/HYDRA/network/members">
        <img src="https://img.shields.io/github/forks/ControlNet/HYDRA?style=flat-square">
    </a>
    <a href="https://github.com/ControlNet/HYDRA/stargazers">
        <img src="https://img.shields.io/github/stars/ControlNet/HYDRA?style=flat-square">
    </a>
    <a href="https://github.com/ControlNet/HYDRA/blob/master/LICENSE">
        <img src="https://img.shields.io/github/license/ControlNet/HYDRA?style=flat-square">
    </a>
    <a href="https://arxiv.org/abs/2403.12884">
        <img src="https://img.shields.io/badge/arXiv-2403.12884-b31b1b.svg?style=flat-square">
    </a>
</div>

<div align="center">    
    <a href="https://pypi.org/project/hydra-vl4ai/">
        <img src="https://img.shields.io/pypi/v/hydra-vl4ai?style=flat-square">
    </a>
    <a href="https://pypi.org/project/hydra-vl4ai/">
        <img src="https://img.shields.io/pypi/dm/hydra-vl4ai?style=flat-square">
    </a>
    <a href="https://www.python.org/"><img src="https://img.shields.io/pypi/pyversions/hydra-vl4ai?style=flat-square"></a>
</div>

**This is the code for the paper [HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning](https://link.springer.com/chapter/10.1007/978-3-031-72661-3_8), accepted at ECCV 2024 \[[Project Page](https://hydra-vl4ai.github.io)\]. We have released the code that <span style="color: #FF6347; font-weight: bold;">uses Reinforcement Learning (DQN) to fine-tune the LLM</span>🔥🔥🔥**

## Release

- [2025/02/11] 🤖 HYDRA with RL is released.
- [2024/08/05] 🚀 [PyPI package](https://pypi.org/project/hydra-vl4ai/) is released.
- [2024/07/29] 🔥 **HYDRA** is open-sourced on GitHub.

## TODOs
We are aware that `gpt-3.5-turbo-0613` is deprecated and that `gpt-3.5` will be replaced by `gpt-4o-mini`. We will release another version of HYDRA.
> As of July 2024, `gpt-4o-mini` should be used in place of `gpt-3.5-turbo`, as it is cheaper, more capable, multimodal, and just as fast [OpenAI API Page](https://platform.openai.com/docs/models/gpt-3-5-turbo).

We also note that OpenAI has updated its embedding models, as described in this [link](https://openai.com/index/new-embedding-models-and-api-updates/). Because it is uncertain how OpenAI will update its embedding models in the future, we suggest training a new version of the RL controller yourself and updating the RL models.
- [x] GPT-4o-mini replacement.
- [x] LLaMA3.1 (ollama) replacement.
- [x] Gradio demo.
- [x] GPT-4o version.
- [x] HYDRA with RL (DQN).
- [ ] HYDRA with DeepSeek R1.

https://github.com/user-attachments/assets/39a897ab-d457-49d2-8527-0d6fe3a3b922

## Installation

### Requirements

- Python >= 3.10
- conda

Please follow the instructions below to install the required packages and set up the environment.
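
As a quick sanity check before installing, the two requirements above can be verified from Python. This is a minimal sketch, not part of HYDRA: the helper name `check_prerequisites` is hypothetical, and it only looks for a `conda` executable on `PATH`.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 10)):
    """Return a list of problems; an empty list means both checks passed.

    Hypothetical helper for illustration, not part of hydra_vl4ai.
    """
    problems = []
    # The README requires Python >= 3.10.
    if sys.version_info < min_python:
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ is required, found {found}")
    # The README lists conda as a requirement; only its presence on PATH is checked.
    if shutil.which("conda") is None:
        problems.append("conda was not found on PATH")
    return problems


for problem in check_prerequisites():
    print("WARNING:", problem)
```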

### 1. Clone this repository.
```Bash
git clone https://github.com/ControlNet/HYDRA
```

### 2. Set up the conda environment and install dependencies.

Option 1: Using [pixi](https://prefix.dev/) (recommended):
```Bash
pixi install
pixi shell
```

Option 2: Building from source:
```Bash
bash -i build_env.sh
```
If you run into errors, go through the `build_env.sh` file and install the packages manually.

### 3. Configure the environment variables

Edit the `.env` file, or set the variables in your shell, to configure the environment.

```
OPENAI_API_KEY=your-api-key  # if you want to use OpenAI LLMs
OLLAMA_HOST=http://ollama.server:11434  # if you want to use your Ollama server for Llama or DeepSeek
# do not change this TORCH_HOME variable
TORCH_HOME=./pretrained_models
```
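
To catch a misconfigured environment early, the variables above can be checked before starting HYDRA. The sketch below is illustrative only: `check_env` is a hypothetical helper, and the rule that at least one of `OPENAI_API_KEY` or `OLLAMA_HOST` must be set is an assumption drawn from the comments in the `.env` example, not from HYDRA's code.

```python
import os

# Value given in the .env example above.
EXPECTED_TORCH_HOME = "./pretrained_models"


def check_env(env=None):
    """Return a list of configuration problems (empty if none were found).

    Hypothetical helper for illustration, not part of hydra_vl4ai.
    """
    env = os.environ if env is None else env
    problems = []
    # The README asks that TORCH_HOME is left unchanged.
    if env.get("TORCH_HOME") != EXPECTED_TORCH_HOME:
        problems.append(f"TORCH_HOME should be {EXPECTED_TORCH_HOME}")
    # Assumption: at least one LLM backend (OpenAI or Ollama) must be configured.
    if not env.get("OPENAI_API_KEY") and not env.get("OLLAMA_HOST"):
        problems.append("set OPENAI_API_KEY or OLLAMA_HOST")
    return problems


example = {
    "TORCH_HOME": "./pretrained_models",
    "OLLAMA_HOST": "http://ollama.server:11434",
}
print(check_env(example))
```

Calling `check_env()` with no argument inspects the environment of the current process instead of an example dictionary.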

### 4. Download the pretrained models
Run the script below to download the pretrained models to the `./pretrained_models` directory.

```Bash
python -m hydra_vl4ai.download_model --base_config <EXP-CONFIG-DIR> --model_config <MODEL-CONFIG-PATH>
```

For example,
```Bash
python -m hydra_vl4ai.download_model --base_config ./config/okvqa.yaml --model_config ./config/model_config_1gpu.yaml
```

## Inference
A worker must be running before you run inference. Start it with:

```Bash
python -m hydra_vl4ai.executor --base_config <EXP-CONFIG-DIR> --model_config <MODEL-CONFIG-PATH>
```
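
If you prefer to launch the worker from a Python script, the command above can be assembled as an argument list for `subprocess`. This is a sketch under assumptions: `worker_command` and `start_worker` are hypothetical helpers, and only the module name and the two flags shown above are taken from this README.

```python
import subprocess
import sys


def worker_command(base_config, model_config, python=sys.executable):
    """Build the argument list for `python -m hydra_vl4ai.executor`."""
    return [
        python, "-m", "hydra_vl4ai.executor",
        "--base_config", str(base_config),
        "--model_config", str(model_config),
    ]


def start_worker(base_config, model_config):
    """Start the worker as a background process and return its Popen handle."""
    return subprocess.Popen(worker_command(base_config, model_config))
```

For example, `start_worker("./config/okvqa.yaml", "./config/model_config_1gpu.yaml")` returns a handle whose `terminate()` method stops the worker when you are done. How long the worker needs before it accepts requests is not stated here, so a client may need to wait or retry.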

### Inference with a single image and prompt
```Bash
python demo_cli.py \
  --image <IMAGE_PATH> \
  --prompt <PROMPT> \
  --base_config <YOUR-CONFIG-DIR> \
  --model_config <MODEL-PATH>
```

### Inference with Gradio GUI
```Bash
python demo_gradio.py \
  --base_config <YOUR-CONFIG-DIR> \
  --model_config <MODEL-PATH>
```

---
### Inference on a dataset

```Bash
python main.py \
  --data_root <YOUR-DATA-ROOT> \
  --base_config <YOUR-CONFIG-DIR> \
  --model_config <MODEL-PATH>
```

The inference results are then saved in the `./result` directory for evaluation.

## Evaluation

```Bash
python evaluate.py <RESULT_JSON_PATH> <DATASET_NAME>
```

For example,

```Bash
python evaluate.py result/result_okvqa.jsonl okvqa
```
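
Before evaluating, it can help to inspect the result file. Judging by its `.jsonl` extension, the file in the example holds one JSON value per line; the sketch below loads such a file with the standard library. `load_jsonl` is a hypothetical helper, and no particular field names are assumed because the record schema is not described here.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Parse a JSON Lines file into a list, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Example usage (path taken from the evaluation example above):
# records = load_jsonl("result/result_okvqa.jsonl")
# print(len(records), "records loaded")
```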

## Training the Controller with RL (DQN)

```Bash
python train.py \
    --data_root <IMAGE_PATH> \
    --base_config <YOUR-CONFIG-DIR> \
    --model_config <MODEL-PATH> \
    --dqn_config <YOUR-DQN-CONFIG-DIR>
```
For example,
```Bash
python train.py \
    --data_root ../coco2014 \
    --base_config ./config/okvqa.yaml \
    --model_config ./config/model_config_1gpu.yaml \
    --dqn_config ./config/dqn_debug.yaml
```
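
For readers unfamiliar with DQN, two core ingredients of the generic algorithm are the temporal-difference target `r + gamma * max(Q(s', a'))` and epsilon-greedy action selection. The sketch below shows both in plain Python. It is a textbook illustration, not HYDRA's training code: the function names and the default discount factor are assumptions.

```python
import random


def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step TD target: r if the episode ended, else r + gamma * max Q(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the highest-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])
```

In a full DQN, the Q-values come from a neural network trained to regress toward `td_target`, typically with a replay buffer and a periodically updated target network.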

## Citation
```bibtex
@inproceedings{ke2024hydra,
  title={HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning},
  author={Ke, Fucai and Cai, Zhixi and Jahangard, Simindokht and Wang, Weiqing and Haghighi, Pari Delir and Rezatofighi, Hamid},
  booktitle={European Conference on Computer Vision},
  year={2024},
  organization={Springer},
  doi={10.1007/978-3-031-72661-3_8},
  isbn={978-3-031-72661-3},
  pages={132--149},
}
```

## Acknowledgements

Some code and prompts are based on [cvlab-columbia/viper](https://github.com/cvlab-columbia/viper).

            

Raw data

            {
    "_id": null,
    "home_page": "https://hydra-vl4ai.github.io/",
    "name": "hydra-vl4ai",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "deep learning, pytorch, AI",
    "author": "ControlNet",
    "author_email": "smczx@hotmail.com",
    "download_url": "https://files.pythonhosted.org/packages/b3/c3/2bfc11664c0e39a72c17ff4bd32e3ae3a7d55d1b26186937f4f6daf1729a/hydra_vl4ai-0.0.6.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Official implementation for HYDRA.",
    "version": "0.0.6",
    "project_urls": {
        "Bug Tracker": "https://github.com/ControlNet/HYDRA/issues",
        "Homepage": "https://hydra-vl4ai.github.io/",
        "Source Code": "https://github.com/ControlNet/HYDRA"
    },
    "split_keywords": [
        "deep learning",
        " pytorch",
        " ai"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "fb9b32aa6cb72f74ed5fc58e3e987ee257238ee84bd98a86899376391b49a537",
                "md5": "2f801c1605500760c00dacf44bf35cfc",
                "sha256": "1504600819550ba3cbcd0eb3a26bf2270d25d7655a95dc42b8b642977f4fda51"
            },
            "downloads": -1,
            "filename": "hydra_vl4ai-0.0.6-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "2f801c1605500760c00dacf44bf35cfc",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 186055,
            "upload_time": "2025-09-06T13:52:03",
            "upload_time_iso_8601": "2025-09-06T13:52:03.436074Z",
            "url": "https://files.pythonhosted.org/packages/fb/9b/32aa6cb72f74ed5fc58e3e987ee257238ee84bd98a86899376391b49a537/hydra_vl4ai-0.0.6-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "b3c32bfc11664c0e39a72c17ff4bd32e3ae3a7d55d1b26186937f4f6daf1729a",
                "md5": "f29d706f4134d11264a82bb738f63e95",
                "sha256": "519001997584cf054429f6b91ec7eec62191da7cd1a60719860d1c6f3b344b28"
            },
            "downloads": -1,
            "filename": "hydra_vl4ai-0.0.6.tar.gz",
            "has_sig": false,
            "md5_digest": "f29d706f4134d11264a82bb738f63e95",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 147120,
            "upload_time": "2025-09-06T13:52:04",
            "upload_time_iso_8601": "2025-09-06T13:52:04.837157Z",
            "url": "https://files.pythonhosted.org/packages/b3/c3/2bfc11664c0e39a72c17ff4bd32e3ae3a7d55d1b26186937f4f6daf1729a/hydra_vl4ai-0.0.6.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-09-06 13:52:04",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "ControlNet",
    "github_project": "HYDRA",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "torch",
            "specs": []
        },
        {
            "name": "fastapi",
            "specs": []
        },
        {
            "name": "uvicorn",
            "specs": []
        },
        {
            "name": "starlette",
            "specs": []
        },
        {
            "name": "openai",
            "specs": []
        },
        {
            "name": "websockets",
            "specs": []
        },
        {
            "name": "tensorneko_util",
            "specs": [
                [
                    ">=",
                    "0.3.21"
                ],
                [
                    "<",
                    "0.4.0"
                ]
            ]
        },
        {
            "name": "requests",
            "specs": []
        },
        {
            "name": "pillow",
            "specs": []
        },
        {
            "name": "word2number",
            "specs": []
        },
        {
            "name": "ollama",
            "specs": []
        },
        {
            "name": "bitsandbytes",
            "specs": []
        },
        {
            "name": "numpy",
            "specs": [
                [
                    "<",
                    "2"
                ]
            ]
        },
        {
            "name": "gradio",
            "specs": [
                [
                    ">=",
                    "5.0"
                ],
                [
                    "<",
                    "5.12"
                ]
            ]
        },
        {
            "name": "pydantic",
            "specs": [
                [
                    "<",
                    "2.11"
                ]
            ]
        },
        {
            "name": "python-dotenv",
            "specs": []
        },
        {
            "name": "transformers",
            "specs": []
        },
        {
            "name": "opencv-python",
            "specs": []
        },
        {
            "name": "rich",
            "specs": []
        }
    ],
    "lcname": "hydra-vl4ai"
}
        