iwisdm

- Name: iwisdm
- Version: 0.1.3
- Summary: A virtual environment that generates vision-language tasks with varying complexity.
- Author: Xiaoxuan Lei
- Requires Python: >=3.11, <4.0
- License: MIT
- Uploaded: 2024-12-08 05:04:15
Table of Contents
=================

   * [Overview](#iwisdm)
   * [Examples](#examples)
   * [Usage](#usage)
      * [Install Instructions](#install-instructions)
      * [Environment Init](#shapenet-environment-initialization)
      * [Basic Usage](#basic-usage)
   * [Acknowledgements](#acknowledgements)
   * [Citation](#citation)
   
# iWISDM
iWISDM, short for instructed-Virtual VISual Decision Making, is a virtual environment capable of generating a limitless array of _vision-language tasks with varying complexity_. 

It is a toolkit designed to evaluate the ability of multimodal models to follow instructions in visual tasks. It builds on the compositional nature of human behavior, and the fact that complex tasks are often constructed by combining smaller task units together in time.

iWISDM encompasses a broad spectrum of subtasks that engage executive functions such as inhibition of action, working memory, attentional set, task switching, and schema generalization. 

It is also a scalable and extensible framework that allows users to easily define their own task space and stimulus dataset.

# Examples
Using iWISDM, we have compiled four distinct benchmarks of increasing complexity for evaluating large multimodal models.

Below is an example of the generated tasks, followed by the parameters of each benchmark:
<p align="center">
  <img width="800" alt="Examples of generated tasks" src="https://github.com/BashivanLab/iWISDM/assets/44264329/5f7eeffe-a3be-405f-8514-6424818cf5b7">
</p>
<p align="center">
  <img src="https://github.com/BashivanLab/iWISDM/blob/main/benchmarking/param_table.png?raw=true" alt="benchmarking params"/>
</p>

These datasets can be generated from [/benchmarking](https://github.com/BashivanLab/iWISDM/tree/main/benchmarking) or downloaded at: 
[iWISDM_benchsets.tar.gz](https://drive.google.com/file/d/1K-9AAJfvz6kiN3h9X2Rg0D88gJQ_rxSu/view?usp=sharing)

iWISDM inherits several classes from COG ([github.com/google/cog](https://github.com/google/cog)) to build task graphs. For convenience, we have also pre-implemented several commonly used cognitive tasks in `task_bank.py`.


### For further details, please refer to our paper:
[https://arxiv.org/abs/2406.14343](https://arxiv.org/abs/2406.14343)

# Usage
### Install Instructions
To install the iWISDM package, simply run the following command:
```shell
pip install iwisdm
```
If you would like to install the package from source, you can clone the repository and follow the instructions below:
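#### Clone the repository
First, clone the repository (the same GitHub URL referenced throughout this README):
```shell
git clone https://github.com/BashivanLab/iWISDM.git
cd iWISDM
```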
#### Install Poetry
```shell
curl -sSL https://install.python-poetry.org | python3 -
```
#### Create conda python environment
```shell
conda create --name iwisdm python=3.11
conda activate iwisdm
```
#### Install dependencies
```shell
poetry install
```

### ShapeNet Environment Initialization
To initialize the ShapeNet environment, you will need to download the ShapeNet dataset, which is used for rendering the trials.

To replicate our experiments, you also need to download the benchmarking configurations.

ShapeNet is a large-scale repository of shapes represented by 3D CAD models of objects [(Chang et al., 2015)](https://arxiv.org/abs/1512.03012).
#### Pre-rendered Dataset Download
[shapenet_handpicked.tar.gz](https://drive.google.com/file/d/1is72QDjP6A6TA1mZLL3doYWaU08waAxm/view?usp=sharing) 

#### Benchmarking Configs Download
[configs.tar.gz](https://github.com/BashivanLab/iWISDM/tree/main/benchmarking/configs.tar.gz)
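Both downloads are standard gzipped tarballs; one way to unpack them (assuming the archives were saved to the current directory):
```shell
tar -xzf shapenet_handpicked.tar.gz
tar -xzf configs.tar.gz
```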
### Basic Usage

```python
# imports
import json

from iwisdm import make
from iwisdm import read_write

# environment initialization
with open('your/path/to/env_config', 'r') as f:
    config = json.load(f)  # using pre-defined AutoTask configuration
env = make(
    env_id='ShapeNet',
    dataset_fp='your/path/to/shapenet_handpicked',
)
env.set_env_spec(
    env.init_env_spec(
        auto_gen_config=config,
    )
)

# AutoTask procedural task generation and saving trial
tasks = env.generate_tasks(10)  # generate 10 random task graphs and tasks
_, (_, temporal_task) = tasks[0]
trials = env.generate_trials(tasks=[temporal_task])  # generate a trial
imgs, _, info_dict = trials[0]
read_write.write_trial(imgs, info_dict, 'output/trial_0')  # save the first trial
```
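Building on the snippet above, a minimal sketch for rendering and saving every trial of every generated task (the `output/task_*/trial_*` layout here is illustrative, not a package convention):

```python
import os

for task_idx, (_, (_, temporal_task)) in enumerate(tasks):
    trials = env.generate_trials(tasks=[temporal_task])  # render this task
    for trial_idx, (imgs, _, info_dict) in enumerate(trials):
        out_dir = f'output/task_{task_idx}/trial_{trial_idx}'
        os.makedirs(out_dir, exist_ok=True)  # ensure the folder exists (write_trial may also create it)
        read_write.write_trial(imgs, info_dict, out_dir)
```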

#### See [/tutorials](https://github.com/BashivanLab/iWISDM/tree/main/tutorials) for more examples.

# Acknowledgements
This repository builds upon the foundational work presented in the COG paper [(Yang et al.)](https://arxiv.org/abs/1803.06092).

Yang, Guangyu Robert, et al. "A dataset and architecture for visual reasoning with a working memory." Proceedings of the European Conference on Computer Vision (ECCV). 2018.

# Citation
If you find iWISDM useful in your research, please cite it with the following BibTeX entry:
```bibtex
@inproceedings{lei2024iwisdm,
  title={iWISDM: Assessing instruction following in multimodal models at scale},
  author={Lei, Xiaoxuan and Gomez, Lucas and Bai, Hao Yuan and Bashivan, Pouya},
  booktitle={Conference on Lifelong Learning Agents (CoLLAs 2024)},
  year={2024}
}
```


            
