# langroom

- Version: 0.1.2
- Summary: A minimal gridworld environment for embodied question answering.
- Author: Jessy Lin
- Requires Python: >=3.8,<4.0
- Keywords: environment, agent, rl, language
- Released: 2023-09-10 20:01:58
![](https://github.com/jlin816/langroom/raw/main/banner.png)

<p align="center" font-weight="bold">
A minimal environment to evaluate embodied question answering in interactive agents.
</p>

In LangRoom, agents must learn to both move and talk. LangRoom contains four objects with randomly generated colors. Agents have a partial view of the environment and receive questions of the form "what color is `<object>`?". In response, they must seek out the correct object and generate the right answer.

# 💬 Getting Started

Play as a human:
```bash
pip install langroom
# Move with WASD
# Speak tokens 1-10 with number keys 1234567890
python run_gui.py
```

Create an environment instance:
```python
import langroom

env = langroom.LangRoom()
# Actions are dicts: "move" selects a movement, "talk" selects a token,
# and "reset": True starts a new episode.
ac = {"move": 0, "talk": 0, "reset": True}
obs = env.step(ac)
```
# 📑 Documentation

## Task Structure

There are four objects with fixed positions and randomized colors. By default, the agent has a partially observed view of 5x5 grid cells and cannot see all the objects at once. The environment generates questions of the form "what color is `<object>`?", then waits ten timesteps before starting to say "it is `<color>`". Agents answer correctly if they output the correct `<color>` token at the same timestep as the environment. After each question-answer sequence, the colors of the objects are re-randomized.

Three reward variants are implemented, specified by the `task` argument (a usage sketch follows this list):
- `answer-only`: the agent is rewarded only for saying the correct color at the right timestep and penalized a small amount for speaking at other timesteps. Use this reward structure for comparability to the original paper, Lin et al. (2023).
- `answer-and-echo`: the agent is rewarded for predicting tokens that the environment generates (including silences and questions), with a larger reward for saying the correct color at the right timestep.
- `echo`: the agent is rewarded equally for predicting every token the environment generates.
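
A minimal sketch of selecting a variant, assuming `task` is accepted as a keyword argument to the `LangRoom` constructor (the exact signature is an assumption; check the package source):

```python
import langroom

# Assumed constructor keyword: `task` picks one of the reward variants
# described above ("answer-only", "answer-and-echo", or "echo").
env = langroom.LangRoom(task="answer-only")
```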

## Observation Space
The observation and action spaces follow the [embodied](https://github.com/danijar/embodied/blob/d897527510020eef812a684cbbb87afe05bbd785/embodied/core/base.py#L43) environment interface (a short sketch of reading the observation follows this list).
- `image (uint8 (resolution, resolution, 3))`: pixel agent-centric local view
- `text (uint32 ())`: ID of the token at the current timestep
- `log_image (uint8 (resolution, 4 * resolution, 3))`: debugging view with additional information rendered alongside the agent view

Following the `embodied` env interface, these keys are also provided in the observation:
- `reward (float32)`: reward at the current timestep
- `is_first (bool)`: True if this timestep is the first timestep of an episode
- `is_last (bool)`: True if this timestep is the last timestep of an episode (terminated or truncated)
- `is_terminal (bool)`: True if this timestep is the last timestep of an episode (terminated)
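
A minimal sketch of inspecting these keys after a single step; the image shape depends on the configured resolution, and the key names are taken from the list above:

```python
import langroom

env = langroom.LangRoom()
# Start an episode with a reset action, then look at the observation dict.
obs = env.step({"move": 0, "talk": 0, "reset": True})

print(obs["image"].shape)   # (resolution, resolution, 3) uint8 pixels
print(obs["text"])          # uint32 token ID at this timestep
print(obs["reward"], obs["is_first"], obs["is_last"], obs["is_terminal"])
```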

## Action Space
LangRoom has a dictionary action space that allows the agent to output movement actions and tokens (i.e., move and speak) simultaneously at each timestep; a small example follows this list.
- `move (int32 ())`: ID of the movement action from movement action space `[stay down up right left]`
- `talk (int32 ())`: ID of the generated token

Following the `embodied` env interface, the action space also includes:
- `reset (bool)`: set to True to reset the episode
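
As an illustration of building an action dictionary by hand (a sketch that reuses only the keys and movement ordering listed above):

```python
import langroom

env = langroom.LangRoom()
env.step({"move": 0, "talk": 0, "reset": True})  # start an episode

# "up" is index 2 in the movement space [stay down up right left];
# "talk": 0 emits whichever token has ID 0 in the vocabulary.
obs = env.step({"move": 2, "talk": 0, "reset": False})
```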

## Vocabulary Size

By default, the vocabulary size is 15 (the minimal number of tokens to ask and answer questions). To test how agents deal with larger vocabularies (and thus larger action spaces), set the `vocab_size` argument. Additional words in the vocabulary will be filled with dummy tokens.
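
For example, a larger vocabulary could be configured as below; the `vocab_size` argument is named in the paragraph above, but the exact constructor signature is an assumption to verify against the package source:

```python
import langroom

# A larger vocabulary enlarges the "talk" action space; entries beyond
# the 15 task tokens are filled with dummy tokens.
env = langroom.LangRoom(vocab_size=50)
```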

# 🛠️ Development and Issues

New development and extensions to the environment are welcome! For any questions or issues, please open a GitHub issue.

# Citation
```
@article{lin2023learning,
         title={Learning to Model the World with Language},
         author={Jessy Lin and Yuqing Du and Olivia Watkins and Danijar Hafner and Pieter Abbeel and Dan Klein and Anca Dragan},
         year={2023},
         eprint={2308.01399},
         archivePrefix={arXiv},
}
```


            
