robo-transformers

Name	robo-transformers
Version	1.0.0
Summary	RT-1, RT-1-X, Octo Robotics Transformer Model Inference
upload_time	2024-01-13 08:22:04
author	Sebastian Peralta
requires_python	>=3.9,<3.12
license	MIT

# Library for Robotic Transformers: RT-1, RT-1-X, Octo

[![Code Coverage](https://codecov.io/gh/sebbyjp/dgl_ros/branch/code_cov/graph/badge.svg?token=9225d677-c4f2-4607-a9dd-8c22446f13bc)](https://codecov.io/gh/sebbyjp/dgl_ros)
[![ubuntu | python 3.9 | 3.10 | 3.11](https://github.com/sebbyjp/robo_transformers/actions/workflows/ubuntu.yml/badge.svg)](https://github.com/sebbyjp/robo_transformers/actions/workflows/ubuntu.yml)
[![macos | python 3.9 | 3.10 | 3.11](https://github.com/sebbyjp/robo_transformers/actions/workflows/macos.yml/badge.svg)](https://github.com/sebbyjp/robo_transformers/actions/workflows/macos.yml)

## Installation

Requirements:
Python >= 3.9, < 3.12
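
The package metadata declares `requires_python` as `>=3.9,<3.12`. A minimal sketch for checking the running interpreter against that range before installing; the helper name `is_supported` is hypothetical and not part of this library:

```python
import sys

# Range taken from the package metadata: requires_python ">=3.9,<3.12".
MIN_VERSION = (3, 9)
MAX_VERSION_EXCLUSIVE = (3, 12)


def is_supported(version=None):
    """Return True if (major, minor) falls inside the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    status = "supported" if is_supported() else "not supported"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is {status}")
```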

### Install the TensorFlow version for your OS and hardware

See the [TensorFlow installation guide](https://www.tensorflow.org/install).

### Using Octo models

Follow the Octo [installation procedure](https://github.com/octo-models/octo).

**Note**: You might not need conda if you are able to just clone their repo and run `pip install -e octo`.

### Recommended: Using PyPI

`pip install robo-transformers`

### From Source

Clone this repo:

`git clone https://github.com/sebbyjp/robo_transformers.git`

`cd robo_transformers`

Install Poetry and configure it to create the virtualenv inside the project:

`pip install poetry && poetry config virtualenvs.in-project true`

### Install dependencies

`poetry install`

Poetry installs the dependencies into a virtualenv, so activate it before running anything:

`source .venv/bin/activate`
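
To confirm that the activated environment can see the package, one option is to look the module up without importing it. This is a small sketch using only the standard library; the helper name `is_installed` is hypothetical, and the module name `robo_transformers` is taken from the `python -m` commands in this README:

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located on the current path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    found = is_installed("robo_transformers")
    print("robo_transformers found" if found else "robo_transformers not found")
```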

## Run Octo inference on demo images

`python -m robo_transformers.demo`
  
## Run RT-1 inference on demo images

`python -m robo_transformers.models.rt1.inference`

## See usage

You can specify a custom checkpoint path or one of the model_keys for the three models described in the RT-1 paper, as well as RT-X.

`python -m robo_transformers.models.rt1.inference --help`

## Run Inference Server

The inference server manages all internal state, so you only need to supply an instruction and an image. You may also pass in

```python
from robo_transformers.inference_server import InferenceServer
import numpy as np

# Somewhere in your robot control stack code...

instruction = "pick block"
# Random uint8 frame matching the image spec below: (Height, Width, RGB).
img = np.random.randint(0, 256, size=(256, 320, 3), dtype=np.uint8)
inference = InferenceServer()

action = inference(instruction, img)
```
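
In a control stack the call above would typically run once per control step. The sketch below shows that loop shape under stated assumptions: `StubPolicy` and `blank_frame` are hypothetical stand-ins so the example runs without model weights or a camera, and the only thing borrowed from this library is the `policy(instruction, image)` call shape shown above.

```python
import numpy as np


def run_steps(policy, get_frame, instruction, num_steps):
    """Call `policy(instruction, image)` once per control step and collect the results."""
    actions = []
    for _ in range(num_steps):
        image = get_frame()  # latest camera frame, (256, 320, 3) uint8
        actions.append(policy(instruction, image))
    return actions


class StubPolicy:
    """Hypothetical stand-in for InferenceServer; it only counts calls."""

    def __init__(self):
        self.calls = 0

    def __call__(self, instruction, image):
        self.calls += 1
        return {"step": self.calls, "instruction": instruction}


def blank_frame():
    """Hypothetical camera read that returns a black frame."""
    return np.zeros((256, 320, 3), dtype=np.uint8)


policy = StubPolicy()
actions = run_steps(policy, blank_frame, "pick block", num_steps=3)
print(len(actions), policy.calls)
```

Because the policy is passed in as a plain callable, an `InferenceServer` instance could be substituted for `StubPolicy` without changing the loop.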

## Data Types

`action, next_policy_state = model.act(time_step, curr_policy_state)`

### policy state is the internal state of the network

In this case it is a 6-frame window of past observations and actions, plus the index in time.

```python
{'action_tokens': ArraySpec(shape=(6, 11, 1, 1), dtype=dtype('int32'), name='action_tokens'),
 'image': ArraySpec(shape=(6, 256, 320, 3), dtype=dtype('uint8'), name='image'),
 'step_num': ArraySpec(shape=(1, 1, 1, 1), dtype=dtype('int32'), name='step_num'),
 't': ArraySpec(shape=(1, 1, 1, 1), dtype=dtype('int32'), name='t')}
 ```
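
To make the window idea concrete, here is a simplified NumPy illustration of a 6-frame rolling buffer with the shapes and dtypes from the spec above. It is not the library's implementation: the helper names are hypothetical, and treating `step_num` as a running count and `t` as a count capped at the window length is an assumption.

```python
import numpy as np

WINDOW = 6  # number of frames kept, from the spec above


def initial_state():
    """Zero-filled state with the shapes and dtypes listed in the spec."""
    return {
        "action_tokens": np.zeros((WINDOW, 11, 1, 1), dtype=np.int32),
        "image": np.zeros((WINDOW, 256, 320, 3), dtype=np.uint8),
        "step_num": np.zeros((1, 1, 1, 1), dtype=np.int32),
        "t": np.zeros((1, 1, 1, 1), dtype=np.int32),
    }


def push_frame(state, image):
    """Drop the oldest frame, store `image` as the newest, and advance the counters."""
    images = np.roll(state["image"], shift=-1, axis=0)  # np.roll returns a copy
    images[-1] = image
    return {
        "action_tokens": state["action_tokens"],
        "image": images,
        "step_num": state["step_num"] + 1,
        "t": np.minimum(state["t"] + 1, WINDOW),  # assumed to saturate at WINDOW
    }


state = initial_state()
for value in range(1, 9):
    frame = np.full((256, 320, 3), value, dtype=np.uint8)
    state = push_frame(state, frame)
```

After eight pushes the buffer holds only the six most recent frames, oldest first.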

### time_step is the input from the environment

```python
{'discount': BoundedArraySpec(shape=(), dtype=dtype('float32'), name='discount', minimum=0.0, maximum=1.0),
 'observation': {'base_pose_tool_reached': ArraySpec(shape=(7,), dtype=dtype('float32'), name='base_pose_tool_reached'),
                 'gripper_closed': ArraySpec(shape=(1,), dtype=dtype('float32'), name='gripper_closed'),
                 'gripper_closedness_commanded': ArraySpec(shape=(1,), dtype=dtype('float32'), name='gripper_closedness_commanded'),
                 'height_to_bottom': ArraySpec(shape=(1,), dtype=dtype('float32'), name='height_to_bottom'),
                 'image': ArraySpec(shape=(256, 320, 3), dtype=dtype('uint8'), name='image'),
                 'natural_language_embedding': ArraySpec(shape=(512,), dtype=dtype('float32'), name='natural_language_embedding'),
                 'natural_language_instruction': ArraySpec(shape=(), dtype=dtype('O'), name='natural_language_instruction'),
                 'orientation_box': ArraySpec(shape=(2, 3), dtype=dtype('float32'), name='orientation_box'),
                 'orientation_start': ArraySpec(shape=(4,), dtype=dtype('float32'), name='orientation_in_camera_space'),
                 'robot_orientation_positions_box': ArraySpec(shape=(3, 3), dtype=dtype('float32'), name='robot_orientation_positions_box'),
                 'rotation_delta_to_go': ArraySpec(shape=(3,), dtype=dtype('float32'), name='rotation_delta_to_go'),
                 'src_rotation': ArraySpec(shape=(4,), dtype=dtype('float32'), name='transform_camera_robot'),
                 'vector_to_go': ArraySpec(shape=(3,), dtype=dtype('float32'), name='vector_to_go'),
                 'workspace_bounds': ArraySpec(shape=(3, 3), dtype=dtype('float32'), name='workspace_bounds')},
 'reward': ArraySpec(shape=(), dtype=dtype('float32'), name='reward'),
 'step_type': ArraySpec(shape=(), dtype=dtype('int32'), name='step_type')}
 ```
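
When wiring up an environment it can help to build a placeholder observation and check real observations against the spec. The sketch below copies the shapes and dtypes from the observation spec above into plain NumPy; `OBSERVATION_SPEC`, `blank_observation`, and `matches_spec` are hypothetical names, not part of this library.

```python
import numpy as np

# (shape, dtype) pairs copied from the observation spec above.
OBSERVATION_SPEC = {
    "base_pose_tool_reached": ((7,), np.float32),
    "gripper_closed": ((1,), np.float32),
    "gripper_closedness_commanded": ((1,), np.float32),
    "height_to_bottom": ((1,), np.float32),
    "image": ((256, 320, 3), np.uint8),
    "natural_language_embedding": ((512,), np.float32),
    "natural_language_instruction": ((), object),
    "orientation_box": ((2, 3), np.float32),
    "orientation_start": ((4,), np.float32),
    "robot_orientation_positions_box": ((3, 3), np.float32),
    "rotation_delta_to_go": ((3,), np.float32),
    "src_rotation": ((4,), np.float32),
    "vector_to_go": ((3,), np.float32),
    "workspace_bounds": ((3, 3), np.float32),
}


def blank_observation(instruction=""):
    """Build a zero-filled observation dict carrying the given instruction."""
    obs = {
        name: np.zeros(shape, dtype=dtype)
        for name, (shape, dtype) in OBSERVATION_SPEC.items()
    }
    obs["natural_language_instruction"] = np.array(instruction, dtype=object)
    return obs


def matches_spec(obs):
    """Return True if every spec entry is present with the right shape and dtype."""
    return all(
        name in obs
        and obs[name].shape == shape
        and obs[name].dtype == np.dtype(dtype)
        for name, (shape, dtype) in OBSERVATION_SPEC.items()
    )


obs = blank_observation("pick block")
print(matches_spec(obs))
```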

### action

```python
{'base_displacement_vector': BoundedArraySpec(shape=(2,), dtype=dtype('float32'), name='base_displacement_vector', minimum=-1.0, maximum=1.0),
 'base_displacement_vertical_rotation': BoundedArraySpec(shape=(1,), dtype=dtype('float32'), name='base_displacement_vertical_rotation', minimum=-3.1415927410125732, maximum=3.1415927410125732),
 'gripper_closedness_action': BoundedArraySpec(shape=(1,), dtype=dtype('float32'), name='gripper_closedness_action', minimum=-1.0, maximum=1.0),
 'rotation_delta': BoundedArraySpec(shape=(3,), dtype=dtype('float32'), name='rotation_delta', minimum=-1.5707963705062866, maximum=1.5707963705062866),
 'terminate_episode': BoundedArraySpec(shape=(3,), dtype=dtype('int32'), name='terminate_episode', minimum=0, maximum=1),
 'world_vector': BoundedArraySpec(shape=(3,), dtype=dtype('float32'), name='world_vector', minimum=-1.0, maximum=1.0)}
 ```
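
Since every action field is bounded, a controller may want to clamp values to those bounds before sending them to hardware. A minimal sketch, assuming actions arrive as a dict of NumPy arrays keyed as in the spec above; `ACTION_BOUNDS` and `clip_action` are hypothetical names, and the bounds are copied from the spec.

```python
import numpy as np

# (minimum, maximum) per field, copied from the action spec above.
ACTION_BOUNDS = {
    "base_displacement_vector": (-1.0, 1.0),
    "base_displacement_vertical_rotation": (-3.1415927410125732, 3.1415927410125732),
    "gripper_closedness_action": (-1.0, 1.0),
    "rotation_delta": (-1.5707963705062866, 1.5707963705062866),
    "terminate_episode": (0, 1),
    "world_vector": (-1.0, 1.0),
}


def clip_action(action):
    """Clamp each field to its bounds; raises KeyError for fields not in the spec."""
    return {
        name: np.clip(value, *ACTION_BOUNDS[name])
        for name, value in action.items()
    }


raw = {
    "world_vector": np.array([2.0, -3.0, 0.5], dtype=np.float32),
    "terminate_episode": np.array([0, 2, -1], dtype=np.int32),
}
clipped = clip_action(raw)
print(clipped["world_vector"], clipped["terminate_episode"])
```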

## TODO

- Render the action, policy_state, and observation specs in a more readable form, such as a pandas DataFrame.
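
One possible direction for this item, sketched with the standard library only: flatten a spec into rows and print them as an aligned text table. The rows are copied from the action spec above; `format_table` is a hypothetical helper. The same `ACTION_SPEC_ROWS` and `HEADERS` could be passed to `pandas.DataFrame(ACTION_SPEC_ROWS, columns=HEADERS)` if pandas is available.

```python
HEADERS = ("name", "shape", "dtype", "minimum", "maximum")

# Rows copied from the action spec above.
ACTION_SPEC_ROWS = [
    ("base_displacement_vector", "(2,)", "float32", -1.0, 1.0),
    ("base_displacement_vertical_rotation", "(1,)", "float32", -3.1415927410125732, 3.1415927410125732),
    ("gripper_closedness_action", "(1,)", "float32", -1.0, 1.0),
    ("rotation_delta", "(3,)", "float32", -1.5707963705062866, 1.5707963705062866),
    ("terminate_episode", "(3,)", "int32", 0, 1),
    ("world_vector", "(3,)", "float32", -1.0, 1.0),
]


def format_table(headers, rows):
    """Return the rows as a left-aligned text table with a header separator."""
    cells = [[str(c) for c in headers]] + [[str(c) for c in row] for row in rows]
    widths = [max(len(row[i]) for row in cells) for i in range(len(headers))]
    lines = [
        " | ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in cells
    ]
    lines.insert(1, "-+-".join("-" * width for width in widths))
    return "\n".join(lines)


table = format_table(HEADERS, ACTION_SPEC_ROWS)
print(table)
```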

