[Feature] Updated with voxel observation mode. Added observation mode config supports #411

Open · wants to merge 15 commits into main · Changes from 8 commits
67 changes: 66 additions & 1 deletion docs/source/user_guide/concepts/observation.md
@@ -92,6 +92,71 @@ For a quick demo to visualize pointclouds, you can run
python -m mani_skill.examples.demo_vis_pcd -e "PushCube-v1"
```

### voxel
This observation mode has the same data format as the [sensor_data mode](#sensor_data), except that all camera sensor data is removed and a new key, `voxel_grid`, is added.

To use this observation mode, you must pass a dictionary of observation config parameters via the `obs_mode_config` keyword argument when the environment is initialized (`gym.make()`). It should contain the following voxelization hyperparameters (a minimal usage sketch follows the list):

- `coord_bounds`: a list of six `torch.float32` values of the form **[x_min, y_min, z_min, x_max, y_max, z_max]**, defining the metric volume to be voxelized.
- `voxel_size`: `torch.int`. The number of voxels along each side of the voxel grid, assuming the grid is cubic.
- `device`: `torch.device`. The device on which voxelization takes place, e.g. **torch.device("cuda" if torch.cuda.is_available() else "cpu")**.
- `segmentation`: `bool`. Whether to estimate voxel segmentation from the point cloud segmentation. If true, num_channels=11 (including one channel for voxel segmentation); otherwise num_channels=10.
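
Below is a minimal sketch of building such a config and creating the environment with it; the bound and resolution values here are illustrative, not required.

```python
import gymnasium as gym
import torch

import mani_skill.envs  # registers the ManiSkill environments

# Illustrative voxelization config: a 3 m cube around the tabletop, 200 voxels per side
voxel_obs_config = dict(
    coord_bounds=[-1.0, -1.0, -1.0, 2.0, 2.0, 2.0],  # [x_min, y_min, z_min, x_max, y_max, z_max]
    voxel_size=200,
    device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
    segmentation=True,  # adds a segmentation channel, so num_channels = 11
)

env = gym.make("PushCube-v1", obs_mode="voxel", obs_mode_config=voxel_obs_config)
```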

Then, as you step through the environment and retrieve observations, you will see the extra key `voxel_grid` containing the generated voxel grid:


- `voxel_grid`: a `torch` tensor of shape **[N, voxel_size, voxel_size, voxel_size, num_channels]**, generated by fusing the point cloud and RGB data from all cameras. `N` is the batch size, `voxel_size` is the grid resolution specified in the voxelization config, and `num_channels` is the number of feature channels per voxel.
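
Continuing the sketch above, a quick way to confirm the grid's shape after a reset is:

```python
obs, _ = env.reset(seed=0)
voxel_grid = obs["voxel_grid"]

# With the illustrative config above, this should print torch.Size([1, 200, 200, 200, 11]):
# 11 channels because segmentation=True; it would be 10 with segmentation=False.
print(voxel_grid.shape)
```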


The voxel grid can be visualized as below. The image shows the voxelized PushCube-v1 scene with slightly tuned default hyperparameters. The voxel grid is reconstructed from the front camera only, following the default camera settings of the PushCube-v1 task, so it contains only the voxels visible from the front rather than the entire scene.

```{image} images/voxel_pushcube.png
---
alt: Voxelized PushCube-v1 scene at the initial state
---
```

The RGBD image data used to reconstruct the voxel scene above is shown in the following figure. Here we use only the single base camera of the PushCube-v1 task.

```{image} images/voxel_cam_view_one.png
---
alt: Corresponding RGBD observations
---
```

For a quick demo to visualize voxel grids, you can run

<!-- TODO: add command line args -->
```bash
python -m mani_skill.examples.demo_vis_voxel -e "PushCube-v1" --voxel-size 200 --zoom-factor 2.2 --coord-bounds -1 -1 -1 2 2 2
```

Or, to use just the default settings, simply run

```bash
python -m mani_skill.examples.demo_vis_voxel -e "PushCube-v1"
```

Furthermore, if you use more sensors (currently only RGB and depth cameras) to capture the scene and collect point cloud and RGB data from more poses, you can obtain a more complete and accurate voxel reconstruction of the scene. The figure below shows a more completely reconstructed voxel scene of PushCube-v1 using additional RGBD cameras.

```{image} images/voxel_pushcube_complete.png
---
alt: Densely voxelized PushCube-v1 scene at the initial state
---
```

It is reconstructed using 5 cameras placed above, to the left of, to the right of, in front of, and behind the tabletop scene, respectively, as shown in the visualized RGBD observations below; a sketch of how such a multi-camera setup could be defined follows the figure.

```{image} images/voxel_cam_view_all.png
---
alt: Corresponding RGBD observations
---
```
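
One way to collect such multi-view data is to register a task variant whose sensor configs contain several `CameraConfig` entries. The sketch below is illustrative only: the environment id, the camera poses, and the import paths are assumptions (the overridden property may be named `_sensor_configs` or `_default_sensor_configs` depending on your ManiSkill version), and it is not necessarily the exact setup used for the figures above.

```python
import numpy as np

from mani_skill.envs.tasks import PushCubeEnv  # import path may differ across ManiSkill versions
from mani_skill.sensors.camera import CameraConfig
from mani_skill.utils import sapien_utils
from mani_skill.utils.registration import register_env


@register_env("PushCube-MultiCam-v1", max_episode_steps=50)
class PushCubeMultiCamEnv(PushCubeEnv):
    @property
    def _default_sensor_configs(self):
        # Cameras above, left of, right of, in front of, and behind the tabletop,
        # all looking at the workspace center
        eyes = [[0.0, 0.0, 0.8], [0.0, 0.6, 0.4], [0.0, -0.6, 0.4], [0.6, 0.0, 0.4], [-0.6, 0.0, 0.4]]
        return [
            CameraConfig(
                uid=f"camera_{i}",
                pose=sapien_utils.look_at(eye=eye, target=[0.0, 0.0, 0.1]),
                width=128,
                height=128,
                fov=np.pi / 2,
                near=0.01,
                far=100,
            )
            for i, eye in enumerate(eyes)
        ]
```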



## Segmentation Data

Objects upon being loaded are automatically assigned a segmentation ID (the `per_scene_id` attribute of `sapien.Entity` objects). To get information about which IDs refer to which Actors / Links, you can run the code below
@@ -113,4 +113,4 @@ Note that ID 0 refers to the distant background. For a quick demo of this, you can run
```bash
python -m mani_skill.examples.demo_vis_segmentation -e "PushCube-v1" # plot all segmentations
python -m mani_skill.examples.demo_vis_segmentation -e "PushCube-v1" --id cube # mask everything but the object with name "cube"
```
```
13 changes: 13 additions & 0 deletions docs/source/user_guide/demos/index.md
@@ -190,6 +190,19 @@ python -m mani_skill.examples.demo_vis_rgbd -e "StackCube-v1"
```{figure} images/rgbd_vis.png
```

## Visualize Voxel Data

You can run the following to visualize voxel data. Under the default sensor settings, with only one camera at the front of the scene, it produces the voxelized scene shown below.

```bash
python -m mani_skill.examples.demo_vis_voxel -e "PushCube-v1"
```


```{figure} images/voxel_pushcube.png
```


## Visualize Reset Distributions

Determining how difficult a task might be for ML algorithms like reinforcement learning and imitation learning can heavily depend on the reset distribution of the task. To see what the reset distribution of any task (the result of repeated env.reset calls) looks like you can run the following to save a video to the `videos` folder
19 changes: 17 additions & 2 deletions mani_skill/envs/sapien_env.py
@@ -23,6 +23,7 @@
from mani_skill.envs.utils.observations import (
sensor_data_to_pointcloud,
sensor_data_to_rgbd,
sensor_data_to_voxel
)
from mani_skill.sensors.base_sensor import BaseSensor, BaseSensorConfig
from mani_skill.sensors.camera import (
@@ -66,6 +67,8 @@ class BaseEnv(gym.Env):

sensor_cfgs (dict): configurations of sensors. See notes for more details.

obs_mode_config (dict): configuration hyperparameters for the observation mode (e.g. voxelization parameters for the voxel observation mode). See notes for more details.

human_render_camera_cfgs (dict): configurations of human rendering cameras. Similar usage as @sensor_cfgs.

robot_uids (Union[str, BaseAgent, List[Union[str, BaseAgent]]]): List of robots to instantiate and control in the environment.
@@ -99,7 +102,7 @@ class BaseEnv(gym.Env):
SUPPORTED_ROBOTS: List[Union[str, Tuple[str]]] = None
"""Override this to enforce which robots or tuples of robots together are supported in the task. During env creation,
setting robot_uids auto loads all desired robots into the scene, but not all tasks are designed to support some robot setups"""
SUPPORTED_OBS_MODES = ("state", "state_dict", "none", "sensor_data", "rgb", "rgbd", "pointcloud")
SUPPORTED_OBS_MODES = ("state", "state_dict", "none", "sensor_data", "rgb", "rgbd", "pointcloud", "voxel")
SUPPORTED_REWARD_MODES = ("normalized_dense", "dense", "sparse", "none")
SUPPORTED_RENDER_MODES = ("human", "rgb_array", "sensors")
"""The supported render modes. Human opens up a GUI viewer. rgb_array returns an rgb array showing the current environment state.
@@ -120,6 +123,8 @@ class BaseEnv(gym.Env):
"""all sensor configurations parsed from self._sensor_configs and agent._sensor_configs"""
_agent_sensor_configs: Dict[str, BaseSensorConfig]
"""all agent sensor configs parsed from agent._sensor_configs"""
_obs_mode_config: Dict
"""configurations for converting sensor data to observations under the current observation mode (e.g. voxel size and scene bounds for voxel observations)"""
_human_render_cameras: Dict[str, Camera]
"""cameras used for rendering the current environment retrievable via `env.render_rgb_array()`. These are not used to generate observations"""
_default_human_render_camera_configs: Dict[str, CameraConfig]
@@ -146,6 +151,7 @@ def __init__(
shader_dir: str = "default",
enable_shadow: bool = False,
sensor_configs: dict = None,
obs_mode_config: dict = None,
human_render_camera_configs: dict = None,
robot_uids: Union[str, BaseAgent, List[Union[str, BaseAgent]]] = None,
sim_cfg: Union[SimConfig, dict] = dict(),
@@ -156,6 +162,7 @@
self.num_envs = num_envs
self.reconfiguration_freq = reconfiguration_freq if reconfiguration_freq is not None else 0
self._reconfig_counter = 0
self._obs_mode_config = obs_mode_config
self._custom_sensor_configs = sensor_configs
self._custom_human_render_camera_configs = human_render_camera_configs
self._parallel_gui_render_enabled = parallel_gui_render_enabled
@@ -408,7 +415,7 @@ def get_obs(self, info: Dict = None):
obs = common.flatten_state_dict(state_dict, use_torch=True, device=self.device)
elif self._obs_mode == "state_dict":
obs = self._get_obs_state_dict(info)
elif self._obs_mode in ["sensor_data", "rgbd", "rgb", "pointcloud"]:
elif self._obs_mode in ["sensor_data", "rgbd", "rgb", "pointcloud", "voxel"]:
obs = self._get_obs_with_sensor_data(info)
if self._obs_mode == "rgbd":
obs = sensor_data_to_rgbd(obs, self._sensors, rgb=True, depth=True, segmentation=True)
@@ -417,6 +424,14 @@
obs = sensor_data_to_rgbd(obs, self._sensors, rgb=True, depth=False, segmentation=True)
elif self.obs_mode == "pointcloud":
obs = sensor_data_to_pointcloud(obs, self._sensors)
elif self.obs_mode == "voxel":
# validate _obs_mode_config here, then pass it to the conversion function
assert self._obs_mode_config is not None, "The voxel observation mode requires a config dict passed via the obs_mode_config keyword arg in gym.make(). See the ManiSkill docs for details."
assert "voxel_size" in self._obs_mode_config.keys(), "Missing voxel_size (voxel grid resolution) in observation configs"
assert "coord_bounds" in self._obs_mode_config.keys(), "Missing coord_bounds (coordinate bounds) in observation configs"
assert "device" in self._obs_mode_config.keys(), "Missing device (device for voxelization) in observation configs"
assert "segmentation" in self._obs_mode_config.keys(), "Missing segmentation (a boolean indicating whether to include voxel segmentation) in observation configs"
obs = sensor_data_to_voxel(obs, self._sensors, self._obs_mode_config)
else:
raise NotImplementedError(self._obs_mode)
return obs
1 change: 1 addition & 0 deletions mani_skill/envs/utils/observations/__init__.py
@@ -1 +1,2 @@
from .observations import *
from .voxelizer import *
97 changes: 96 additions & 1 deletion mani_skill/envs/utils/observations/observations.py
@@ -11,7 +11,7 @@
from mani_skill.sensors.base_sensor import BaseSensor, BaseSensorConfig
from mani_skill.sensors.camera import Camera
from mani_skill.utils import common

from mani_skill.envs.utils.observations.voxelizer import VoxelGrid

def sensor_data_to_rgbd(
observation: Dict,
@@ -113,3 +113,98 @@ def sensor_data_to_pointcloud(observation: Dict, sensors: Dict[str, BaseSensor])
# observation["pointcloud"]["segmentation"].numpy().astype(np.uint16)
# )
return observation

def sensor_data_to_voxel(
observation: Dict,
sensors: Dict[str, BaseSensor],
obs_mode_config: Dict
):
"""convert all camera data in sensor to voxel grid"""
sensor_data = observation["sensor_data"]
camera_params = observation["sensor_param"]
coord_bounds = obs_mode_config["coord_bounds"] # [x_min, y_min, z_min, x_max, y_max, z_max] - the metric volume to be voxelized
voxel_size = obs_mode_config["voxel_size"] # size of the voxel grid (assuming cubic)
device = obs_mode_config["device"] # device on which doing voxelization
seg = obs_mode_config["segmentation"] # device on which doing voxelization
pcd_rgb_observations = dict()

# Collect all cameras' observations
for (cam_uid, images), (sensor_uid, sensor) in zip(
sensor_data.items(), sensors.items()
):
assert cam_uid == sensor_uid
if isinstance(sensor, Camera):
cam_data = {}

# Extract point cloud and segmentation data
images: Dict[str, torch.Tensor]
position = images["PositionSegmentation"]
if seg:
segmentation = position[..., 3].clone()
position = position.float()
position[..., 3] = 1 # convert to homogeneous coordinates
position[..., :3] = (
position[..., :3] / 1000.0
) # convert the raw depth from millimeters to meters

# Convert to world space position and update camera data
cam2world = camera_params[cam_uid]["cam2world_gl"]
xyzw = position.reshape(position.shape[0], -1, 4) @ cam2world.transpose(
1, 2
)
xyz = xyzw[..., :3] / xyzw[..., 3].unsqueeze(-1) # dehomogenize
cam_data["xyz"] = xyz
if seg:
cam_data["seg"] = segmentation.reshape(segmentation.shape[0], -1, 1)

# Extract rgb data
if "Color" in images:
rgb = images["Color"][..., :3].clone()
rgb = rgb / 255 # convert to range [0, 1]
cam_data["rgb"] = rgb.reshape(rgb.shape[0], -1, 3)

pcd_rgb_observations[cam_uid] = cam_data

# just free sensor_data to save memory
for k in pcd_rgb_observations.keys():
del observation["sensor_data"][k]

# merge features from different cameras together
pcd_rgb_observations = common.merge_dicts(pcd_rgb_observations.values())
for key, value in pcd_rgb_observations.items():
pcd_rgb_observations[key] = torch.concat(value, axis=1)

# prepare features for voxel conversion
xyz_dev = pcd_rgb_observations["xyz"].to(device)
rgb_dev = pcd_rgb_observations["rgb"].to(device)
if seg:
seg_dev = pcd_rgb_observations["seg"].to(device)
coord_bounds = torch.tensor(coord_bounds, device=device).unsqueeze(0)
batch_size = xyz_dev.shape[0]
max_num_coords = rgb_dev.shape[1]
vox_grid = VoxelGrid(
coord_bounds=coord_bounds,
voxel_size=voxel_size,
device=device,
batch_size=batch_size,
feature_size=3,
max_num_coords=max_num_coords,
)

# convert to the batched voxel grids
# per-voxel features: 3 (pcd xyz coordinates) + 3 (rgb) + 3 (voxel xyz indices) + 1 (seg id, if enabled) + 1 (occupancy), i.e. 11 channels with segmentation and 10 without
if seg: # add voxel segmentations
voxel_grid = vox_grid.coords_to_bounding_voxel_grid(xyz_dev,
coord_features=rgb_dev,
coord_bounds=coord_bounds,
clamp_vox_id=True,
pcd_seg=seg_dev)
else: # no voxel segmentation
voxel_grid = vox_grid.coords_to_bounding_voxel_grid(xyz_dev,
coord_features=rgb_dev,
coord_bounds=coord_bounds,
clamp_vox_id=False)

# update voxel grids to the observation dict
observation["voxel_grid"] = voxel_grid
return observation