feat: enable to use multiple rgb encoders per camera in diffusion policy #484

HiroIshida · 2024-10-24T13:36:53Z

What this does

This PR enables to configure multiple rgb encoders for each cameras, which is done in the original diffusion policy paper. Not for sure in general setting, but at least in my task, this seems to increase the success rate of deployment slightly.
Current diffusion policy shared the single RGB encoder for all cameras.
This fixes #483

How it was tested

a) regression test

Prepare the following script and check that the behavior of the policy does not change before and the after the PR by comparing the hash value of the output.

import yaml
import torch
from hashlib import sha256
from lerobot.common.policies.diffusion.modeling_diffusion import DiffusionPolicy, DiffusionConfig

def run(camera_names, rgb_encoder_per_camera):
    torch.manual_seed(0)

    resol = 56
    input_shapes = {"observation.state": [6]}
    for name in camera_names:
        input_shapes[f"observation.image.{name}"] = [3, resol, resol]
    output_shapes = {"action": [6]}

    # determine normalizer
    normalization_mode = {"observation.state": "min_max"}
    for name in camera_names:
        normalization_mode[f"observation.image.{name}"] = "mean_std"

    # load stats
    with open('stats-diffusion.yaml', 'r') as f:
        stats = yaml.load(f, Loader=yaml.FullLoader)
    effective_keys = list(output_shapes.keys()) + list(input_shapes.keys()) + ["episode_index", "frame_indx", "index", "next.done",  "timestamp"]
    effective_key_set = set(effective_keys)
    for key, value in stats.items():
        if key not in effective_key_set:
            continue
        inner_dict = {}
        for key_inner, value_inner in value.items():
            inner_dict[key_inner] = torch.tensor(value_inner)
        stats[key] = inner_dict

    # create policy
    conf = DiffusionConfig(input_shapes=input_shapes, output_shapes=output_shapes, input_normalization_modes=normalization_mode, crop_shape=None, num_inference_steps=5)
    if rgb_encoder_per_camera:
        conf.rgb_encoder_per_camera = True
    policy = DiffusionPolicy(conf, dataset_stats=stats)

    # inference
    observation = {"observation.state": torch.rand(1, 6)}
    for name in camera_names:
        observation[f"observation.image.{name}"] = torch.rand(1, 3, resol, resol)
                   

    action = policy.select_action(observation)

    # compute hash of action
    hash_value = sha256(action.numpy().tobytes()).hexdigest()
    print("hash value of output: ", hash_value)


if __name__ == "__main__":
    # case 1
    print(f"test with only gripper camera")
    camera_names = ["gripper"]
    run(camera_names, False)

    # case 2
    print(f"test with gripper and webcam cameras where rgb_encoder_per_camera is not set")
    camera_names = ["gripper", "webcam"]
    run(camera_names, False)

    # case 3
    print(f"test with gripper and webcam cameras where rgb_encoder_per_camera is set")
    has_rgb_encoder_per_camera = "rgb_encoder_per_camera" in DiffusionConfig.__annotations__
    if not has_rgb_encoder_per_camera:
        print("DiffusionConfig does not have attribute rgb_encoder_per_camera")
    else:
        camera_names = ["gripper", "webcam"]
        run(camera_names, True)

Before this PR

integral@umejuice:~/python/snippets/python/ext_examples/lerobot$ python3 diffusion_policy_multiple_cameras.py 
test with only gripper camera
hash value of output:  b3f55809cad72b41e484ec589d9ace5acf1f16439c4b0d71817dc6cbb41d0f4f
test with gripper and webcam cameras where rgb_encoder_per_camera is not set
hash value of output:  d08244f48c83964d90a3224790c9d1f9fc20d58c870420c028b3be46a0ff3b22
test with gripper and webcam cameras where rgb_encoder_per_camera is set
DiffusionConfig does not have attribute rgb_encoder_per_camera

After this PR

integral@umejuice:~/python/snippets/python/ext_examples/lerobot$ python3 diffusion_policy_multiple_cameras.py 
test with only gripper camera
hash value of output:  b3f55809cad72b41e484ec589d9ace5acf1f16439c4b0d71817dc6cbb41d0f4f
test with gripper and webcam cameras where rgb_encoder_per_camera is not set
hash value of output:  d08244f48c83964d90a3224790c9d1f9fc20d58c870420c028b3be46a0ff3b22
test with gripper and webcam cameras where rgb_encoder_per_camera is set
hash value of output:  a636c3ad1313eda2504b008d92eb823d58acc6344582bb4922d73892371e5da0

You can see the match of hash values.

b) validity check

The above test include the case (see case3 ), which is run only when the DiffusionConfig has rgb_encoder_per_camera attribute, so activate only after this PR. And as you can see, there is no error occured and output the action.

c) tested on real robot

I trained the diffusion policy with the rgb_encoder_per_camera=True , and checked that policy worked well. I cannot say concrete stuff but the performance is slightly better than the original.

How to checkout & try? (for the reviewer)

Try the above script and please check the hash value before and after this PR. Before the run, you need to save the following yaml file as 'stats-diffusion.yaml.

action:
  max:
  - 119.3543701171875
  - 49.64094543457031
  - 111.9273910522461
  - 7.118511199951172
  - 106.76649475097656
  - 26.715518951416016
  mean:
  - 94.45906829833984
  - 18.427291870117188
  - 73.95457458496094
  - 1.212478518486023
  - 79.41802978515625
  - 6.242947578430176
  min:
  - 77.6657943725586
  - -29.830978393554688
  - 39.58110809326172
  - -8.624532699584961
  - 44.424076080322266
  - -18.17777442932129
  std:
  - 8.80908489227295
  - 16.74974822998047
  - 14.223931312561035
  - 1.9911779165267944
  - 10.933554649353027
  - 9.21682357788086
episode_index:
  max:
  - 69.0
  mean:
  - 34.674591064453125
  min:
  - 0.0
  std:
  - 20.254430770874023
frame_index:
  max:
  - 176.0
  mean:
  - 61.95341873168945
  min:
  - 0.0
  std:
  - 37.45309829711914
index:
  max:
  - 8585.0
  mean:
  - 4292.5
  min:
  - 0.0
  std:
  - 2478.564453125
next.done:
  max:
  - 1.0
  mean:
  - 0.008152804337441921
  min:
  - 0.0
  std:
  - 0.08992405980825424
observation.image.gripper:
  max:
  - - - 1.0
  - - - 1.0
  - - - 1.0
  mean:
  - - - 0.3928315341472626
  - - - 0.39230185747146606
  - - - 0.360833078622818
  min:
  - - - 0.0
  - - - 0.0
  - - - 0.0
  std:
  - - - 0.2355380803346634
  - - - 0.23801472783088684
  - - - 0.2544608414173126
observation.image.webcam:
  max:
  - - - 1.0
  - - - 1.0
  - - - 1.0
  mean:
  - - - 0.405765175819397
  - - - 0.6512684226036072
  - - - 0.6204251646995544
  min:
  - - - 0.0
  - - - 0.0
  - - - 0.0
  std:
  - - - 0.1711459755897522
  - - - 0.1499525010585785
  - - - 0.1772834062576294
observation.state:
  max:
  - 113.90892791748047
  - 49.61915588378906
  - 111.92845916748047
  - 7.113752841949463
  - 106.78263092041016
  - 26.70062255859375
  mean:
  - 95.00767517089844
  - 15.08882999420166
  - 76.28932189941406
  - 1.3102922439575195
  - 80.53984832763672
  - 6.705284595489502
  min:
  - 77.66748809814453
  - -29.83199119567871
  - 39.72526550292969
  - -5.572042942047119
  - 44.40793991088867
  - -18.174049377441406
  std:
  - 8.437381744384766
  - 18.46141242980957
  - 14.492023468017578
  - 1.9003233909606934
  - 10.978218078613281
  - 8.922645568847656
timestamp:
  max:
  - 17.600000381469727
  mean:
  - 6.195342540740967
  min:
  - 0.0
  std:
  - 3.745309591293335

HiroIshida · 2024-10-24T20:22:47Z

@alexander-soare
Could you review this PR? :)

HiroIshida force-pushed the diffusion_policy_with_multi_rgb_encoders branch from 38c883b to 9633d87 Compare October 24, 2024 13:41

feat: enable to use multiple rgb encoders per camera

ac1cd33

HiroIshida force-pushed the diffusion_policy_with_multi_rgb_encoders branch from 9633d87 to ac1cd33 Compare October 24, 2024 13:58

HiroIshida changed the title ~~feat: enable to use multiple rgb encoders per camera~~ feat: enable to use multiple rgb encoders per camera in diffusion policy Oct 24, 2024

HiroIshida marked this pull request as ready for review October 24, 2024 20:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: enable to use multiple rgb encoders per camera in diffusion policy #484

feat: enable to use multiple rgb encoders per camera in diffusion policy #484

HiroIshida commented Oct 24, 2024 •

edited

Loading

HiroIshida commented Oct 24, 2024

feat: enable to use multiple rgb encoders per camera in diffusion policy #484

Are you sure you want to change the base?

feat: enable to use multiple rgb encoders per camera in diffusion policy #484

Conversation

HiroIshida commented Oct 24, 2024 • edited Loading

What this does

How it was tested

a) regression test

b) validity check

c) tested on real robot

How to checkout & try? (for the reviewer)

HiroIshida commented Oct 24, 2024

HiroIshida commented Oct 24, 2024 •

edited

Loading