feat: enable to use multiple rgb encoders per camera in diffusion policy #484

Open
wants to merge 1 commit into
base: main
from

Conversation

@HiroIshida HiroIshida commented Oct 24, 2024

What this does

This PR makes it possible to configure a separate RGB encoder for each camera, as is done in the original diffusion policy paper. I cannot say whether this helps in general, but at least on my task it seems to slightly increase the deployment success rate.
The current diffusion policy shares a single RGB encoder across all cameras.
This fixes #483
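
To make the difference concrete, here is a minimal sketch of the two wirings. It is not the code in this PR: `TinyEncoder`, `MultiCameraEncoder`, and the `per_camera` flag are hypothetical names used only for illustration, and the real policy uses its own encoder class and config field.

```python
import torch
from torch import nn


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for an RGB encoder: image -> feature vector."""

    def __init__(self, feature_dim: int = 8):
        super().__init__()
        self.conv = nn.Conv2d(3, 4, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Linear(4, feature_dim)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        x = torch.relu(self.conv(image))
        x = self.pool(x).flatten(start_dim=1)  # (B, 4)
        return self.head(x)  # (B, feature_dim)


class MultiCameraEncoder(nn.Module):
    """Encode N camera images with one shared encoder or one encoder per camera."""

    def __init__(self, num_cameras: int, per_camera: bool):
        super().__init__()
        self.per_camera = per_camera
        num_encoders = num_cameras if per_camera else 1
        self.encoders = nn.ModuleList([TinyEncoder() for _ in range(num_encoders)])

    def forward(self, images):
        features = []
        for i, image in enumerate(images):
            # Shared mode reuses encoder 0 (and its weights) for every camera.
            encoder = self.encoders[i] if self.per_camera else self.encoders[0]
            features.append(encoder(image))
        return torch.cat(features, dim=-1)  # (B, N * feature_dim)


def count_params(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())


images = [torch.rand(1, 3, 56, 56) for _ in range(2)]
shared = MultiCameraEncoder(num_cameras=2, per_camera=False)
separate = MultiCameraEncoder(num_cameras=2, per_camera=True)
shared_out = shared(images)
separate_out = separate(images)
```

Both variants produce features of the same shape, so the rest of the network does not need to change. The per-camera variant simply holds one set of encoder weights per camera instead of one set in total.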

How it was tested

a) regression test

Save the following script and check that the policy's behavior is the same before and after this PR by comparing the hash of its output.

import yaml
import torch
from hashlib import sha256
from lerobot.common.policies.diffusion.modeling_diffusion import DiffusionPolicy, DiffusionConfig

def run(camera_names, rgb_encoder_per_camera):
    torch.manual_seed(0)

    resol = 56
    input_shapes = {"observation.state": [6]}
    for name in camera_names:
        input_shapes[f"observation.image.{name}"] = [3, resol, resol]
    output_shapes = {"action": [6]}

    # determine normalizer
    normalization_mode = {"observation.state": "min_max"}
    for name in camera_names:
        normalization_mode[f"observation.image.{name}"] = "mean_std"

    # load stats
    with open('stats-diffusion.yaml', 'r') as f:
        stats = yaml.load(f, Loader=yaml.FullLoader)
    effective_keys = list(output_shapes.keys()) + list(input_shapes.keys()) + ["episode_index", "frame_index", "index", "next.done", "timestamp"]
    effective_key_set = set(effective_keys)
    for key, value in stats.items():
        if key not in effective_key_set:
            continue
        inner_dict = {}
        for key_inner, value_inner in value.items():
            inner_dict[key_inner] = torch.tensor(value_inner)
        stats[key] = inner_dict

    # create policy
    conf = DiffusionConfig(input_shapes=input_shapes, output_shapes=output_shapes, input_normalization_modes=normalization_mode, crop_shape=None, num_inference_steps=5)
    if rgb_encoder_per_camera:
        conf.rgb_encoder_per_camera = True
    policy = DiffusionPolicy(conf, dataset_stats=stats)

    # inference
    observation = {"observation.state": torch.rand(1, 6)}
    for name in camera_names:
        observation[f"observation.image.{name}"] = torch.rand(1, 3, resol, resol)

    action = policy.select_action(observation)

    # compute hash of action
    hash_value = sha256(action.numpy().tobytes()).hexdigest()
    print("hash value of output: ", hash_value)


if __name__ == "__main__":
    # case 1
    print("test with only gripper camera")
    camera_names = ["gripper"]
    run(camera_names, False)

    # case 2
    print("test with gripper and webcam cameras where rgb_encoder_per_camera is not set")
    camera_names = ["gripper", "webcam"]
    run(camera_names, False)

    # case 3
    print("test with gripper and webcam cameras where rgb_encoder_per_camera is set")
    has_rgb_encoder_per_camera = "rgb_encoder_per_camera" in DiffusionConfig.__annotations__
    if not has_rgb_encoder_per_camera:
        print("DiffusionConfig does not have attribute rgb_encoder_per_camera")
    else:
        camera_names = ["gripper", "webcam"]
        run(camera_names, True)

Before this PR

integral@umejuice:~/python/snippets/python/ext_examples/lerobot$ python3 diffusion_policy_multiple_cameras.py 
test with only gripper camera
hash value of output:  b3f55809cad72b41e484ec589d9ace5acf1f16439c4b0d71817dc6cbb41d0f4f
test with gripper and webcam cameras where rgb_encoder_per_camera is not set
hash value of output:  d08244f48c83964d90a3224790c9d1f9fc20d58c870420c028b3be46a0ff3b22
test with gripper and webcam cameras where rgb_encoder_per_camera is set
DiffusionConfig does not have attribute rgb_encoder_per_camera

After this PR

integral@umejuice:~/python/snippets/python/ext_examples/lerobot$ python3 diffusion_policy_multiple_cameras.py 
test with only gripper camera
hash value of output:  b3f55809cad72b41e484ec589d9ace5acf1f16439c4b0d71817dc6cbb41d0f4f
test with gripper and webcam cameras where rgb_encoder_per_camera is not set
hash value of output:  d08244f48c83964d90a3224790c9d1f9fc20d58c870420c028b3be46a0ff3b22
test with gripper and webcam cameras where rgb_encoder_per_camera is set
hash value of output:  a636c3ad1313eda2504b008d92eb823d58acc6344582bb4922d73892371e5da0

The hash values for cases 1 and 2 are identical before and after this PR, so the existing behavior is unchanged.
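
For reviewers who prefer not to compare the digests by eye, the check can be automated. The following is a rough sketch that parses the two logs shown above and compares the hashes case by case. The helper names `parse_log` and `compare_logs` are my own and are not part of the test script.

```python
HASH_PREFIX = "hash value of output:"


def parse_log(log: str) -> dict:
    """Map each 'test with ...' label to the hash printed on the line after it."""
    hashes = {}
    label = None
    for raw_line in log.splitlines():
        line = raw_line.strip()
        if line.startswith("test with"):
            label = line
        elif line.startswith(HASH_PREFIX) and label is not None:
            hashes[label] = line[len(HASH_PREFIX):].strip()
            label = None
    return hashes


def compare_logs(before_log: str, after_log: str) -> dict:
    """For every case that printed a hash in both runs, report whether it matches."""
    before = parse_log(before_log)
    after = parse_log(after_log)
    return {label: before[label] == after[label] for label in before if label in after}


BEFORE_LOG = """
test with only gripper camera
hash value of output:  b3f55809cad72b41e484ec589d9ace5acf1f16439c4b0d71817dc6cbb41d0f4f
test with gripper and webcam cameras where rgb_encoder_per_camera is not set
hash value of output:  d08244f48c83964d90a3224790c9d1f9fc20d58c870420c028b3be46a0ff3b22
test with gripper and webcam cameras where rgb_encoder_per_camera is set
DiffusionConfig does not have attribute rgb_encoder_per_camera
"""

AFTER_LOG = """
test with only gripper camera
hash value of output:  b3f55809cad72b41e484ec589d9ace5acf1f16439c4b0d71817dc6cbb41d0f4f
test with gripper and webcam cameras where rgb_encoder_per_camera is not set
hash value of output:  d08244f48c83964d90a3224790c9d1f9fc20d58c870420c028b3be46a0ff3b22
test with gripper and webcam cameras where rgb_encoder_per_camera is set
hash value of output:  a636c3ad1313eda2504b008d92eb823d58acc6344582bb4922d73892371e5da0
"""

result = compare_logs(BEFORE_LOG, AFTER_LOG)
for label, same in result.items():
    print("OK  " if same else "DIFF", label)
```

Case 3 is skipped by the comparison because it prints no hash before this PR. Note that a matching hash means the action bytes are bit-for-bit identical, which is a stricter check than a numerical tolerance.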

b) validity check

The test above also includes case 3, which runs only when DiffusionConfig has the rgb_encoder_per_camera attribute, i.e. only after this PR. As the output shows, it runs without error and produces an action.
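
As a side note on the attribute check, here is a small sketch of an alternative way to detect the field, assuming the config is a dataclass. The classes `OldConfig`, `NewConfig`, and `DerivedConfig` are hypothetical stand-ins, not lerobot classes. The `__annotations__` lookup used in the script only sees fields declared directly on the class, while `dataclasses.fields` also sees inherited ones.

```python
from dataclasses import dataclass, fields


@dataclass
class OldConfig:
    """Hypothetical config from before the option existed."""

    num_inference_steps: int = 5


@dataclass
class NewConfig(OldConfig):
    """Hypothetical config that adds the option."""

    rgb_encoder_per_camera: bool = False


@dataclass
class DerivedConfig(NewConfig):
    """Hypothetical subclass that inherits the option without redeclaring it."""

    extra: int = 0


def has_field(config_cls: type, name: str) -> bool:
    """True if the dataclass, or any dataclass it inherits from, declares `name`."""
    return any(f.name == name for f in fields(config_cls))


for cls in (OldConfig, NewConfig, DerivedConfig):
    print(cls.__name__, has_field(cls, "rgb_encoder_per_camera"))
```

For the script in this PR the simpler `__annotations__` check is sufficient, since the field is looked up on the class that would declare it.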

c) tested on real robot

I trained the diffusion policy with rgb_encoder_per_camera=True and confirmed that the policy worked well. I do not have concrete numbers, but the performance is slightly better than the original.

How to checkout & try? (for the reviewer)

Run the script above and check the hash values before and after this PR. Before running it, save the following YAML file as 'stats-diffusion.yaml'.

action:
  max:
  - 119.3543701171875
  - 49.64094543457031
  - 111.9273910522461
  - 7.118511199951172
  - 106.76649475097656
  - 26.715518951416016
  mean:
  - 94.45906829833984
  - 18.427291870117188
  - 73.95457458496094
  - 1.212478518486023
  - 79.41802978515625
  - 6.242947578430176
  min:
  - 77.6657943725586
  - -29.830978393554688
  - 39.58110809326172
  - -8.624532699584961
  - 44.424076080322266
  - -18.17777442932129
  std:
  - 8.80908489227295
  - 16.74974822998047
  - 14.223931312561035
  - 1.9911779165267944
  - 10.933554649353027
  - 9.21682357788086
episode_index:
  max:
  - 69.0
  mean:
  - 34.674591064453125
  min:
  - 0.0
  std:
  - 20.254430770874023
frame_index:
  max:
  - 176.0
  mean:
  - 61.95341873168945
  min:
  - 0.0
  std:
  - 37.45309829711914
index:
  max:
  - 8585.0
  mean:
  - 4292.5
  min:
  - 0.0
  std:
  - 2478.564453125
next.done:
  max:
  - 1.0
  mean:
  - 0.008152804337441921
  min:
  - 0.0
  std:
  - 0.08992405980825424
observation.image.gripper:
  max:
  - - - 1.0
  - - - 1.0
  - - - 1.0
  mean:
  - - - 0.3928315341472626
  - - - 0.39230185747146606
  - - - 0.360833078622818
  min:
  - - - 0.0
  - - - 0.0
  - - - 0.0
  std:
  - - - 0.2355380803346634
  - - - 0.23801472783088684
  - - - 0.2544608414173126
observation.image.webcam:
  max:
  - - - 1.0
  - - - 1.0
  - - - 1.0
  mean:
  - - - 0.405765175819397
  - - - 0.6512684226036072
  - - - 0.6204251646995544
  min:
  - - - 0.0
  - - - 0.0
  - - - 0.0
  std:
  - - - 0.1711459755897522
  - - - 0.1499525010585785
  - - - 0.1772834062576294
observation.state:
  max:
  - 113.90892791748047
  - 49.61915588378906
  - 111.92845916748047
  - 7.113752841949463
  - 106.78263092041016
  - 26.70062255859375
  mean:
  - 95.00767517089844
  - 15.08882999420166
  - 76.28932189941406
  - 1.3102922439575195
  - 80.53984832763672
  - 6.705284595489502
  min:
  - 77.66748809814453
  - -29.83199119567871
  - 39.72526550292969
  - -5.572042942047119
  - 44.40793991088867
  - -18.174049377441406
  std:
  - 8.437381744384766
  - 18.46141242980957
  - 14.492023468017578
  - 1.9003233909606934
  - 10.978218078613281
  - 8.922645568847656
timestamp:
  max:
  - 17.600000381469727
  mean:
  - 6.195342540740967
  min:
  - 0.0
  std:
  - 3.745309591293335
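
A note on the layout of this file: the `- - - 1.0` entries under the image keys are three levels of nested lists, so each image statistic has shape (3, 1, 1), one value per colour channel, while the state and action statistics are flat lists of length 6. The sketch below checks this in plain Python. The values are copied from the file above and written out by hand as a YAML parser would return them, and `nested_shape` is a hypothetical helper.

```python
def nested_shape(value) -> tuple:
    """Shape of a regular nested list, e.g. [[[1.0]], [[1.0]]] -> (2, 1, 1)."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        if not value:
            break
        value = value[0]
    return tuple(shape)


# observation.image.gripper -> mean, as parsed from the YAML above
gripper_mean = [
    [[0.3928315341472626]],
    [[0.39230185747146606]],
    [[0.360833078622818]],
]

# action -> mean, as parsed from the YAML above
action_mean = [
    94.45906829833984,
    18.427291870117188,
    73.95457458496094,
    1.212478518486023,
    79.41802978515625,
    6.242947578430176,
]

print(nested_shape(gripper_mean))
print(nested_shape(action_mean))
```

A (3, 1, 1) statistic broadcasts against a (3, H, W) image, which is why the same file works for the 56x56 images used in the test script.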

@HiroIshida HiroIshida force-pushed the diffusion_policy_with_multi_rgb_encoders branch from 38c883b to 9633d87 Compare October 24, 2024 13:41
@HiroIshida HiroIshida force-pushed the diffusion_policy_with_multi_rgb_encoders branch from 9633d87 to ac1cd33 Compare October 24, 2024 13:58
@HiroIshida HiroIshida changed the title from "feat: enable to use multiple rgb encoders per camera" to "feat: enable to use multiple rgb encoders per camera in diffusion policy" Oct 24, 2024
@HiroIshida HiroIshida marked this pull request as ready for review October 24, 2024 20:22
@HiroIshida
Author

@alexander-soare
Could you review this PR? :)

Development

Successfully merging this pull request may close these issues.

Diffusion policy's RGB encoder shared by all cameras. Is this intended?
1 participant