Multi-Image support for VQ-BeT #407

Open
bkpcoding opened this issue Sep 3, 2024 · 2 comments

@bkpcoding

Hello, I wanted to ask whether VQ-BeT can run with multiple cameras for environments that provide several views, such as Robomimic. If so, could someone give me pointers on what exactly I need to change? I would be happy to submit a PR once I get it working on my side and finish with the ICLR deadline!

Currently, if I understand correctly, we need to change the VQBeTRgbEncoder. It seems to support multiple camera views, but there is an assert statement that checks that the number of image views is 1. Is there a specific reason for this assert statement, i.e., do I need to change something else as well?

@alexander-soare
Collaborator

Hi! This shouldn't be too difficult. Check out this older PR that did something similar with Diffusion Policy: #218.

You'll need to manage the plumbing. From an actual NN architecture perspective it's pretty basic: just add each extra image in as another observation token. Also check ACT, as that will be more similar to VQ-BeT in this sense: it treats each image as a separate token.
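
To make the "one token per image" idea concrete, here is a minimal, framework-free sketch. It is not the LeRobot implementation: `build_observation_tokens`, the `image.front` / `image.wrist` / `state` keys, the toy encoders, and the choice to place image tokens before the state token are all assumptions made for illustration. In a real policy the encoders would be neural networks and the tokens would be tensors.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def build_observation_tokens(
    observation: Dict[str, Sequence[float]],
    image_keys: Sequence[str],
    encode_image: Callable[[Sequence[float]], Vector],
    encode_state: Callable[[Sequence[float]], Vector],
    state_key: str = "state",
) -> List[Vector]:
    """Return one token per camera view, followed by one state token.

    All names here are hypothetical; the token ordering is an assumption.
    """
    if not image_keys:
        raise ValueError("at least one image key is required")

    tokens: List[Vector] = []
    for key in image_keys:
        # Each camera view is encoded on its own and becomes its own token,
        # so adding a camera only lengthens the token sequence.
        tokens.append(encode_image(observation[key]))
    tokens.append(encode_state(observation[state_key]))

    # Every token must share one embedding width to form a single sequence.
    widths = {len(token) for token in tokens}
    if len(widths) != 1:
        raise ValueError(f"inconsistent token widths: {sorted(widths)}")
    return tokens


def toy_image_encoder(pixels: Sequence[float]) -> Vector:
    # Stand-in for a CNN backbone: summarizes a flat pixel list as [mean, max].
    return [sum(pixels) / len(pixels), max(pixels)]


def toy_state_encoder(state: Sequence[float]) -> Vector:
    # Stand-in for a learned projection: passes a 2-D state through unchanged.
    return list(state)


observation = {
    "image.front": [0.0, 1.0],
    "image.wrist": [0.5, 0.5],
    "state": [0.1, 0.2],
}
tokens = build_observation_tokens(
    observation,
    image_keys=["image.front", "image.wrist"],
    encode_image=toy_image_encoder,
    encode_state=toy_state_encoder,
)
print(len(tokens))  # two image tokens plus one state token
```

With this layout, most of the work is the plumbing mentioned above: the list of image keys has to come from the configuration, and anything that assumes a fixed number of observation tokens has to be updated to match.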

On another note, we are probably going to do a fairly major refactor to the way policies handle inputs/outputs some time soon.

@alexander-soare alexander-soare self-assigned this Sep 4, 2024
@bkpcoding
Author

Thank you so much for pointing out the PR. I think I will do something similar.
