
Implement on-Spot batch VLM predicate evaluation pipeline #285

Open · wants to merge 48 commits into master

Conversation

@lf-zhao (Collaborator) commented May 8, 2024

Implement a new pipeline for VLM predicate evaluation (a VLM-based classifier).
(Note: the previous PR #284 hasn't been merged; its changes carry over here. That PR was mainly for record keeping; this one is what actually runs on Spot.)

Pipeline:

  • Initialize VLM predicates in the env (currently, e.g., On and Inside).
  • In the Spot env reset, Spot builds the initial observation and needs to see all objects. We take all VLM predicates, compute every GroundAtom query the VLM classifiers must evaluate, and save the results as a dictionary.
  • In the Spot env step, Spot takes one step (at the skill level) and observes a set of new images; we update the VLM ground atoms of all VLM predicates, but only evaluate the VLMGroundAtoms whose objects are currently visible.
  • The evaluation in reset and step is batched: one batch contains all relevant VLM ground atoms plus a collection of Spot camera images (e.g., from 6 cameras), and the VLM returns a list of answers to parse (see the sketch after this list).
  • In the Spot perceiver (obs -> object-centric state), save the VLM predicates and ground atoms into the state.
  • In abstract (object-centric state -> ground atoms / symbolic state), VLM ground atoms are read directly from the state.
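
Roughly, the batched evaluation in reset/step works as sketched below. This is a minimal illustration only; the names (`VLMQueryAtom`, `query_vlm`, `evaluate_atoms_in_batch`) are hypothetical stand-ins, not the exact API added in this PR.

```python
# Illustrative sketch of batched VLM ground-atom evaluation; the real
# classes and function signatures in this PR may differ.
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class VLMQueryAtom:
    """One (predicate, objects) pair to check against the camera images."""
    predicate_name: str
    object_names: tuple


def evaluate_atoms_in_batch(
    atoms: Sequence[VLMQueryAtom],
    camera_images: Sequence[bytes],
    query_vlm: Callable[[str, Sequence[bytes]], List[str]],
) -> Dict[VLMQueryAtom, bool]:
    """Evaluate all visible VLM ground atoms with a single VLM call.

    The prompt lists every atom once; the VLM is expected to return one
    answer per atom, which is parsed back into a truth value.
    """
    prompt = "\n".join(
        f"{i}. Is {a.predicate_name}({', '.join(a.object_names)}) true?"
        for i, a in enumerate(atoms)
    )
    # One query covers all atoms and all camera views (e.g., 6 Spot cameras).
    answers = query_vlm(prompt, camera_images)
    return {
        atom: answer.strip().lower().startswith("yes")
        for atom, answer in zip(atoms, answers)
    }
```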

Command:
python predicators/main.py --spot_robot_ip 192.168.80.3 --spot_graph_nav_map b45-621 --env lis_spot_block_floor_env --approach spot_wrapper[oracle] --bilevel_plan_without_sim True --seed 0 --perceiver spot_perceiver --spot_vlm_eval_predicate True --vlm_eval_verbose True --num_train_tasks 0 --num_test_tasks 1

Modifications:

  • Structs: add VLMPredicate and VLMGroundAtom classes (see the sketch after this list). The partial perception state and Observation classes now carry additional information: visible objects, images, ground atoms from the previous state, and more.
  • Model: add an OpenAI VLM class, along with some demo VLM predicates and their prompts, e.g., for On, Inside, and Blocking.
  • VLM predicate evaluation: add a function that evaluates all visible ground atoms in one batched query.
  • Spot VLM object perception: generate ground atoms over the visible objects for all VLM predicates.
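
For reference, a minimal sketch of what the new structs might look like; the actual VLMPredicate / VLMGroundAtom classes in predicators/structs.py may differ, and the prompts below are illustrative only.

```python
# Hypothetical sketch of the VLM predicate structs; not the exact code in this PR.
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class VLMPredicate:
    """A predicate whose truth value is decided by a VLM query rather
    than a hand-written classifier; `prompt` describes what to check."""
    name: str
    arity: int
    prompt: str


@dataclass(frozen=True)
class VLMGroundAtom:
    """A VLM predicate applied to concrete objects, plus the truth value
    from the most recent VLM evaluation (None before any query)."""
    predicate: VLMPredicate
    objects: Tuple[str, ...]
    value: Optional[bool] = None

    def question(self) -> str:
        """Render the natural-language question sent to the VLM."""
        return self.predicate.prompt.format(*self.objects)


# Demo predicates in the spirit of this PR (prompts are made up here).
ON = VLMPredicate("On", 2, "Is {0} resting on top of {1}?")
INSIDE = VLMPredicate("Inside", 2, "Is {0} inside {1}?")
BLOCKING = VLMPredicate("Blocking", 2, "Is {0} blocking access to {1}?")
```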

…; we need to figure out a better way to sync later!
…e-eval

# Conflicts:
#	predicators/pretrained_model_interface.py
#	setup.py