
CAST-STEM 2024 Summer Camp Project


This is the repository for the CAST-STEM 2024 Summer Camp project. The project aims to estimate hand and object poses from recordings captured by the Multi-Camera System. The project website can be found here.


Contents


Poster

Project Instruction Video

Project Video

Click on the image to watch the project instruction video.

Download Links


License

This project is licensed under the GPL-3.0 License - see the LICENSE file for details.


Prerequisites

  • Git

    • For Linux:
    sudo apt-get install git
  • Conda Environment Manager

    Please refer to the official Installing Miniconda instructions to install Miniconda.

  • Code Editor (e.g., Visual Studio Code)

Environment Setup

  1. Create the Conda Environment
conda create --name summer_camp python=3.11
conda activate summer_camp
  2. Clone the Repository
git clone --recursive https://github.com/gobanana520/CAST-STEM-2024.git
cd CAST-STEM-2024
  3. Install Dependencies
  • For Linux & Windows
python -m pip install --no-cache-dir -r requirements.txt
  • For MacOS
python -m pip install --no-cache-dir -r requirements_macos.txt
  4. ROS Environment Setup [Optional]

If you plan to run ROS locally, refer to the ROS Environment Setup document for detailed steps. You can then run roscore to start the ROS master and debug your code in the ROS environment.
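
A quick way to verify the ROS setup is to subscribe to one of the camera topics and confirm that frames arrive. Below is a minimal sketch assuming a sensor_msgs/Image color stream; the topic name /camera/color/image_raw is only a placeholder and should be replaced with one of the topics published by your camera launch files.

# check_topic.py - minimal sanity check that a camera topic is publishing.
# The topic name below is a placeholder; replace it with one of your camera topics.
import rospy
from cv_bridge import CvBridge
from sensor_msgs.msg import Image

bridge = CvBridge()

def on_image(msg):
    # Convert the ROS Image message to an OpenCV BGR array and log its shape.
    img = bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
    rospy.loginfo("received frame %s at t=%.3f", img.shape, msg.header.stamp.to_sec())

if __name__ == "__main__":
    rospy.init_node("topic_sanity_check")
    rospy.Subscriber("/camera/color/image_raw", Image, on_image, queue_size=1)
    rospy.spin()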


Project Schedule

Week 1: Introduction to the Basics

Week 2: Data Collection (Calibration)

  • Tasks

    • ✅ Camera Intrinsics Extraction

      • The camera intrinsics are saved under the ./data/calibration/intrinsics/<camera_serial>_640x480.json file (a loading sketch is shown after this task list).
    • ✅ Camera Extrinsics Calibration

      • We use a large calibration board to calibrate the camera extrinsics in pairs with the Vicalib tool (see the camera_calibration usage demo).
      • The camera extrinsics are saved under the ./data/calibration/extrinsics/extrinsics_<date>/extrinsics.json file.
    • ✅ Hand Shape Calibration.

      • The MANO hand shapes are saved under the ./data/calibration/mano/<person_id>/mano.json file.
    • ✅ Get familiar with data collection with the Multi-Camera System.

      • Launch all the Realsense Cameras with ROS.
      • Use RVIZ to visualize the camera images.
      • Monitor the camera status.
      • Record rosbags from specific topics with the rosbag record command.
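
To reuse the saved calibration later (for example, during triangulation), the intrinsics JSON of one camera can be loaded into a 3x3 camera matrix. The key names fx, fy, cx, and cy below are assumptions rather than the confirmed file schema; check the actual files under ./data/calibration/intrinsics/ and adjust the sketch accordingly.

# Sketch: load one camera's intrinsics into a 3x3 K matrix.
# The JSON keys (fx, fy, cx, cy) are assumptions; adapt them to the real file layout.
import json
import numpy as np

def load_intrinsics(path):
    with open(path, "r") as f:
        data = json.load(f)
    return np.array([
        [data["fx"], 0.0, data["cx"]],
        [0.0, data["fy"], data["cy"]],
        [0.0, 0.0, 1.0],
    ])

K = load_intrinsics("./data/calibration/intrinsics/037522251142_640x480.json")
print(K)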

Week 3: Data Collection (Continued)

  • Objects Used in the Dataset

    • The dataset contains the following objects: Object List
    • The object models are saved under the ./data/models folder. You can use MeshLab to view them.
  • Tasks

    • ✅ Collect the data with the Multi-Camera System.
      • Each person will pick one object.
      • Use one or both hands to manipulate the object.
      • Each recording is saved as a rosbag file.
    • ✅ Extract the images from the rosbag recordings.
  • Homework

    • HW1: Rosbag Extraction
      • Write the class RosbagExtractor to extract the images from the rosbag recordings for all the camera image topics (a minimal sketch is shown at the end of this week's homework list).
      • The extracted images should be saved in the ./data/recordings/<person_id>_<rosbag_name> folder, following the structure below:
        <person_id>_<rosbag_name> # the recording folder name
        ├── 037522251142          # the camera serial number
        │   ├── color_000000.jpg  # the color image color_xxxxxx.jpg
        │   └── depth_000000.png  # the depth image depth_xxxxxx.png
        │   └── ...
        ├── 043422252387
        │   ├── color_000000.jpg
        │   ├── depth_000000.png
        │   ├── ...
        ├── ...
        ├── 117222250549
        │   ├── color_000000.jpg
        │   ├── depth_000000.png
        │   ├── ...
        
      • References:
    • HW2: Metadata Generation
      • For each extracted recording, the metadata should be generated under the sequence folder with filename meta.json.
      • The object_id (G01_1, ..., G31_4) can be found in the Week 3 section.
      • Below is an example of the meta.json file:
        {
          // the camera serial numbers
          "serials": [
            "037522251142",
            "043422252387",
            "046122250168",
            "105322251225",
            "105322251564",
            "108222250342",
            "115422250549",
            "117222250549"
          ],
          // the image width
          "width": 640,
          // the image height
          "height": 480,
          // the extrinsics folder name
          "extrinsics": "extrinsics_20240611",
          // the person name
          "mano_calib": "john",
          // the object id
          "object_ids": "G31_4",
          // the hand sides in the recording
          // (if both hands are used, the order should be right first and then left)
          "mano_sides": ["right", "left"],
          // the number of frames in the recording
          "num_frames": 1024
        }
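
As referenced in HW1 above, here is a minimal sketch of the rosbag extraction. It assumes both the color and depth streams are recorded as raw sensor_msgs/Image topics and that you supply the mapping from each image topic to its camera serial; the topic names and that mapping are assumptions to adapt to the actual recordings.

# Sketch of a RosbagExtractor (Week 3, HW1).
import os
import cv2
import rosbag
from cv_bridge import CvBridge

class RosbagExtractor:
    def __init__(self, bag_path, out_dir, color_topics, depth_topics):
        # color_topics / depth_topics: dicts mapping a ROS topic name to a camera serial.
        self.bag_path = bag_path
        self.out_dir = out_dir
        self.color_topics = color_topics
        self.depth_topics = depth_topics
        self.bridge = CvBridge()

    def extract(self):
        counters = {}  # per-(serial, kind) frame counter
        topics = list(self.color_topics) + list(self.depth_topics)
        bag = rosbag.Bag(self.bag_path, "r")
        try:
            for topic, msg, _stamp in bag.read_messages(topics=topics):
                if topic in self.color_topics:
                    serial, kind, ext = self.color_topics[topic], "color", "jpg"
                    img = self.bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
                else:
                    serial, kind, ext = self.depth_topics[topic], "depth", "png"
                    img = self.bridge.imgmsg_to_cv2(msg, desired_encoding="passthrough")
                idx = counters.get((serial, kind), 0)
                counters[(serial, kind)] = idx + 1
                cam_dir = os.path.join(self.out_dir, serial)
                os.makedirs(cam_dir, exist_ok=True)
                cv2.imwrite(os.path.join(cam_dir, f"{kind}_{idx:06d}.{ext}"), img)
        finally:
            bag.close()

A typical call would pass the recording's rosbag path and ./data/recordings/<person_id>_<rosbag_name> as the output folder, with one entry per camera in each topic dictionary.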

Week 4: Data Processing (Handmarks & Object Masks)

  • Slides

  • Tasks

    • ✅ Handmarks Detection by MediaPipe
    • ✅ Label the initial Object Mask manually.
    • ✅ Use XMem to generate the remaining masks for all the recordings.
    • ✅ Generate 3D hand joints by Triangulation and RANSAC.
    • ✅ Set up the HaMeR python environment.
    • ✅ Set up the FoundationPose python environment.
  • Homework

    • HW1: Handmarks Detection
      • Write the class MPHandDetector to detect the 2D handmarks from the extracted images using MediaPipe (a minimal sketch is shown at the end of this section).
      • The detected handmarks should be saved in the ./data/recordings/<sequence_name>/processed/hand_detection folder, following the structure below:
        <sequence_name>/processed/hand_detection
        ├── mp_handmarks_results.npz  # the detected handmarks results
        └── vis                       # the folder to save the visualization results
            ├── mp_handmarks
            │   ├── vis_000000.png    # the visualization image of the handmarks
            │   ├── vis_000001.png
            │   ├── ...
            └── mp_handmarks.mp4      # the visualization video of the handmarks
        
      • The detected handmarks should be saved as a NumPy array with shape (num_hands, num_joints, 2).
      • The detected handmarks should be stored in image (pixel) coordinates, i.e., unnormalized.
      • The detected handmarks should be ordered with the right hand first, then the left hand.
      • References:
    • HW2: Label the initial Object Mask manually.
      • The mask_id (1, 2, ..., 10) of each object can be found in the Week 3 section.
      • Download the pretrained models [4.3GB] for the Segment Anything Model (SAM).
        • For Linux-like OSes: run bash ./config/sam/download_sam_model.sh in the terminal.
        • Or download the models from Box and put them under the ./config/sam folder.
      • Run the mask label toolkit to label the object mask in each camera view.
        python ./tools/04_run_mask_label_toolkit.py
        mask_label_toolkit
        • Click ... to select the image.
        • Ctrl + Left Click to add positive point (green color).
        • Ctrl + Right Click to add negative point (red color).
        • R to reset the points.
        • Click - and + to set the mask id, and click Add Mask to add the mask.
        • Click Save Mask to save the mask.
        • The mask and visualization images will be saved in the ./data/recordings/<sequence_name>/processed/segmentation/init_segmentation/<camera_serial> folder.
    • HW3: Generate one 3D hand joint by triangulation and RANSAC.
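
For HW1 above, below is a minimal MediaPipe Hands sketch that converts the normalized landmarks to pixel coordinates, matching the "unnormalized, image coordinate" requirement; saving the results to mp_handmarks_results.npz and rendering the visualizations are left out, and the handedness label may need flipping depending on whether the images are mirrored.

# Sketch (Week 4, HW1): detect 2D handmarks with MediaPipe and convert them to pixels.
import cv2
import numpy as np
import mediapipe as mp

def detect_handmarks(image_path):
    img = cv2.imread(image_path)
    h, w = img.shape[:2]
    handmarks = {}
    with mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=2,
                                  min_detection_confidence=0.5) as hands:
        result = hands.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
        if result.multi_hand_landmarks:
            for lms, handed in zip(result.multi_hand_landmarks, result.multi_handedness):
                side = handed.classification[0].label.lower()  # "left" or "right"
                # Normalized landmarks -> pixel coordinates, shape (21, 2).
                handmarks[side] = np.array([[p.x * w, p.y * h] for p in lms.landmark])
    return handmarks

For HW3, the sketch below triangulates one 3D joint from its 2D detections in several calibrated views using linear (DLT) triangulation, with a simple RANSAC-style loop over view pairs to reject outlier detections. The 3x4 projection matrices P = K [R | t] are assumed to be assembled from the intrinsics and extrinsics files described in Week 2.

# Sketch (Week 4, HW3): triangulate one 3D hand joint from multiple views with RANSAC.
# proj_mats: list of 3x4 projection matrices; points2d: the matching (u, v) detection per view.
import itertools
import numpy as np

def triangulate_pair(P1, P2, uv1, uv2):
    # Linear (DLT) triangulation of one point from two views.
    A = np.stack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def reprojection_error(P, X, uv):
    x = P @ np.append(X, 1.0)
    return np.linalg.norm(x[:2] / x[2] - uv)

def ransac_triangulate(proj_mats, points2d, thresh_px=10.0):
    # Try every pair of views and keep the hypothesis with the most inliers.
    best_X, best_inliers = None, []
    for i, j in itertools.combinations(range(len(proj_mats)), 2):
        X = triangulate_pair(proj_mats[i], proj_mats[j], points2d[i], points2d[j])
        inliers = [k for k in range(len(proj_mats))
                   if reprojection_error(proj_mats[k], X, points2d[k]) < thresh_px]
        if len(inliers) > len(best_inliers):
            best_X, best_inliers = X, inliers
    return best_X, best_inliers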

Week 5: Data Processing (Hand & Object Pose Estimation)

  • Tasks

    • ✅ Use HaMeR to estimate the 2D handmarks in each camera view.
      • Generate the input bounding boxes for HaMeR.
      • Run the HaMeR model to estimate the 2D handmarks.
    • ✅ Use FoundationPose to estimate the object pose in each camera view.
      • Set up the FoundationPose python environment.
      • Write a DataReader to load our sequences as input for FoundationPose.
      • Run the FoundationPose model to estimate the object pose.
    • ✅ Optimize the final MANO hand pose (a sketch of this optimization is shown at the end of this section).
      • Generate 3D hand joints from the HaMeR handmarks.
      • Optimize the MANO hand pose to fit the 3D hand joints.
    • ✅ Optimize the final Object Pose.
      • Generate the best 3D object pose from the FoundationPose results.
      • Optimize the object pose to fit the inlier 3D FoundationPose estimates.
    • ✅ Generate the final 3D hand and object poses.
      • Generate the final hand and object poses from the optimized MANO hand pose and object pose.
        • The final MANO hand poses are saved to the poses_m.npy file under each sequence folder.
        • The final 6D object poses are saved to the poses_o.npy file under each sequence folder.
    • ✅ Visualization of the final poses
      • The rendered images are saved in the ./data/recordings/<sequence_name>/processed/sequence_rendering folder.
      • The rendered video is saved to the vis_<sequence_name>.mp4 file under each sequence folder.
  • Homework
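
For the "Optimize the final MANO hand pose" task above, the sketch below shows the kind of PyTorch optimization loop involved: the MANO pose and a global translation are optimized so that the model's 21 joints fit the triangulated 3D joints. The mano_layer callable is a placeholder for whichever MANO implementation the project uses (assumed to map pose and shape parameters to vertices and joints), and the 48-dimensional axis-angle pose parameterization is likewise an assumption.

# Sketch: fit MANO pose/translation to triangulated 3D joints.
# `mano_layer` is a placeholder assumed to map (pose, betas) -> (vertices, joints21).
import torch

def fit_mano_pose(mano_layer, betas, target_joints, num_iters=500, lr=1e-2):
    # betas: (1, 10) calibrated hand shape; target_joints: (21, 3) triangulated joints.
    pose = torch.zeros(1, 48, requires_grad=True)   # global rotation + per-joint axis-angle
    trans = torch.zeros(1, 3, requires_grad=True)   # global translation
    target = torch.as_tensor(target_joints, dtype=torch.float32)
    optimizer = torch.optim.Adam([pose, trans], lr=lr)
    for _ in range(num_iters):
        optimizer.zero_grad()
        _verts, joints = mano_layer(pose, betas)    # joints: (1, 21, 3)
        loss = torch.nn.functional.mse_loss(joints[0] + trans, target)
        loss.backward()
        optimizer.step()
    return pose.detach(), trans.detach()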


Processed Results

Videos demonstrating the final processed results of the project can be found below:

vis_ida
vis_isaac
vis_lyndon
vis_may
vis_nicole
vis_reanna
vis_rebecca