CASQ: Enhancing Human-Object Interaction Detection via Supplementary Semantic Information for Interaction Queries

In this study, we propose a novel method that uses supplementary semantic information to generate dynamic interaction queries for each image. Our method embeds object categories into a vector space using a pre-trained CLIP model and incorporates attention information from the resulting semantic features, which enhances the model's representation and query capabilities.
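As an illustration of the core idea, category names can be embedded with CLIP's text encoder as sketched below. This is a minimal sketch assuming the vendored CLIP under hotr/CLIP follows OpenAI's public clip API; the model variant, category list, and prompt template are illustrative, not the repository's actual configuration.

import torch
import clip

# Load a pre-trained CLIP model (ViT-B/32 is an assumption; see hotr/CLIP).
device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

# Embed object category names into CLIP's text space.
categories = ["person", "bicycle", "sports ball"]  # e.g., COCO classes
tokens = clip.tokenize([f"a photo of a {c}" for c in categories]).to(device)

with torch.no_grad():
    text_features = model.encode_text(tokens)  # (num_classes, 512)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)

# The normalized embeddings can then condition per-image interaction queries,
# e.g., through attention over the image's semantic features.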

Our proposed CASQ significantly improves HOI detection performance by accounting for variations in the context and characteristics of each interaction.

1. Environment Setup

We ran our experiments in a Google Colab environment; the following commands install the required dependencies:

!pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
!pip install cython scipy
!pip install pycocotools
!pip install opencv-python
!pip install wandb
!pip install transformers
!pip install ftfy

%cd /CASQ/hotr/CLIP
!pip install -r requirements.txt
!pip install transformers==4.12.5
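After installation, a quick sanity check (not part of the repository; purely illustrative) confirms the pinned versions and that a GPU is visible:

import torch, torchvision, transformers

print(torch.__version__)          # expect 1.7.1+cu110
print(torchvision.__version__)    # expect 0.8.2+cu110
print(transformers.__version__)   # expect 4.12.5
print(torch.cuda.is_available())  # should be True on a GPU runtime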

2. How to Train/Test

For both training and testing, you can run on either a single GPU or multiple GPUs; the single-GPU commands are shown first, and a multi-GPU launch sketch follows the two training commands below.

# Train from epoch 1
!python main.py \
		--group_name vcoco \
		--run_name vcoco_single_run_000001 \
		--HOIDet \
		--validate  \
		--share_enc \
		--pretrained_dec \
		--lr 1e-4 \
		--num_hoi_queries 16 \
		--set_cost_idx 10 \
		--hoi_act_loss_coef 10 \
		--hoi_eos_coef 0.1 \
		--temperature 0.05 \
		--no_aux_loss \
		--hoi_aux_loss \
		--dataset_file vcoco \
		--frozen_weights https://dl.fbaipublicfiles.com/detr/detr-r50-e632da11.pth \
		--data_path /v-coco/data  \
		--output_dir  /checkpoints

# Resume training from a saved checkpoint
!python main.py \
		--group_name HOTR_vcoco \
		--run_name vcoco_single_run_000001 \
		--HOIDet \
		--validate \
		--share_enc \
		--pretrained_dec \
		--lr 1e-4 \
		--num_hoi_queries 16 \
		--set_cost_idx 10 \
		--hoi_act_loss_coef 10 \
		--hoi_eos_coef 0.1 \
		--temperature 0.05 \
		--no_aux_loss \
		--hoi_aux_loss \
		--dataset_file vcoco \
		--frozen_weights https://dl.fbaipublicfiles.com/detr/detr-r50-e632da11.pth \
		--resume checkpoints/HOTR_vcoco/vcoco_single_run_000001/checkpoint99.pth \
		--start_epoch 99 \
		--data_path /v-coco/data  \
		--output_dir  checkpoints
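
For multi-GPU training, a distributed launch along the following lines should work. This is a sketch assuming the codebase uses the standard torch.distributed.launch entry point common to DETR-style repositories; adjust --nproc_per_node to your GPU count and reuse the same flags as the single-GPU command.

# Multi-GPU variant (sketch): same flags as the single-GPU training command
!python -m torch.distributed.launch --nproc_per_node=2 --use_env main.py \
		--group_name vcoco \
		--run_name vcoco_single_run_000001 \
		--HOIDet \
		--validate \
		--share_enc \
		--pretrained_dec \
		--lr 1e-4 \
		--num_hoi_queries 16 \
		--set_cost_idx 10 \
		--hoi_act_loss_coef 10 \
		--hoi_eos_coef 0.1 \
		--temperature 0.05 \
		--no_aux_loss \
		--hoi_aux_loss \
		--dataset_file vcoco \
		--frozen_weights https://dl.fbaipublicfiles.com/detr/detr-r50-e632da11.pth \
		--data_path /v-coco/data \
		--output_dir /checkpoints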

For testing, you can use your own trained weights: pass the checkpoint path (which includes the group name and run name) to the --resume argument.

!python main.py \
		--HOIDet \
		--share_enc \
		--pretrained_dec \
		--num_hoi_queries 16 \
		--object_threshold 0 \
		--temperature 0.05 \
		--no_aux_loss \
		--eval \
		--dataset_file vcoco \
		--data_path /v-coco/data \
		--resume /checkpoints/checkpoint99.pth

To use our provided weights instead, download them from the link below and pass the path of the downloaded file to the --resume argument. For example, to test our pre-trained weights on the V-COCO dataset, we place the downloaded weights at checkpoints/vcoco.pth.
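For instance (paths here are illustrative):

!mkdir -p checkpoints
# after downloading the V-COCO weights to checkpoints/vcoco.pth, run the
# evaluation command above with:
#   --resume checkpoints/vcoco.pth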

3. Results

Here, we report results for V-COCO Scenario 1 (60.88 mAP) and Scenario 2 (65.69 mAP). These are obtained without applying any priors on the scores (see iCAN).

| # queries | Scenario 1 (mAP) | Scenario 2 (mAP) | Checkpoint |
|-----------|------------------|------------------|------------|
| 16        | 60.2             | 65.1             | download   |
