What is Where by Looking: Weakly-Supervised Open-World Phrase-Grounding without Text Inputs - NIPS 22'

Paper : link
Demo : link
Video : link

Get Started

datasets format are followed by link

For training a model :

python train_grounding.py -bs 32 -nW 8 -nW_eval 1 -task vg_train -data_path /path_to/vg -val_path /path_to/flicker
python train_grounding.py -bs 32 -nW 8 -nW_eval 1 -task coco -data_path /path_to/coco -val_path /path_to/flicker

For Grounding evaluation with our model [XX is the number of the results folder i.e 'gpu22' - XX == 22]:

python inference_grounding.py -task grounding -dataset refit -val_path /path_to/RefIt -Isize 224 -clip_eval 0 -path_ae XX -nW 1
python inference_grounding.py -task grounding -dataset flicker -val_path /path_to/flicker -Isize 224 -clip_eval 0 -path_ae XX -nW 1
python inference_grounding.py -task grounding -dataset vg -val_path /path_to/VG -Isize 224 -clip_eval 0 -path_ae XX -nW 1

For Grounding evaluation with CLIP model:

python inference_grounding.py -task grounding -dataset refit -val_path /path_to/RefIt -Isize 224 -clip_eval 1 -nW 1
python inference_grounding.py -task grounding -dataset flicker -val_path /path_to/flicker -Isize 224 -clip_eval 1 -nW 1
python inference_grounding.py -task grounding -dataset vg -val_path /path_to/VG -Isize 224 -clip_eval 1 -nW 1

For WWbL evaluation with our model:

python inference_grounding.py -task app -dataset refit -val_path /path_to/RefIt -Isize 224 -clip_eval 0 -path_ae XX -nW 1 --start 0 --end 9983
python wwbl_algo1_point_metric.py -nW 1 -predictions_path YY -val_path /path_to/RefIt --dataset refit

python inference_grounding.py -task app -dataset flicker -val_path /path_to/flicker -Isize 224 -clip_eval 0 -path_ae XX -nW 1 -start 0 -end 1000
python wwbl_algo1_point_metric.py -nW 1 -predictions_path YY -val_path /path_to/flicker --dataset flicker

python inference_grounding.py -task app -dataset vg -val_path /path_to/VG -Isize 224 -clip_eval 0 -path_ae XX -nW 1 -start 0 -end 17478
python wwbl_algo1_point_metric.py -nW 1 -predictions_path YY -val_path /path_to/VG --dataset VG

Phrase Grounding Results - Point Accuracy Metric

COCO weights

VG weights

Method	Backbone	VG(VGtrained/COCO)	Flicker(VGtrained/COCO)	ReferIt(VGtrained/COCO)
Baseline	Random	11.15	27.24	24.30
Baseline	Center	20.55	47.40	30.30
GAE	CLIP	54.72	72.47	56.76
FCVC	VGG	-/14.03	-/29.03	-/33.52
VGLS	VGG	24.40/-	-/-	-/-
TD	Inception-2	19.31/-	42.40/-	31.97/-
SSS	VGG	30.03/-	49.10/-	39.98/-
MG	BiLSTM+VGG	50.18/46.99	57.91/53.29	62.76/47.89
MG	ELMo+VGG	48.76/47.94	60.08/61.66	60.01/47.52
GbS	VGG	53.40/52.00	70.48/72.60	59.44/56.10
ours	CLIP+VGG	62.31/59.09	75.63/75.43	65.95/61.03

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
BLIP		BLIP
CLIP/clip		CLIP/clip
datasets		datasets
models		models
pics		pics
README.md		README.md
base.py		base.py
bbox.py		bbox.py
inference_grounding.py		inference_grounding.py
model.py		model.py
req.txt		req.txt
train_grounding.py		train_grounding.py
utils.py		utils.py
utils_coco.py		utils_coco.py
utils_grounding.py		utils_grounding.py
wwbl_algo1_point_metric.py		wwbl_algo1_point_metric.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

What is Where by Looking: Weakly-Supervised Open-World Phrase-Grounding without Text Inputs - NIPS 22'

Get Started

Phrase Grounding Results - Point Accuracy Metric

About

Releases

Packages

Languages

talshaharabany/what-is-where-by-looking

Folders and files

Latest commit

History

Repository files navigation

What is Where by Looking: Weakly-Supervised Open-World Phrase-Grounding without Text Inputs - NIPS 22'

Get Started

Phrase Grounding Results - Point Accuracy Metric

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages