Zero-shot Visual Question Answering using Knowledge Graph [ ISWC 2021 ]
In this work, we propose a Zero-shot VQA algorithm that uses knowledge graphs and a mask-based learning mechanism to better incorporate external knowledge, and we present new answer-based Zero-shot VQA splits for the F-VQA dataset.
2024-02: We released a preprint of our survey, Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey [Repo].
python >= 3.5
PyTorch >= 1.6.0
For more details on the requirements:
pip install -r requirements.txt
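As a quick sanity check, the sketch below verifies the two version constraints above before running anything (it assumes nothing beyond those constraints):

```python
# check_env.py -- quick sanity check of the version requirements above (sketch)
import sys

import torch

# Python >= 3.5
assert sys.version_info >= (3, 5), "Python >= 3.5 required"

# PyTorch >= 1.6.0 (torch.__version__ may carry a build suffix such as "1.6.0+cu101")
major, minor = (int(x) for x in torch.__version__.split("+")[0].split(".")[:2])
assert (major, minor) >= (1, 6), "PyTorch >= 1.6.0 required, found " + torch.__version__

print("Environment OK: python", sys.version.split()[0], "| torch", torch.__version__)
```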
Location of the 5 F-VQA train / test data splits:
data/KG_VQA/fvqa/exp_data/train_data
data/KG_VQA/fvqa/exp_data/test_data
Location of the 5 ZS-F-VQA train / test data splits:
data/KG_VQA/fvqa/exp_data/train_seen_data
data/KG_VQA/fvqa/exp_data/test_unseen_data
Answers are available at data/KG_VQA/data/FVQA/new_dataset_release/.
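A minimal sketch for checking that the split locations above are in place (it assumes the repository root is your working directory and makes no assumption about the internal format of the split files):

```python
# check_splits.py -- verify the split locations listed above exist (sketch)
import os

paths = [
    "data/KG_VQA/fvqa/exp_data/train_data",
    "data/KG_VQA/fvqa/exp_data/test_data",
    "data/KG_VQA/fvqa/exp_data/train_seen_data",
    "data/KG_VQA/fvqa/exp_data/test_unseen_data",
    "data/KG_VQA/data/FVQA/new_dataset_release",
]

for p in paths:
    status = "OK     " if os.path.exists(p) else "MISSING"
    print("[{}] {}".format(status, p))
```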
Image:
- Image folder (put all your .JPEG / .jpg files here): data/KG_VQA/fvqa/exp_data/images/images
- Image features: fvqa-resnet-14x14.h5 (pretrained: GoogleDrive or BaiduCloud, password: 16vd); fvqa36_imgid2idx.pkl and fvqa_36.hdf5 (pretrained: GoogleDrive or BaiduCloud, password: zsqa)
- Original images are available at FVQA with download_link.
- Other VQA datasets: you can generate pretrained image features this way (Guidance / code).
- The generated .h5 feature file should be placed in: data/KG_VQA/fvqa/exp_data/common_data/.
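After downloading, a quick way to confirm the feature files are readable is sketched below (it assumes h5py is installed and that the files sit under common_data/ as described; the dataset names inside the files are not assumed, only listed):

```python
# inspect_features.py -- peek at the downloaded image-feature files (sketch)
import pickle

import h5py

common = "data/KG_VQA/fvqa/exp_data/common_data/"

# grid features (ResNet 14x14)
with h5py.File(common + "fvqa-resnet-14x14.h5", "r") as f:
    print("fvqa-resnet-14x14.h5 datasets:", list(f.keys()))

# bottom-up (36-box) features and the image-id -> row-index mapping
with h5py.File(common + "fvqa_36.hdf5", "r") as f:
    print("fvqa_36.hdf5 datasets:", list(f.keys()))

with open(common + "fvqa36_imgid2idx.pkl", "rb") as f:
    imgid2idx = pickle.load(f)
print("fvqa36_imgid2idx.pkl entries:", len(imgid2idx))
```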
Answer / Question vocab:
- The generated files answer.vocab.fvqa.json & question.vocab.fvqa.json are now available at: data/KG_VQA/fvqa/exp_data/common_data/.
- Other VQA datasets: code for processing the answer vocab and the question vocab.
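A minimal sketch for inspecting the two vocab files (only the file paths above are taken from the repo; the internal JSON structure is not assumed, so the snippet just reports size and type):

```python
# inspect_vocab.py -- look at the generated answer / question vocabularies (sketch)
import json

common = "data/KG_VQA/fvqa/exp_data/common_data/"

for name in ("answer.vocab.fvqa.json", "question.vocab.fvqa.json"):
    with open(common + name, "r") as f:
        vocab = json.load(f)
    # the exact JSON layout is not documented here; just report what was loaded
    size = len(vocab) if hasattr(vocab, "__len__") else "unknown"
    print("{}: type={}, entries={}".format(name, type(vocab).__name__, size))
```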
Pretrained Model (url)
Download it and overwrite data/KG_VQA/fvqa/model_save
[--KGE {TransE,ComplEx,TransR,DistMult}] [--KGE_init KGE_INIT] [--GAE_init GAE_INIT] [--ZSL ZSL] [--entity_num {all,4302}] [--data_choice {0,1,2,3,4}]
[--name NAME] [--no-tensorboard] --exp_name EXP_NAME [--dump_path DUMP_PATH] [--exp_id EXP_ID] [--random_seed RANDOM_SEED] [--freeze_w2v {0,1}]
[--ans_net_lay {0,1,2}] [--fact_map {0,1}] [--relation_map {0,1}] [--now_test {0,1}] [--save_model {0,1}] [--joint_test_way {0,1}] [--top_rel TOP_REL]
[--top_fact TOP_FACT] [--soft_score SOFT_SCORE] [--mrr MRR]
Available models for training: Up-Down, BAN, SAN, MLP
You can try your own model by adding it (as a .py file) to main/code/model/ (a minimal template is sketched below).
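As a rough starting point, a new model file might look like the following; the class name, constructor arguments, and forward signature here are assumptions for illustration only, so check the existing models (Up-Down, BAN, SAN, MLP) under main/code/model/ and code/config.py for the actual interface the training code expects.

```python
# main/code/model/my_model.py -- hypothetical custom model (sketch, interface assumed)
import torch.nn as nn


class MyModel(nn.Module):
    """Toy baseline: fuse mean-pooled image features with a question embedding."""

    def __init__(self, q_dim=1024, v_dim=2048, hidden=1024, num_answers=500):
        super(MyModel, self).__init__()
        self.q_proj = nn.Linear(q_dim, hidden)
        self.v_proj = nn.Linear(v_dim, hidden)
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(hidden, num_answers),
        )

    def forward(self, q_feat, v_feat):
        # q_feat: [batch, q_dim] question representation
        # v_feat: [batch, num_regions, v_dim] region features -> mean-pool over regions
        v = self.v_proj(v_feat.mean(dim=1))
        q = self.q_proj(q_feat)
        return self.classifier(q * v)  # answer scores: [batch, num_answers]
```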
For more details: code/config.py
cd code
For data check:
python deal_data.py --exp_name data_check
General VQA:
- train:
bash run_FVQA_train.sh
- test:
bash run_FVQA.sh
ZSL/GZSL VQA:
- train:
bash run_ZSL_train.sh
- test:
bash run_ZSL.sh
Note:
- You can open the .sh files to modify parameters.
Result:
- Log files will be saved to: code/dump
- Models will be saved to: data/KG_VQA/fvqa/model_save
Thanks to the following released works:
SciencePlots, ramen, GAE, vqa-winner-cvprw-2017, faster-rcnn, VQA, BAN, commonsense-kg-completion, bottom-up-attention-vqa, FVQA, answer_embedding, torchlight
Please consider citing this paper if you use the code:
@inproceedings{chen2021zero,
title={Zero-Shot Visual Question Answering Using Knowledge Graph},
author={Chen, Zhuo and Chen, Jiaoyan and Geng, Yuxia and Pan, Jeff Z and Yuan, Zonggang and Chen, Huajun},
booktitle={International Semantic Web Conference},
pages={146--162},
year={2021},
organization={Springer}
}
For more details, please submit an issue or contact Zhuo Chen.