# [CVPR 2024] Causal Mode Multiplexer: A Novel Framework for Unbiased Multispectral Pedestrian Detection

Taeheon Kim\*, Sebin Shin\*, Youngjoon Yu, Hak Gu Kim, and Yong Man Ro (\*: equal contribution)
This repository contains the code and links for the Causal Mode Multiplexer, a framework designed for unbiased multispectral pedestrian detection. We show that the Causal Mode Multiplexer framework effectively learns the causal relationships between multispectral inputs and predictions, thereby showing strong generalization ability on out-of-distribution data.
- News
- Summary
- Installation & Data Preparation
- Training
- Test
- New Dataset: ROTX-MP
- Citation
- Acknowledgement
## News

- 2024.04.17 🌈 Code released.
- 2024.03.02 ⭐ arXiv preprint released.
- 2024.02.27 🎉 Our paper has been accepted to CVPR 2024.
## Summary

We propose a novel Causal Mode Multiplexer (CMM) framework that performs unbiased inference from statistically biased multispectral pedestrian training data. Specifically, the CMM framework learns causality based on the different cause-and-effect relationships between ROTO[^1], RXTO, and ROTX inputs and predictions. For ROTO data, we guide the model to learn the total effect in the common mode learning scheme. Next, for ROTX and RXTO data, we utilize the tools of counterfactual intervention to eliminate the direct effect of thermal by subtracting it from the total effect. To this end, we modify the training objective from maximizing the posterior probability to maximizing the total indirect effect in the differential mode learning scheme. Our design requires combining two different learning schemes; therefore, we propose a Causal Mode Multiplexing (CMM) Loss to optimize the interchange.

[^1]: R⋆T⋆ denotes the visibility (O: visible, X: invisible) in each modality (R: RGB, T: thermal). Generally, ROTO refers to daytime images, and RXTO refers to nighttime images. ROTX refers to daytime images in obscured situations.
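As a rough illustration of the differential-mode objective, consider the sketch below. This is our simplification, not the repository's implementation: `f_fused` and `f_thermal` stand for hypothetical fused and thermal-only prediction heads, and the counterfactual term is approximated by a thermal-only forward pass.

```python
import torch.nn.functional as F

def differential_mode_loss(f_fused, f_thermal, rgb, thermal, target):
    # Illustrative sketch only; not the repository's actual code.
    # Total effect: factual prediction from the fused RGB + thermal branch.
    logits_total = f_fused(rgb, thermal)
    # Direct effect of thermal: counterfactual, thermal-only prediction.
    logits_direct = f_thermal(thermal)
    # Total indirect effect (TIE): subtract the thermal direct effect from
    # the total effect, then train on the TIE logits so that thermal-only
    # shortcuts stop explaining the labels.
    tie_logits = logits_total - logits_direct
    return F.cross_entropy(tie_logits, target)
```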
## Installation & Data Preparation

The following instructions cover installing dependencies and preparing the data. The code is tested with torch 0.3.1 and CUDA 9.0.
Step 1. Clone the repository locally:

```bash
git clone https://github.com/ssbin0914/Causal-Mode-Multiplexer.git
cd Causal-Mode-Multiplexer
```
Step 2. Create and activate a conda environment, install the dependencies from `requirements.txt`, then install the legacy PyTorch build and compile the extensions:

```bash
conda create -n cmm python=2.7.16
conda activate cmm
pip install -r requirements.txt
wget https://download.pytorch.org/whl/cu90/torch-0.3.1-cp27-cp27mu-linux_x86_64.whl
pip install torch-0.3.1-cp27-cp27mu-linux_x86_64.whl
pip install torchvision==0.1.8
cd lib
sh make.sh
cd ..
```
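Before moving on, a quick sanity check (our suggestion, not an official step) can confirm that the legacy PyTorch build and CUDA are visible to the environment:

```python
# Suggested sanity check; not part of the official instructions.
import torch

print(torch.__version__)          # expected: 0.3.1
print(torch.cuda.is_available())  # expected: True on a machine with CUDA 9.0
```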
Step 3. Download the `data` folder from this link, unzip it, and place it under the `Causal-Mode-Multiplexer/` directory. We provide the FLIR dataset.
```
└── Causal-Mode-Multiplexer
    ├── cfgs
    ├── lib
    ├── data
    │   ├── cache
    │   ├── KAIST_PED
    │   │   ├── Annotations
    │   │   │   ├── lwir
    │   │   │   │   ├── FLIR_08864.txt
    │   │   │   │   └── ...
    │   │   │   └── visible
    │   │   │       ├── FLIR_08864.txt
    │   │   │       └── ...
    │   │   ├── annotations_cache
    │   │   ├── ImageSets
    │   │   │   ├── Main
    │   │   │   │   ├── train.txt
    │   │   │   │   └── test.txt
    │   │   │   └── Main_Org
    │   │   ├── JPEGImages
    │   │   │   ├── lwir
    │   │   │   │   ├── FLIR_08864.jpg
    │   │   │   │   └── ...
    │   │   │   └── visible
    │   │   │       ├── FLIR_08864.jpg
    │   │   │       └── ...
    │   │   └── results
    │   └── pretrained_model
    │       ├── resnet50.pth
    │       ├── resnet101.pth
    │       └── ...
    └── ...
```
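To catch misplaced files early, a short script like the one below (a helper we suggest; not included in the repository) can verify the expected layout from the repository root:

```python
# Suggested layout check; run from the Causal-Mode-Multiplexer/ root.
import os

expected = [
    "data/KAIST_PED/Annotations/lwir",
    "data/KAIST_PED/Annotations/visible",
    "data/KAIST_PED/ImageSets/Main/train.txt",
    "data/KAIST_PED/ImageSets/Main/test.txt",
    "data/KAIST_PED/JPEGImages/lwir",
    "data/KAIST_PED/JPEGImages/visible",
    "data/pretrained_model/resnet50.pth",
]
for path in expected:
    status = "ok" if os.path.exists(path) else "MISSING"
    print("{:8s} {}".format(status, path))
```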
## Training

To train the CMM, simply run:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python trainval_net.py ResNet50_lr0.007_Uncer_KL --dataset kaist --cuda --mGPUs --bs 4 --cag --s 2 --types all --net res50 --UKLoss ON --lr 0.007 --lr_decay_step 1 --epochs 2
```

where `ResNet50_lr0.007_Uncer_KL` is the name of the folder where the weights will be stored. `--lr` specifies the learning rate, `--lr_decay_step` indicates the step at which the learning rate decays, and `--epochs` sets the number of training epochs.

After running the code, the weights are stored in the `weights/res50/kaist/ResNet50_lr0.007_Uncer_KL/` directory.

- The pretrained weight for the FLIR dataset is available from this link. If you want to test with this pretrained weight, put the weight file under the `weights/res50/kaist/ResNet50_lr0.007_Uncer_KL/` directory.
## Test

To test the CMM, simply run:

```bash
CUDA_VISIBLE_DEVICES=0 python test_net.py ResNet50_lr0.007_Uncer_KL --dataset kaist --cuda --cag --checksession 2 --checkepoch 2 --checkpoint 1381 --types all --UKLoss ON --net res50 --vis
```

where `ResNet50_lr0.007_Uncer_KL` is the name of the folder containing the weights you want to test. Set `--checksession`, `--checkepoch`, and `--checkpoint` according to the name of your weight file, e.g., `fpn_2_2_1381.pth`.
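The three flags can be read directly off the weight file name. The helper below is our illustration, assuming the `fpn_<session>_<epoch>_<step>.pth` pattern implied by the example above:

```python
# Map a checkpoint name such as "fpn_2_2_1381.pth" to the test flags.
# The fpn_<session>_<epoch>_<step>.pth pattern is inferred from the example
# in this README; adjust the regex if your files are named differently.
import re

def flags_from_checkpoint(name):
    session, epoch, step = re.match(r"fpn_(\d+)_(\d+)_(\d+)\.pth$", name).groups()
    return "--checksession {} --checkepoch {} --checkpoint {}".format(
        session, epoch, step)

print(flags_from_checkpoint("fpn_2_2_1381.pth"))
# --checksession 2 --checkepoch 2 --checkpoint 1381
```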
After running the code, you may encounter an error such as `IOError: [Errno 2] No such file or directory: '~/data/KAIST_PED/Annotations/FLIR_09593.xml'`. This error does not mean the detection failed; the detections are still produced correctly. We calculate the AP score with MATLAB rather than with `test_net.py`, so proceed to the next step.
The detection results are stored in the `Detection_Result/` directory and are used for evaluation. Visualization results are also stored, as RGB images in the `images` folder and as infrared (IR) images in the `images_ir` folder.
To calculate the AP score, we use MATLAB.
Step 1. Create a folder locally and then create a `test` folder inside it.

Step 2. Move the txt files from the `Detection_Result/` directory into the `test` folder.
Step 3. Download and unzip the ground truth annotation folder from this link.
Step 4. Download and unzip the evaluator from this link.
Step 5. Open `FLIRdevkit-matlab-wrapper/demo_test.m`. In this file, set `dtDir` to the path of the `test` folder and `gtDIR` to the path of the downloaded ground-truth annotation `flir` folder.
Step 6. Open `FLIRdevkit-matlab-wrapper/bbGt.m` and set a breakpoint at line 761. Then run `demo_test.m`. When it hits the breakpoint, enter `trapz(xs, ys)`. This value is the AP score.
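The breakpoint trick simply evaluates the trapezoidal area under the precision-recall curve. If you export the recall (`xs`) and precision (`ys`) arrays, the same number can be reproduced outside MATLAB; the snippet below is our equivalent sketch with made-up values:

```python
# Equivalent of MATLAB's trapz(xs, ys): trapezoidal area under the PR curve.
# The xs/ys values here are made-up recall/precision pairs for illustration.
import numpy as np

xs = np.array([0.0, 0.25, 0.5, 0.75, 1.0])  # recall
ys = np.array([1.0, 0.9, 0.75, 0.5, 0.2])   # precision
ap = np.trapz(ys, xs)
print("AP = {:.4f}".format(ap))
```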
## New Dataset: ROTX-MP

To evaluate modality bias in multispectral pedestrian detectors, we propose a new dataset: the ROTX Multispectral Pedestrian (ROTX-MP) dataset. It mainly contains ROTX data, whereas existing datasets consist of ROTO and RXTO data. ROTX-MP comprises 1,000 ROTX test images collected from two practical scenarios related to the applications of multispectral pedestrian detection: pedestrians behind a glass window and pedestrians wearing heat-insulating clothes.

If you need the ROTX-MP dataset, feel free to email eetaekim@kaist.ac.kr.
To evaluate performance on the ROTX-MP dataset:
Step 1. Place the ground truth annotations in the `lwir` and `visible` folders within the `data/KAIST_PED/Annotations/` directory.

Step 2. Put the images from the ROTX-MP dataset into the `lwir` and `visible` folders located in the `data/KAIST_PED/JPEGImages/` directory.

Step 3. Replace the `test.txt` file in the `data/KAIST_PED/ImageSets/Main/` directory with the ROTX-MP `test.txt` file. Note that the original `test.txt` file is from the FLIR dataset.
Step 4. If you evaluate on ROTX-MP after evaluating on the FLIR dataset (or vice versa), delete the files in the `data/cache/` directory. Removing these cached files is essential whenever you switch datasets for training or evaluation; a helper sketch follows below.
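Clearing the cache can be scripted; the snippet below is a convenience we suggest, not a utility shipped with the repository:

```python
# Suggested helper: clear data/cache/ when switching between FLIR and ROTX-MP.
import os
import shutil

cache_dir = "data/cache"
for name in os.listdir(cache_dir):
    path = os.path.join(cache_dir, name)
    if os.path.isdir(path):
        shutil.rmtree(path)
    else:
        os.remove(path)
    print("removed " + path)
```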
Step 5. To calculate the AP score, simply perform the evaluation in the same way it was previously done with the FLIR dataset.
## Citation

If you find this work useful for your research, please consider citing our paper:

```bibtex
@inproceedings{kim2024causal,
  title={Causal Mode Multiplexer: A Novel Framework for Unbiased Multispectral Pedestrian Detection},
  author={Kim, Taeheon and Shin, Sebin and Yu, Youngjoon and Kim, Hak Gu and Ro, Yong Man},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={26784--26793},
  year={2024}
}
```
## Acknowledgement

We thank the authors of the following research works and open-source projects, parts of whose code we have adapted:

- Uncertainty-Guided Cross-Modal Learning for Robust Multispectral Pedestrian Detection