[CVPR 2024] From-Ground-To-Objects: Coarse-to-Fine Self-supervised Monocular Depth Estimation of Dynamic Objects with Ground Contact Prior
Download the dynamic object masks for the Cityscapes dataset from the DynamicDepth GitHub repository.
Pretrained models can be downloaded from here.
Put the model checkpoints (mono_encoder.pth & mono_depth.pth) in /checkpoints/MonoViT/.
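The checkpoint placement above can be sketched as a short shell snippet. The `touch` lines merely stand in for the two downloaded checkpoint files, and the relative `checkpoints/MonoViT` path is an assumption about where the repository root sits:

```shell
# Stand-ins for the downloaded checkpoints (replace with the real files).
touch mono_encoder.pth mono_depth.pth

# Create the expected directory and move the checkpoints into place.
mkdir -p checkpoints/MonoViT
mv mono_encoder.pth mono_depth.pth checkpoints/MonoViT/

# Verify the layout.
ls checkpoints/MonoViT
```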
WIR: Whole Image Region / DOR: Dynamic Object Region
| Method | Input size | Region | abs rel | a1 |
|---|---|---|---|---|
| Ours-MonoViT | 192 x 640 | WIR | 0.087 | 0.921 |
| Ours-MonoViT | 192 x 640 | DOR | 0.099 | 0.910 |
Precomputed results (disparity_map & error_map) can be downloaded from here.
```shell
# Test pretrained MonoViT with our proposed method on the Cityscapes dataset
python test.py --config ./configs/test_monovit_cs.yaml
```
If you find this work useful, please consider citing:

```bibtex
@inproceedings{moon2024ground,
  title={From-Ground-To-Objects: Coarse-to-Fine Self-supervised Monocular Depth Estimation of Dynamic Objects with Ground Contact Prior},
  author={Moon, Jaeho and Bello, Juan Luis Gonzalez and Kwon, Byeongjun and Kim, Munchurl},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={10519--10529},
  year={2024}
}
```
Monodepth: https://github.com/nianticlabs/monodepth2
MonoViT: https://github.com/zxcqlf/MonoViT
DynamicDepth: https://github.com/AutoAILab/DynamicDepth
This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT): No. 2021-0-00087, Development of high-quality conversion technology for SD/HD low-quality media, and No. RS2022-00144444, Deep Learning Based Visual Representational Learning and Rendering of Static and Dynamic Scenes.