This repository contains the PyTorch implementation of PP-Pad (peripheral prediction padding) from the ICIP 2023 paper:
- Kensuke Mukai and Takao Yamanaka, "Improving Translation Invariance in Convolutional Neural Networks with Peripheral Prediction Padding," International Conference on Image Processing (ICIP), 2023, Kuala Lumpur, Malaysia. [arXiv https://arxiv.org/abs/2307.07725]
More qualitative comparisons: see Results_summary.md
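For orientation, PP-Pad replaces constant or copied border values with values predicted from the adjacent interior pixels by a small trainable module. The snippet below is a minimal conceptual sketch, assuming a single-pixel pad and one linear predictor over 2x3 neighborhoods shared across channels and sides; the module name and these details are illustrative, not the repository's actual layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PeripheralPad(nn.Module):
    """Conceptual sketch of peripheral prediction padding (PP-Pad):
    each padded pixel is predicted from the 2x3 patch of interior
    pixels next to it by a small trainable layer. The module name,
    single-row padding, and the shared linear predictor are
    illustrative assumptions, not the repository's exact code."""

    def __init__(self):
        super().__init__()
        self.predict = nn.Linear(6, 1)  # 2x3 peripheral pixels -> 1 pad value

    def _pad_top(self, x):
        n, c, _, w = x.shape
        edge = x[:, :, :2, :]                      # two rows nearest the top edge
        # gather 2x3 neighborhoods along the width (zero-padded at the corners)
        patches = F.unfold(edge, kernel_size=(2, 3), padding=(0, 1))  # (N, C*6, W)
        patches = patches.view(n, c, 6, w).permute(0, 1, 3, 2)        # (N, C, W, 6)
        row = self.predict(patches).squeeze(-1).unsqueeze(2)          # (N, C, 1, W)
        return torch.cat([row, x], dim=2)

    def forward(self, x):
        # pad each of the four sides by rotating so _pad_top handles them all
        for _ in range(4):
            x = torch.rot90(self._pad_top(x), 1, dims=(2, 3))
        return x

padded = PeripheralPad()(torch.randn(1, 3, 8, 8))   # -> torch.Size([1, 3, 10, 10])
```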
In addition to cloning this repository, download the following files and place them under the indicated pp-pad folders.
- VOCtrainval_11-May-2012.tar:
-- Download 'VOCtrainval_11-May-2012.tar' from 'Download the training/validation data (2GB tar file)' at http://host.robots.ox.ac.uk/pascal/VOC/voc2012/#devkit
-- Extract it and place the 'VOCdevkit' folder under pp-pad/program/dataset/
- weights.zip (522 MB): place the extracted contents under pp-pad/program/weights
Change current directory:
cd program
To train the network:
python PSPNet.py --cfg config/cfg_sample.yaml
To evaluate the translation invariance and classification accuracy:
python segment_val.py --cfg config/cfg_sample.yaml
The results are saved in the 'program/outputs' directory. The filepath and other settings are specified in the config file 'config/*.yaml'.
To save the visualized results, set the hyper-parameter 'save_patches' in the config file (.yaml) to True and then run segment_val.py:
python segment_val.py --cfg config/cfg_sample.yaml
Sample images are saved in the 'program/samples' directory.
The YAML config file can be modified in any text editor.
Sample config file: program/config/cfg_sample.yaml
[padding mode]
- padding_mode: Select from 'pp-pad', 'zeros', 'reflect', 'replicate', 'circular', 'partial', 'cap_pretrain', or 'cap_train'. Use 'cap_pretrain' and 'cap_train' in this order, since CAP requires pretraining (see the note below).
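As a point of reference, the four standard modes correspond to options that PyTorch's `Conv2d` already provides, while the remaining modes rely on the custom layers in this repository (the exact wiring here is an assumption for illustration):

```python
import torch.nn as nn

# The standard modes map onto PyTorch's built-in padding_mode argument
# ('zeros', 'reflect', 'replicate', 'circular'); 'pp-pad', 'partial',
# 'cap_pretrain', and 'cap_train' need the custom layers in this repository.
conv = nn.Conv2d(3, 64, kernel_size=3, padding=1, padding_mode='reflect')
```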
[path for PSPNet.py]
- dataset: Dataset folder
- pretrain: Initial weights for the network
- outputs: Output folder
[training parameters]
- num_epoches: Number of training epochs
- val_output_interval: Interval for validation [epochs]
- batch_size: Mini-batch size
- batch_multiplier: Number of mini-batches accumulated before each weight update, so the effective batch size is batch_size x batch_multiplier (see the sketch after this list)
- optimizer: sgd or adam
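A batch_multiplier like this is typically realized via gradient accumulation. The self-contained loop below sketches that idea; the model, optimizer, and data are placeholders, not the repository's training code.

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)                       # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
batch_multiplier = 4                          # effective batch size = batch_size * batch_multiplier

optimizer.zero_grad()
for i in range(16):                           # stands in for iterating over the loader
    images = torch.randn(2, 8)                # batch_size = 2
    labels = torch.randint(0, 2, (2,))
    loss = criterion(model(images), labels)
    (loss / batch_multiplier).backward()      # scale so the accumulated gradient is an average
    if (i + 1) % batch_multiplier == 0:
        optimizer.step()                      # one update per batch_multiplier mini-batches
        optimizer.zero_grad()
```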
[path for segment_val.py]
- val_images: Validation file list for evaluation of translation invariance and classification accuracy. Only 100 images were used for evaluation due to the computational cost.
- weights: Network model for evaluation
[image sizes]
- input_size: Patch size extracted from the original image in training and evaluation
- expanded_size: The original image is first resized so that its long side equals expanded_size, and then cropped to the input_size specified above.
- patch_stride: Stride of the sliding window used in evaluation to create overlapping patches (see the sketch below)
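The sketch below illustrates this preprocessing under stated assumptions (the function name, default stride, and simplified edge handling are ours, not the repository's): resize so the long side equals expanded_size, then slide an input_size window with stride patch_stride.

```python
import torch
import torch.nn.functional as F

def extract_patches(image, expanded_size=512, input_size=475, patch_stride=37):
    """Resize the long side to expanded_size, then slide an input_size
    window with stride patch_stride over the result (simplified: a last
    partial window position at the border is not handled)."""
    _, h, w = image.shape                              # image: (C, H, W)
    scale = expanded_size / max(h, w)
    image = F.interpolate(image.unsqueeze(0),
                          size=(round(h * scale), round(w * scale)),
                          mode='bilinear', align_corners=False).squeeze(0)
    _, h, w = image.shape
    patches = []
    for top in range(0, h - input_size + 1, patch_stride):
        for left in range(0, w - input_size + 1, patch_stride):
            patches.append(image[:, top:top + input_size, left:left + input_size])
    return patches

patches = extract_patches(torch.randn(3, 600, 600))    # 4 overlapping 475x475 patches
```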
[dataset info]
- color_mean: Mean values of images in the dataset
- color_std: Standard deviations of images in the dataset
[to save patches and visualize inference results]
- save_patches: True / False
- sample_images: Image files for which patches are saved and inference results are visualized
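Putting the parameters above together, a config in the spirit of cfg_sample.yaml might look like the following. Every value and file path here is a hypothetical illustration; consult the actual cfg_sample.yaml for the real keys and defaults.

```python
import yaml

# Hypothetical config mirroring the parameters documented above; the
# paths and values are illustrative, not the repository's defaults.
cfg = yaml.safe_load("""
padding_mode: pp-pad      # or zeros / reflect / replicate / circular / partial / cap_pretrain / cap_train
dataset: dataset/VOCdevkit/VOC2012
pretrain: weights/pretrain.pth
outputs: outputs
num_epoches: 320
val_output_interval: 10
batch_size: 2
batch_multiplier: 4
optimizer: sgd
val_images: dataset/val_100.txt
weights: weights/model.pth
input_size: 475
expanded_size: 512
patch_stride: 37
color_mean: [0.485, 0.456, 0.406]
color_std: [0.229, 0.224, 0.225]
save_patches: True
sample_images: [samples/sample1.jpg]
""")
print(cfg['padding_mode'], cfg['input_size'])
```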
The values in the following results differ from those in the original ICIP 2023 paper [1], especially in meanE, because the initial implementation for calculating meanE contained bugs. The results below were obtained with the current code. The network was trained for 320 epochs.
[Simple mean IoU (excluding background), meanE & disR (including background class)]
- mIoU is the simple (unweighted) mean of the per-class IoU values, not a weighted average. The background class was excluded from the IoU calculation (see the sketch after this list).
- meanE and disR were calculated including the background class.
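A minimal sketch of this mIoU, assuming 21 PASCAL VOC classes with background as class 0 (our re-implementation for illustration, not the evaluation script):

```python
import numpy as np

def miou_excluding_background(pred, gt, num_classes=21):
    """Simple (unweighted) mean of per-class IoU, skipping background
    (class 0) and classes absent from both prediction and ground truth."""
    ious = []
    for c in range(1, num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0

pred = np.random.randint(0, 21, (475, 475))
gt = np.random.randint(0, 21, (475, 475))
print(miou_excluding_background(pred, gt))
```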
Patch Size: 475x475
| Category | Method | mIoU ↑ | meanE_in ↓ | disR_in ↓ |
|---|---|---|---|---|
| Conventional | Zero | 0.3233 | 0.4536 | 0.5847 |
| | Reflect | 0.3090 | 0.4826 | 0.6087 |
| | Replicate | 0.3100 | 0.4745 | 0.6038 |
| | Circular | 0.3062 | 0.4923 | 0.6126 |
| Previous | Partial [17] | 0.3184 | 0.4575 | 0.5893 |
| Proposed | PP-Pad (2x3) | 0.3203 | 0.4221 | 0.5443 |
| | PP-Pad (2x3 conv) | 0.3255 | 0.4195 | 0.5423 |
Patch Size: 512x512
| Category | Method | mIoU ↑ | meanE_in ↓ | disR_in ↓ |
|---|---|---|---|---|
| Previous | CAP [19] | 0.3300 | 0.4440 | 0.5794 |
| Proposed | PP-Pad (2x3) | 0.3301 | 0.4315 | 0.5533 |
| | PP-Pad (2x3 conv) | 0.3307 | 0.4238 | 0.5505 |
[Weighted average IoU, meanE, and disR excluding background class]
- mIoU_weighted is the weighted-average version of mIoU. Each patch was weighted by its number of effective pixels, where 'effective' means that the union of the predicted area and the ground-truth area is non-zero for at least one class (excluding the background class) when calculating the IoU of that patch. With this weighting, patches filled with background contribute less to mIoU, since the background class is excluded from the mIoU calculation (see the sketch after this list).
- meanE and disR were calculated excluding the background class in the annotation (meanE_ex, disR_ex).
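A sketch of this weighting under our reading of 'effective' (a pixel is effective when either the prediction or the ground truth assigns it a non-background class); this is an illustrative assumption, not the evaluation script:

```python
import numpy as np

def weighted_miou(patch_ious, patch_preds, patch_gts):
    """Average per-patch IoU values, each weighted by the patch's number
    of effective pixels (non-background in prediction or ground truth)."""
    weights = [np.count_nonzero((p != 0) | (g != 0))
               for p, g in zip(patch_preds, patch_gts)]
    total = sum(weights)
    return sum(w * iou for w, iou in zip(weights, patch_ious)) / total if total else 0.0

preds = [np.random.randint(0, 21, (475, 475)) for _ in range(3)]
gts = [np.random.randint(0, 21, (475, 475)) for _ in range(3)]
ious = [0.4, 0.3, 0.5]                       # per-patch IoU values (illustrative)
print(weighted_miou(ious, preds, gts))
```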
Patch Size: 475x475
| Category | Method | mIoU_weighted ↑ | meanE_ex ↓ | disR_ex ↓ |
|---|---|---|---|---|
| Conventional | Zero | 0.4102 | 0.5564 | 0.7088 |
| | Reflect | 0.3941 | 0.6016 | 0.7605 |
| | Replicate | 0.3990 | 0.5953 | 0.7565 |
| | Circular | 0.3879 | 0.6205 | 0.7593 |
| Previous | Partial [17] | 0.4062 | 0.5900 | 0.7462 |
| Proposed | PP-Pad (2x3) | 0.4120 | 0.5272 | 0.6820 |
| | PP-Pad (2x3 conv) | 0.4182 | 0.5118 | 0.6745 |
Patch Size: 512x512
| Category | Method | mIoU_weighted ↑ | meanE_ex ↓ | disR_ex ↓ |
|---|---|---|---|---|
| Previous | CAP [19] | 0.4189 | 0.5419 | 0.7040 |
| Proposed | PP-Pad (2x3) | 0.4258 | 0.5370 | 0.6932 |
| | PP-Pad (2x3 conv) | 0.4295 | 0.5213 | 0.6827 |
- [1] Kensuke Mukai and Takao Yamanaka, "Improving Translation Invariance in Convolutional Neural Networks with Peripheral Prediction Padding," ICIP 2023. https://arxiv.org/abs/2307.07725
- Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia, "Pyramid Scene Parsing Network," CVPR 2017. https://arxiv.org/abs/1612.01105
- [19] Yu-Hui Huang, Marc Proesmans, and Luc Van Gool, "Context-aware Padding for Semantic Segmentation," arXiv 2021. https://arxiv.org/abs/2109.07854
- [17] Guilin Liu, Kevin J. Shih, Ting-Chun Wang, Fitsum A. Reda, Karan Sapra, Zhiding Yu, Andrew Tao, and Bryan Catanzaro, "Partial Convolution based Padding," arXiv 2018. https://arxiv.org/abs/1811.11718
The code was confirmed to work with the following versions.
- Python 3.7.13
- PyTorch 1.13.0+cu117
- NVIDIA Driver 510.108.03
- CUDA 11.6