This repository contains the source code and additional visualization examples for our LSAP: Learning Sampling-Agnostic Perturbations for Video Action Classification.
Motivation of Our Work
Intuitively, generating an adversarial example for a video is more difficult than for an image, since a video contains a sequence of images (frames) with strong temporal correlation. Directly applying existing image-based adversarial attack methods to generate an image-level adversarial example for each frame of a video inevitably neglects the temporal correlation among frames and results in a less effective video-level adversarial example.
Insight of Our Work
- We investigate the problem of adversarial attacks on video classification models and propose a novel sampling-agnostic perturbation generation method that crafts video adversarial examples via universal attack.
- We propose an advanced regularizer, named temporal coherence regularization, for attacking video classification models, and demonstrate its effectiveness by evaluating its impact on attacks against video models.
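The repository's code implements the exact regularizer; as a rough illustration only, a temporal coherence penalty can be sketched as the mean squared difference between the perturbations of adjacent frames, assuming the perturbation is stored as a `(T, C, H, W)` tensor (the function name and shapes here are our own, not the paper's):

```python
import numpy as np

def temporal_coherence_loss(perturbation):
    """Illustrative sketch (not the paper's exact formulation):
    penalize differences between perturbations of adjacent frames.

    perturbation: array of shape (T, C, H, W), one perturbation per frame.
    Returns the mean squared difference between consecutive frames,
    encouraging the learned noise to vary smoothly over time so that
    any sampled subset of frames carries a consistent perturbation.
    """
    diffs = perturbation[1:] - perturbation[:-1]  # shape (T-1, C, H, W)
    return float(np.mean(diffs ** 2))

# A perturbation that is identical across all frames incurs zero cost.
same_noise = np.ones((8, 3, 4, 4))
print(temporal_coherence_loss(same_noise))  # 0.0
```

In practice such a term would be added to the attack objective with a weighting coefficient and minimized jointly with the classification loss.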
- We propose a generalized optimization scheme for different types of adversarial attacks and demonstrate its effectiveness. In video adversarial attacks, there is typically a value gap between the regularization loss and the classification loss; our training scheme converges faster and generates effective perturbations even though the attacker lacks knowledge of the specific frames sampled from the clip.
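To make the "value gap" concrete: when the regularization loss and the classification loss live on different numerical scales, a fixed weight lets one term dominate the other. A minimal, hypothetical way to handle this (not the paper's actual scheme) is to rescale the regularizer by the current loss ratio before weighting it:

```python
def combine_losses(cls_loss, reg_loss, lam=0.5, eps=1e-8):
    """Hypothetical illustration of bridging the value gap between a
    classification loss and a regularization loss of different scales.

    The regularizer is rescaled by the ratio cls_loss / reg_loss so that
    its weighted contribution is on the same order as the classification
    term. In an autograd framework, the ratio would be treated as a
    detached constant so gradients still flow through reg_loss itself.
    """
    gap = cls_loss / (reg_loss + eps)  # scale factor closing the gap
    return cls_loss + lam * gap * reg_loss

# With cls_loss = 2.0 and reg_loss = 200.0, the raw regularizer is 100x
# larger, but after rescaling it contributes lam * cls_loss = 1.0:
print(combine_losses(2.0, 200.0, lam=0.5))  # ~3.0
```

This is only a sketch of the underlying idea; the repository's training code defines the actual optimization scheme.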