Transformer-based Long-Term Viewport Prediction in 360° Video: Scanpath is All You Need

Abstract

Virtual Reality (VR) multimedia technology has advanced dramatically in recent years. Its immersive and interactive nature lets users freely look in any direction within 360° content. Users do not see the entire 360° content at a glance, but only the portion inside the viewport. Viewport-based adaptive streaming, which streams only the user’s viewport of interest in high quality, has emerged as the primary technique for saving bandwidth over the best-effort Internet. Predicting the user’s viewport over the forthcoming seconds is therefore an essential task for informing streaming decisions in a VR system. Various viewport prediction methods based on deep neural networks have been proposed. However, they are typically built from complex Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) that require heavy computation. To achieve high prediction accuracy within the limited computation time of a streaming system, we propose a new transformer-based architecture, named 360° Viewport Prediction Transformer (VPT360), that leverages only the past viewport scanpath to predict a user’s future viewport scanpath. We evaluate VPT360 on three widely used datasets and compare its computational complexity with that of state-of-the-art methods. The experiments show that VPT360 provides the highest accuracy for both short-term and long-term prediction and achieves the lowest computational complexity. The code is publicly available at https://github.com/FannyChao/VPT360 as a contribution to the community.
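The scanpath-only setup described in the abstract can be illustrated with a small sketch of how one recorded scanpath might be cut into (past, future) pairs for a sequence predictor. This is not the authors' code: the function name `make_windows`, the (longitude, latitude) representation in degrees, and the window lengths are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# Assumed representation: one viewport centre as (longitude, latitude) in degrees.
Point = Tuple[float, float]


def make_windows(
    scanpath: List[Point], past_len: int, future_len: int
) -> List[Tuple[List[Point], List[Point]]]:
    """Slide a window over one scanpath and return (past, future) pairs.

    Hypothetical helper: each pair holds `past_len` observed points followed
    by the `future_len` points a predictor would be asked to estimate.
    """
    if past_len <= 0 or future_len <= 0:
        raise ValueError("past_len and future_len must be positive")
    pairs = []
    last_start = len(scanpath) - past_len - future_len
    # If the scanpath is too short, last_start is negative and no pair is made.
    for start in range(last_start + 1):
        split = start + past_len
        pairs.append((scanpath[start:split], scanpath[split:split + future_len]))
    return pairs


# Made-up scanpath: a viewer panning to the right along the equator.
scanpath = [(float(lon), 0.0) for lon in range(0, 50, 10)]
pairs = make_windows(scanpath, past_len=2, future_len=2)
```

Here `pairs` holds two samples. In each, the first list would be the model input and the second the prediction target. No video frames or saliency maps are involved, which matches the scanpath-only input the abstract describes.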

To Be Continued...

Citing

@INPROCEEDINGS{VPT360,
  author={F. -Y. {Chao} and C. {Ozcinar} and A. {Smolic}},
  booktitle={IEEE 23rd International Workshop on Multimedia Signal Processing (MMSP)},
  title={Transformer-based Long-Term Viewport Prediction in 360° Video: Scanpath is All You Need}, 
  year={2021},
  volume={},
  number={},
  pages={},
  doi={}}
