Learning Binary Residual Representations for Domain-specific Video Streaming

This post is about our AAAI 2018 paper on efficient video streaming (joint work with Yi-Hsuan Tsai, Deqing Sun, Ming-Hsuan Yang, and Jan Kautz). Some results can be found in the accompanying video.


Existing video compression standards (e.g., MPEG-4, H.264, and HEVC) can effectively compress most of the data in a video. What remains uncompressed are the residual images, which are the differences between the original video frames and their compressed versions. The residual images are difficult to compress because they contain highly non-linear, domain-specific patterns. In this work, we ask the following two questions, hoping that the answers will lead us to better video compression for game streaming.

  1. Can we improve existing compression algorithms to achieve a better compression rate if we restrict the resulting compressor to a specific domain?
  2. Can the improved design be seamlessly integrated into existing video compression standards?

Why 1? We ask the first question because many interesting video streaming services are domain-specific. For example, in video game streaming services (e.g., NVIDIA GeForce Now), the game videos are first rendered on a GPU server. They are then compressed and delivered to the end user. Over a period of hours, the videos being streamed all belong to the same domain.

Why 2? We ask for seamless integration because we want to leverage the existing video compression infrastructure, including hardware-optimized computation and existing software stacks.


We believe that we have come up with a hybrid system that addresses both questions. Specifically, we first apply an existing video compression standard (e.g., H.264) to compress domain-specific videos. We then train a binary autoencoder (whose latent representations are either 0 or 1) to encode the resulting residual information frame by frame into a binary representation, and apply Huffman coding to compress the binary representations losslessly. The compressed binary representations can be sent to the user in the metadata field of the existing video streaming packet. This way, our system is compatible with the existing video streaming standard. We illustrate our system in the following figure.
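To make the lossless step more concrete, here is a minimal sketch of Huffman coding applied to a binary code. It is not the implementation from the paper: the `binarize` threshold is only a stand-in for the trained binary autoencoder, grouping bits into 4-bit symbols is an assumption made for illustration, and all function names are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def binarize(latent, threshold=0.0):
    """Map real-valued activations to bits (stand-in for the trained encoder)."""
    return [1 if v > threshold else 0 for v in latent]


def group_bits(bits, k=4):
    """Group bits into k-bit symbols (assumed grouping), zero-padding the tail."""
    padded = bits + [0] * (-len(bits) % k)
    return [tuple(padded[i:i + k]) for i in range(0, len(padded), k)]


def build_huffman_table(symbols):
    """Return a dict mapping each symbol to its prefix-free bit string."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique counter so the heap never compares dicts
    heap = [(n, next(tiebreak), {s: ""}) for s, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, t1 = heapq.heappop(heap)
        n2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]


def huffman_encode(symbols, table):
    return "".join(table[s] for s in symbols)


def huffman_decode(bitstring, table):
    """Decode by walking the bit string and matching prefix-free codewords."""
    inverse = {code: sym for sym, code in table.items()}
    decoded, current = [], ""
    for bit in bitstring:
        current += bit
        if current in inverse:
            decoded.append(inverse[current])
            current = ""
    return decoded


# Toy latent activations: mostly negative, so the binary code is mostly zeros.
latent = [-1.0] * 40 + [0.7, -1.0, -1.0, -1.0] + [-1.0] * 16 + [0.9, 0.4, -1.0, -1.0]
bits = binarize(latent)
symbols = group_bits(bits, k=4)
table = build_huffman_table(symbols)
encoded = huffman_encode(symbols, table)
print(len(bits), "raw bits ->", len(encoded), "Huffman-coded bits")
```

Huffman coding only helps when some bit patterns are much more frequent than others, as in this toy input. The decoder also needs the code table, which in a domain-specific setting could plausibly be fixed ahead of time rather than sent with every frame.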


This allows us to achieve a better compression rate under a channel bandwidth constraint. Specifically, we can reduce the bandwidth assigned to the existing video compression standard and use the reserved bandwidth to transmit the binary residual representation computed by the binary autoencoder. Our experimental results show that we can gain up to 1.7 dB using this hybrid system. The results for the Skyrim game are shown in the following figure (click and zoom in to see the difference). For more details about the algorithm and experimental results, please check out our paper.
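As a rough illustration of how such a gain could be measured, the sketch below assumes the dB figure refers to PSNR, the usual quality metric for video compression. The pixel values are made up, and the sign-times-step approximation of the residual is only a stand-in for what a binary autoencoder's decoder might reconstruct. None of this is taken from the paper.

```python
import math


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


def clamp(value, lo=0, hi=255):
    """Keep a reconstructed pixel inside the valid 8-bit range."""
    return max(lo, min(hi, value))


# Hypothetical 8-pixel frame and its decoded version from a standard codec.
original = [52, 60, 61, 200, 198, 90, 91, 30]
codec_decoded = [50, 63, 58, 204, 195, 92, 88, 33]

# Residual: what the standard codec failed to reproduce.
residual = [o - d for o, d in zip(original, codec_decoded)]

# Stand-in for the learned residual decoder: one bit per pixel (the sign),
# scaled by a fixed step size chosen arbitrarily for this example.
step = 2
approx_residual = [step if r > 0 else -step for r in residual]

# Client side: add the approximate residual back onto the decoded frame.
enhanced = [clamp(d + a) for d, a in zip(codec_decoded, approx_residual)]

baseline_db = psnr(original, codec_decoded)
enhanced_db = psnr(original, enhanced)
print(f"codec only: {baseline_db:.2f} dB, with residual: {enhanced_db:.2f} dB")
```

In a real comparison, both configurations would have to use the same total bandwidth, with the hybrid system giving up part of the codec's bitrate to pay for the residual bits.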

Originally published at mingyuliu.blog on January 30, 2018.

Source: Deep Learning on Medium