Learning Compressible 360° Video Isomers

Concept figure

We propose to improve 360° video compression by selecting a proper orientation for cubemap projection. Our key insight is that different cubemap orientations lead to different compression rates for the same 360° video using the same video codec. We perform a detailed analysis on 80 360° videos, 3 hours in total length, to verify that the orientation of the cubemap projection matters for the final video size. The results show scope for reducing video sizes by up to 75% through rotation, and the average reduction is more than 8% across all videos.


360° Video Isomers Analysis

We enumerate cubemap orientations along two rotation axes, yaw and pitch:

Cubemap orientations
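
As a rough illustration of this enumeration, the sketch below builds a regular grid of (yaw, pitch) pairs. The function name, angle range, and step size are placeholders chosen for the example, not the values used in the analysis.

```python
import itertools

def enumerate_orientations(max_angle=45, step=15):
    """Return all (yaw, pitch) pairs in degrees on a regular grid.

    max_angle and step are hypothetical defaults for illustration only.
    """
    angles = range(-max_angle, max_angle + 1, step)
    return list(itertools.product(angles, angles))

orientations = enumerate_orientations()
print(len(orientations))  # 7 angles per axis -> 49 orientations
```

Each pair would then be passed to the cubemap renderer, so the number of encodes per segment grows with the square of the number of angles per axis.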

The cubemaps are rendered with transform360 and losslessly encoded with x264, x265, and libvpx. We use a fixed 2-second GOP and encode each GOP with an independent orientation. We then define the achievable size reduction through rotation as:

Size reduction definition
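
A minimal sketch of how such a reduction could be computed for one segment, assuming it is measured as the relative gap between the largest and smallest encoded size over all orientations. This reading is an assumption for illustration; the definition in the figure above is authoritative. The function name and the sample sizes are made up.

```python
def size_reduction(sizes):
    """Percent reduction of the smallest encoding relative to the largest.

    `sizes` holds the encoded size in bytes of one segment, one entry per
    cubemap orientation. The formula is an assumed reading of the definition.
    """
    largest = max(sizes)
    smallest = min(sizes)
    return 100.0 * (largest - smallest) / largest

# Made-up sizes (bytes) of one segment under four orientations.
print(size_reduction([1200, 1000, 900, 1100]))  # 25.0
```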

We compute the size reduction over 80 360° videos, 3 hours in total length. The videos are crawled from YouTube and are encoded with H264 High Profile at 4K resolution. The average and range of achievable size reduction are:

Size reduction results

Because the compression rate depends on the visual content and resulting cubemap representation, the video size distribution w.r.t. orientation varies across different videos.

Cubemap size distribution

Predicting the Compressible Isomer

Enumerating all possible cubemap orientations requires encoding the video repeatedly, which is computationally prohibitive. Instead, we propose to predict the optimal orientation from video content using a Convolutional Neural Network.

Orientation prediction

Based on the prediction model, we propose a new two-stage compression pipeline for 360° videos.

Compression pipeline


The encoded video sizes are stored in pandas DataFrames. The columns correspond to the YouTube video id and segment id. Each segment is two seconds long, so segment id spans the interval [2*id, 2*id+2] in seconds. The rows correspond to the two orientation angles (yaw, pitch) of the cubemap. File sizes are given in bytes. The HDF5 file contains three datasets: h264, hevc, and vp9.
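
The sketch below shows one way to work with a table laid out as described, using a small synthetic DataFrame in place of the real data. The index level names (`yaw`, `pitch`, `video_id`, `segment_id`), the video ids, the sizes, and the file name in the comment are all assumptions for illustration; check the actual file for its real labels.

```python
import pandas as pd

# Synthetic stand-in for one dataset (e.g. "h264"); every value is made up.
# With the real data, something like
#   sizes = pd.read_hdf("sizes.h5", key="h264")   # hypothetical file name
# would load the corresponding table (requires PyTables).
rows = pd.MultiIndex.from_product([[0, 45], [0, 45]], names=["yaw", "pitch"])
cols = pd.MultiIndex.from_tuples(
    [("vidA", 0), ("vidA", 1), ("vidB", 0)],
    names=["video_id", "segment_id"],
)
sizes = pd.DataFrame(
    [[1500, 1400, 900],
     [1200, 1450, 950],
     [1300, 1100, 800],
     [1250, 1350, 990]],
    index=rows,
    columns=cols,
)

def segment_span(segment_id, seconds=2):
    """Start and end time in seconds of a segment, given its id."""
    return (seconds * segment_id, seconds * segment_id + seconds)

# For each (video, segment) column, the (yaw, pitch) row with the smallest size.
best = sizes.idxmin()

print(segment_span(3))  # (6, 8)
print(best)
```

Because each column is one segment, column-wise reductions such as `idxmin`, `min`, and `max` give per-segment results directly.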