Object Segmentation masks for ImageNet Video Dataset - 2015

Here we provide the binary object segmentation masks that were used to train our motion stream (Section 3.2 in the paper).

The data comprises 84,929 video frames, together with the corresponding optical flow and the object segmentations obtained using our appearance stream model.

A series of filtering stages, which make use of the bounding boxes provided with the original dataset, ensures that the segmentations are of high quality.
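To give a rough idea of how a bounding box can be used to filter a segmentation, here is a minimal sketch of one possible check: keep a mask only if the tight box around its foreground pixels overlaps the annotated box well enough. This is an illustration only, not the exact procedure from the paper. The function names, the list-of-lists mask representation, and the 0.75 threshold are all assumptions made for this example.

```python
def mask_bbox(mask):
    """Tight inclusive box (x_min, y_min, x_max, y_max) around the
    foreground pixels of a binary mask, or None if the mask is empty."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, value in enumerate(row) if value]
    if not ys:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


def box_area(box):
    """Area in pixels of an inclusive box."""
    return (box[2] - box[0] + 1) * (box[3] - box[1] + 1)


def box_iou(a, b):
    """Intersection over union of two inclusive pixel boxes."""
    width = min(a[2], b[2]) - max(a[0], b[0]) + 1
    height = min(a[3], b[3]) - max(a[1], b[1]) + 1
    inter = max(width, 0) * max(height, 0)
    return inter / (box_area(a) + box_area(b) - inter)


def keep_mask(mask, gt_box, min_iou=0.75):
    """Hypothetical filter: accept the mask only if its tight box
    agrees with the annotated box. min_iou is an assumed threshold."""
    box = mask_bbox(mask)
    return box is not None and box_iou(box, gt_box) >= min_iou


# A 4x4 mask whose foreground fills the box (1, 1)-(2, 2).
example = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
accepted = keep_mask(example, (1, 1, 2, 2))  # boxes coincide
rejected = keep_mask(example, (0, 0, 3, 3))  # annotated box is much larger
```

A check of this kind discards masks that cover only a small part of the annotated object or that spill far outside it. Empty masks are rejected outright.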

Download Video Frames (2GB)

Download Segmentations (166MB)

Download Optical Flow (6GB)

Please cite the following papers if you use this dataset in your work:

@inproceedings{fusionseg,
Author = {Jain, Suyog and Xiong, Bo and Grauman, Kristen},
Booktitle = {CVPR},
Title = {FusionSeg: Learning to combine motion and appearance for fully automatic segmentation of generic objects in videos},
Year = {2017}
}

@article{ILSVRC15,
Author = {Olga Russakovsky and Jia Deng and Hao Su and Jonathan Krause and Sanjeev Satheesh and Sean Ma and Zhiheng Huang and Andrej Karpathy and Aditya Khosla and Michael Bernstein and Alexander C. Berg and Li Fei-Fei},
Title = {{ImageNet Large Scale Visual Recognition Challenge}},
Year = {2015},
Journal = {International Journal of Computer Vision (IJCV)},
Doi = {10.1007/s11263-015-0816-y},
Volume = {115},
Number = {3},
Pages = {211--252}
}