Supervoxel-Consistent Foreground Propagation in Video

Problem

Automatic propagation of a foreground segmentation through a video from one or more labeled frames.

Existing methods enforce only local consistency in space and time, because they use only pairwise connections.

Robust foreground propagation requires capturing long-range dependencies as the object evolves in shape over time.

Our Idea

Higher order potentials for supervoxels to discover long-range coherent regions.

Enforce long-term temporal consistency using higher-order potentials defined over supervoxel-based cliques.
Supervoxel cliques often span longer and broader regions in space and time, and hence better capture the object’s long-term evolution in shape and appearance.

Approach

Spatio-Temporal MRF with higher order supervoxel cliques.

Unary Potential:

Pairwise Potential:

Higher Order Potential:

Solve using graph cuts in an iterative, GrabCut-style manner.
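To make the role of the three potentials concrete, here is a minimal Python sketch of an energy with unary, pairwise, and higher-order terms on a toy problem. It is not the formulation used in this work: the function names, the weights, the Potts pairwise term, and the truncated-linear clique penalty (in the style of a robust P^n model) are all assumptions chosen for illustration, and exhaustive search stands in for graph cuts because the toy problem has only four nodes.

```python
from itertools import product


def energy(labels, unary, edges, cliques,
           pair_weight=0.05, gamma_max=1.0, truncation=0.25):
    """Total cost of a binary labeling (0 = background, 1 = foreground).

    All weights and the form of each term are illustrative assumptions.
    """
    # Unary term: cost of each node taking its assigned label.
    cost = sum(unary[i][lab] for i, lab in enumerate(labels))
    # Pairwise Potts term: fixed penalty for each neighboring pair that disagrees.
    cost += sum(pair_weight for i, j in edges if labels[i] != labels[j])
    # Higher-order term: penalty grows linearly with the number of nodes that
    # disagree with the clique's majority label, truncated at gamma_max.
    for clique in cliques:
        ones = sum(labels[i] for i in clique)
        minority = min(ones, len(clique) - ones)
        q = truncation * len(clique)
        cost += min(gamma_max, gamma_max * minority / q)
    return cost


def brute_force_minimize(unary, edges, cliques):
    """Exhaustive search over all labelings (a stand-in for graph cuts)."""
    n = len(unary)
    return min(product((0, 1), repeat=n),
               key=lambda labels: energy(labels, unary, edges, cliques))


# Four nodes along a temporal chain; node 2 has a weak, misleading
# preference for background: (cost of label 0, cost of label 1).
unary = [(2.0, 0.0), (2.0, 0.0), (0.4, 0.6), (2.0, 0.0)]
edges = [(0, 1), (1, 2), (2, 3)]
cliques = [(0, 1, 2, 3)]  # hypothetical supervoxel covering all four nodes

print(brute_force_minimize(unary, edges, []))       # (1, 1, 0, 1)
print(brute_force_minimize(unary, edges, cliques))  # (1, 1, 1, 1)
```

With pairwise terms alone, the weak background preference at node 2 survives. Adding the clique term makes the inconsistent labeling more expensive than flipping node 2 to foreground, which is the kind of long-range correction that higher-order potentials are meant to provide.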

Results

SegTrack Dataset (6 videos, 243 frames)

Average Pixel Error (lower is better)
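As a sketch of how such a score can be computed, the snippet below assumes the common SegTrack convention that average pixel error is the number of mislabeled pixels per frame, averaged over the frames of a video. The function name and the nested-list mask format are hypothetical, not taken from the released evaluation code.

```python
def average_pixel_error(predicted, ground_truth):
    """Mean number of mislabeled pixels per frame.

    Both arguments are lists of frames; each frame is a list of rows of
    0/1 labels. This mask format is an assumption made for illustration.
    """
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("need the same non-zero number of frames in both inputs")
    total_errors = 0
    for pred_frame, gt_frame in zip(predicted, ground_truth):
        for pred_row, gt_row in zip(pred_frame, gt_frame):
            # Count the pixels whose predicted label differs from the ground truth.
            total_errors += sum(p != g for p, g in zip(pred_row, gt_row))
    return total_errors / len(predicted)


# Two tiny 2x3 frames: one wrong pixel in the first, two in the second.
pred = [[[0, 1, 1], [0, 0, 1]],
        [[1, 1, 0], [0, 0, 0]]]
gt = [[[0, 1, 1], [0, 1, 1]],
      [[1, 0, 0], [0, 0, 1]]]

print(average_pixel_error(pred, gt))  # 1.5
```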

YouTube Dataset (126 videos, >10k frames)

- Our method outperforms all the baselines in 8 out of 10 classes, with gains of up to 8 points over the best baseline.

- Ground-truth pixel-level object masks, collected using Amazon Mechanical Turk, are available for download.

Weizmann Dataset (90 videos, >5k frames)

- Our method produces accurate object tubes with much less annotation effort than Cheng et al.

Qualitative Results

Data

Pixel-level ground-truth masks for a subset of the YouTube Objects dataset [data]