Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation

Suyog Dutt Jain         Kristen Grauman
University of Texas at Austin
[pdf] [supplementary] [bibtex][poster][code] [data]


Fixing the input modality for interactive segmentation methods is not optimal
Image Problem

Our Goal

Predict the annotation modality that is sufficiently strong for accurate segmentation of a given image
Image Cost

Applications of our method

Quick selection for a single image       Group selection with fixed budget
Image AppQuick       Image AppGroup


Segmentation model
We use the standard Markov Random Field based image segmentation model.
Image MRFNew
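For readers without the figure, a standard MRF segmentation energy typically has the form below. This is a generic sketch, not necessarily the exact formulation in the figure: the symbols (y_p for the label of pixel p, A_p for the unary appearance term, S_{p,q} for the pairwise smoothness term, N for the set of neighboring pixel pairs) are assumed notation.

```latex
E(\mathbf{y}) \;=\; \sum_{p} A_p(y_p) \;+\; \sum_{(p,q) \in \mathcal{N}} S_{p,q}(y_p, y_q),
\qquad y_p \in \{\text{fg}, \text{bg}\}
```

The segmentation is the labeling that minimizes E; for binary labels with a submodular pairwise term this is commonly solved with graph cuts.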

Learning to predict segmentation difficulty per modality (Training)
Given a set of images with foreground masks, we first simulate the user input for each modality.
Image Approach
We use the overlap score between the resulting segmentation and the ground truth to mark each image
as "easy" or "hard", then train a linear SVM classifier (one per modality).
Bounding box example of "easy" vs. "hard":
Image HardEasy
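The labeling step above can be sketched as follows. This is a minimal illustration, assuming the overlap score is intersection-over-union; the threshold of 0.7 and the function names are hypothetical, not values from the paper.

```python
def overlap_score(pred, truth):
    """Intersection-over-union of two binary masks (nested lists of 0/1)."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Two empty masks agree perfectly.
    return inter / union if union else 1.0


def difficulty_label(pred, truth, threshold=0.7):
    """Mark an image "easy" if the simulated segmentation overlaps enough.

    The threshold is a hypothetical value chosen for illustration.
    """
    return "easy" if overlap_score(pred, truth) >= threshold else "hard"


truth = [[0, 1, 1],
         [0, 1, 1],
         [0, 0, 0]]
good = [[0, 1, 1],
        [0, 1, 1],
        [0, 1, 0]]   # one extra foreground pixel
poor = [[1, 1, 1],
        [1, 1, 1],
        [1, 1, 1]]   # everything marked foreground

print(difficulty_label(good, truth))
print(difficulty_label(poor, truth))
```

The resulting labels would serve as training targets for the per-modality classifier.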

Learning to predict segmentation difficulty per modality (Testing)
Use a saliency detector to get a coarse estimate of the foreground at test time.
Image Saliency
Liu et al. 2009
Compute the proposed features on this estimate and use the trained classifiers to predict difficulty.

Modality selection methods

Cascade Selection
Quickly select the most suitable modality for a single image.
Image Cascade
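One way to read the cascade is sketched below, assuming modalities are tried from cheapest to most expensive and the first one predicted to be sufficient is chosen. The ordering, the "tight_polygon" fallback, the "separability" feature, and the toy predictor are all assumptions for illustration; they stand in for the trained per-modality classifiers.

```python
# Modalities ordered from cheapest to most expensive (assumed ordering).
# "tight_polygon" is a hypothetical strongest fallback.
MODALITIES = ["bounding_box", "contour", "tight_polygon"]


def cascade_select(features, modalities, predict_easy):
    """Return the first modality whose classifier predicts "easy".

    predict_easy(modality, features) -> bool is a stand-in for a trained
    per-modality difficulty classifier. The last modality is the fallback
    and is never tested.
    """
    for modality in modalities[:-1]:
        if predict_easy(modality, features):
            return modality
    return modalities[-1]


def toy_predictor(modality, features):
    # Hypothetical rule: compare one made-up score to a per-modality threshold.
    thresholds = {"bounding_box": 0.8, "contour": 0.5}
    return features["separability"] >= thresholds[modality]


for score in (0.9, 0.6, 0.2):
    choice = cascade_select({"separability": score}, MODALITIES, toy_predictor)
    print(score, choice)
```

Because the cascade stops at the first sufficient modality, it only runs the classifiers it needs for each image.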

Budgeted Selection
Goal: Given a batch of n images and a fixed time budget B, find the optimal
annotation tool for each image.
Image Budget
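The budgeted problem can be illustrated with a small exhaustive search. This is a sketch of the problem statement only, not the optimization used in the paper: the per-modality costs (in seconds) and success probabilities below are made up, and brute force is practical only for tiny batches.

```python
from itertools import product


def budgeted_select(success_prob, cost, budget):
    """Pick one modality per image to maximize total predicted success.

    success_prob: one dict per image, modality -> predicted success probability.
    cost: modality -> annotation time.
    budget: total time allowed for the batch (B).
    Returns a tuple of modalities, or None if no assignment fits the budget.
    """
    modalities = list(cost)
    best, best_score = None, float("-inf")
    for assignment in product(modalities, repeat=len(success_prob)):
        total_cost = sum(cost[m] for m in assignment)
        if total_cost > budget:
            continue  # over budget, skip
        score = sum(p[m] for p, m in zip(success_prob, assignment))
        if score > best_score:
            best, best_score = assignment, score
    return best


# Hypothetical costs and classifier outputs for a batch of three images.
cost = {"bounding_box": 7, "contour": 20}
probs = [
    {"bounding_box": 0.9, "contour": 0.95},
    {"bounding_box": 0.3, "contour": 0.9},
    {"bounding_box": 0.5, "contour": 0.8},
]

print(budgeted_select(probs, cost, budget=35))
```

With a budget of 35 only one image can receive the costlier contour input, and the search spends it on the image where the bounding box is least likely to succeed.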


Datasets

  1. Interactive Image Segmentation (IIS): 151 unrelated images with complex shapes and appearance.

  2. MSRC: 591 images. We convert the multi-class annotations to fg-bg labels by treating the main object(s) (cow, flowers, etc.) as foreground.

  3. CMU-Cornell iCoseg: 643 images divided into 38 groups with similar foreground appearance.


Baselines

  1. Otsu: Adaptive image thresholding.

  2. Effort Prediction (Vijayanarasimhan et al. 2009): State-of-the-art method for estimating image difficulty.

  3. Global Features: We train two SVMs (one for bounding box, one for contours) to predict if an image is easy based on global features.

  4. GT-Input: Uses the ground-truth box/contour masks as input to our method (Upper Bound).

  5. Random: Randomly assigns a confidence value to each modality in the budgeted annotation results.
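For reference, Otsu's method (the first baseline above) picks the gray-level threshold that maximizes between-class variance. The sketch below shows only the thresholding step on a flat list of integer intensities; how the baseline turns the result into a difficulty score is not stated here, and the function name and 256-level default are assumptions.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t maximizing between-class variance.

    pixels: flat list of integer intensities in [0, levels).
    Pixels with value <= t form one class, the rest the other.
    """
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the lower class
    sum0 = 0    # intensity sum of the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break             # upper class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


# Two well-separated intensity clusters.
pixels = [10] * 50 + [200] * 50
t = otsu_threshold(pixels)
foreground = [v for v in pixels if v > t]
print(t, len(foreground))
```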

Predicting segmentation difficulty per modality:

Image ROC curves (bounding box): MSRC, iCoseg, IIS, ALL
Image ROC curves (contour): MSRC, iCoseg, IIS, ALL
Difficulty prediction accuracy for each dataset (first three columns) and cross-dataset experiments (last column)

Image Quality

Cascade selection - application to recognition

Task: Given a set of images with a common object, train a classifier to separate object vs. non-object regions.

How should the training data be labeled?

Image Recog
Our method leads to substantial savings in annotation effort
with minimal loss in accuracy.

Budgeted selection - MTurk User study

For the same amount of annotation time, our method
leads to much higher average overlap scores.

Acknowledgements: This research is supported in part by ONR YIP N00014-12-1-0754.

Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation. S. Jain and K. Grauman. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Sydney, Australia, December 2013.

Last modified by Suyog Jain 2014-06-27