Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation

Suyog Dutt Jain         Kristen Grauman
University of Texas at Austin
suyog@cs.utexas.edu
[pdf] [supplementary] [bibtex] [poster] [code] [data]


Problem

Fixing one input modality for all images is suboptimal for interactive segmentation methods
[Figure: problem illustration]

Our Goal

Predict the annotation modality that is sufficiently strong for accurate segmentation of a given image
[Figure: annotation modalities and their costs]

Applications of our method

Quick selection for a single image    [Figure: quick selection]
Group selection with fixed budget    [Figure: group selection]


Approach

Segmentation model
We use a standard Markov Random Field (MRF) based image segmentation model.
[Figure: MRF segmentation model]
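For reference (the figure is not reproduced here), the standard pairwise MRF energy for binary foreground/background labeling has the form below; the specific unary and pairwise potentials we use are detailed in the paper.

    E(\mathbf{y}) = \sum_i U_i(y_i) + \lambda \sum_{(i,j) \in \mathcal{N}} V_{ij}(y_i, y_j), \qquad y_i \in \{0, 1\}

Here U_i is an appearance-based unary term (e.g., foreground/background likelihoods derived from the user input), V_{ij} is a contrast-sensitive smoothness term over neighboring pixels (i, j), and the minimizing labeling can be found with graph cuts.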


Learning to predict segmentation difficulty per modality (Training)
Given a set of training images with ground-truth foreground masks, we first simulate the user input for each modality.
[Figure: training pipeline]
We use the overlap score between the resulting segmentation and the ground truth to label an image
as "easy" or "hard", and train a linear SVM classifier for each modality.
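Below is a minimal Python sketch of this training step (using scikit-learn). The overlap threshold, the feature extractor, and the simulation/segmentation helpers are placeholders rather than the exact components from the paper.

    import numpy as np
    from sklearn.svm import LinearSVC

    def overlap_score(pred_mask, gt_mask):
        # Intersection-over-union between predicted and ground-truth foreground masks.
        inter = np.logical_and(pred_mask, gt_mask).sum()
        union = np.logical_or(pred_mask, gt_mask).sum()
        return inter / float(union)

    def train_difficulty_classifier(images, gt_masks, simulate_input, segment,
                                    extract_features, easy_threshold=0.85):
        # Train one "easy vs. hard" classifier for a single annotation modality.
        # simulate_input, segment, extract_features, and easy_threshold are placeholders:
        # they stand in for the simulated user input, the MRF segmentation, the proposed
        # difficulty features, and the overlap cutoff separating "easy" from "hard".
        X, y = [], []
        for img, gt in zip(images, gt_masks):
            user_input = simulate_input(gt)      # e.g., tight box around the object
            pred = segment(img, user_input)      # MRF segmentation result
            y.append(1 if overlap_score(pred, gt) >= easy_threshold else 0)  # 1 = easy
            X.append(extract_features(img, user_input))
        clf = LinearSVC(C=1.0)
        clf.fit(np.array(X), np.array(y))
        return clf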
 
Bounding-box example of "easy" vs. "hard"
[Figure: easy vs. hard bounding-box examples]


Learning to predict segmentation difficulty per modality (Testing)
At test time, we use a saliency detector to obtain a coarse estimate of the foreground.
[Figure: coarse foreground estimates from saliency]
(Saliency detector of Liu et al. 2009)
We then compute the proposed features and use the trained classifiers to predict the segmentation difficulty for each modality.
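A matching test-time sketch in Python; the saliency detector, the thresholding rule, and the feature extractor stand in for the ones used in the paper.

    def predict_modality_difficulty(img, saliency_detector, extract_features, classifiers):
        # Predict, for each modality, whether segmenting this image would be "easy".
        # saliency_detector and extract_features are placeholders; classifiers maps a
        # modality name to the linear SVM trained for that modality.
        saliency = saliency_detector(img)
        coarse_fg = saliency > saliency.mean()   # hypothetical threshold -> rough fg mask
        feats = extract_features(img, coarse_fg)
        # Larger decision values mean the modality is more likely to be sufficient.
        return {name: clf.decision_function([feats])[0] for name, clf in classifiers.items()}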

Modality selection methods

Cascade Selection
Make a quick selection of the best (cheapest sufficient) modality for a given image.
[Figure: cascade selection]
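A sketch of the cascade logic, assuming modalities are tried from cheapest to most expensive and we fall back to full manual annotation when no cheaper modality is predicted to be sufficient; the modality names and confidence threshold below are placeholders.

    def cascade_select(scores, modalities_by_cost=("bounding_box", "sloppy_contour"),
                       confidence_threshold=0.0):
        # scores: modality name -> classifier decision value from the prediction step.
        # Return the cheapest modality predicted to give an accurate segmentation,
        # otherwise fall back to a full manual annotation.
        for modality in modalities_by_cost:
            if scores[modality] >= confidence_threshold:
                return modality
        return "full_manual"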


Budgeted Selection
Goal: Given a batch of n images and a fixed time budget B, find the optimal
annotation tool for each image.
 
[Figure: budgeted selection]
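One way to realize this is as a multiple-choice knapsack: each image receives exactly one modality, each (image, modality) pair has a predicted quality and a time cost, and the total cost must stay within B. The dynamic-programming sketch below (integer time units assumed) illustrates that formulation; it is not necessarily the exact solver used in the paper.

    def budgeted_selection(costs, values, budget):
        # costs[i][m]:  annotation time of modality m on image i (integer units assumed).
        # values[i][m]: predicted segmentation quality of modality m on image i.
        # budget:       total annotation time B.
        # Returns one chosen modality index per image, maximizing total predicted quality.
        n, NEG = len(costs), float("-inf")
        best = [0.0] + [NEG] * budget            # best[b]: max value at total cost b
        choice = [[None] * (budget + 1) for _ in range(n)]
        for i in range(n):
            new_best = [NEG] * (budget + 1)
            for b in range(budget + 1):
                if best[b] == NEG:
                    continue
                for m, (c, v) in enumerate(zip(costs[i], values[i])):
                    nb = b + c
                    if nb <= budget and best[b] + v > new_best[nb]:
                        new_best[nb] = best[b] + v
                        choice[i][nb] = (m, b)   # remember modality and previous cost
            best = new_best
        b = max(range(budget + 1), key=lambda x: best[x])
        if best[b] == NEG:
            raise ValueError("Budget too small to annotate every image.")
        assignment = [None] * n
        for i in range(n - 1, -1, -1):           # backtrack to recover the assignment
            m, b = choice[i][b]
            assignment[i] = m
        return assignment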


Results

Datasets:
  1. Interactive Image Segmentation (IIS): 151 unrelated images with complex shapes and appearance.

  2. MSRC: 591 images; we convert the multi-class annotations to foreground-background labels by treating the main object(s) (cow, flowers, etc.) as foreground.

  3. CMU-Cornell iCoseg: 643 images divided into 38 groups with similar foreground appearance.

Baselines:

  1. Otsu: Adaptive image thresholding.

  2. Effort Prediction (Vijayanarasimhan et al. 2009): State-of-the-art method for estimating image difficulty.

  3. Global Features: We train two SVMs (one for bounding box, one for contours) to predict if an image is easy based on global features.

  4. GT-Input: Uses the ground-truth box/contour masks as input to our method (Upper Bound).

  5. Random: Randomly assigns a confidence value to each modality in the budgeted annotation results.

Predicting segmentation difficulty per modality:

[Figures: ROC curves for difficulty prediction with the bounding-box modality (top row) and contour modality (bottom row) on MSRC, iCoseg, IIS, and all datasets combined]
Difficulty prediction accuracy for each dataset (first three columns) and cross-dataset experiments (last column).


[Figure: segmentation quality results]


Cascade selection - application to recognition

Task: Given a set of images containing a common object, train a classifier to separate object vs. non-object regions.

How do we get the training data labeled?

[Figure: recognition results]
Our method leads to substantial savings in annotation effort
with minimal loss in accuracy.


Budgeted selection - MTurk User study

[Figure: MTurk user study results, average overlap score vs. annotation time budget]
For the same amount of annotation time, our method
leads to much higher average overlap scores.



Acknowledgements: This research is supported in part by ONR YIP N00014-12-1-0754.

Publication:
Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation. S. Jain and K. Grauman. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Sydney, Australia, December 2013. [pdf] [supplementary] [bibtex] [poster] [code] [data]

Last modified by Suyog Jain 2014-06-27