What's It Going to Cost You?: Predicting Effort vs. Informativeness
for Multi-Label Image Annotations

Sudheendra Vijayanarasimhan and Kristen Grauman
Department of Computer Sciences,
University of Texas at Austin


Problem

In the general case, visual category learning does not fit the mold of traditional active learning approaches, which primarily aim to reduce the number of labeled examples required to learn a classifier, and almost always assume a binary decision task. When trying to choose informative image data to label for recognition, there are three important distinctions we ought to take into account, as shown in the following figure.

[Figure]
Three important problems that need to be addressed while choosing informative image data to label for recognition, none of which are considered by traditional active learning approaches.

How do we actively learn in the presence of multi-label and multi-level annotations, where each annotation requires a different amount of manual effort?



Solution

In order to handle these issues, we propose an active learning framework where the expected informativeness of any candidate image annotation is weighed against the predicted cost of obtaining it.

• Multi-level annotations. To accommodate the multiple levels of granularity that may occur in provided image annotations, we pose the problem in the multiple-instance learning (MIL) setting, as done in our previous work.

• Multi-label images. To deal with images containing multiple labels (objects), we devise a kernel-based classifier for multiple-instance multi-label learning (MIML) and show how to extend our previous work on the standard binary MIL setting to the more general MIML setting.

[Figure]
If $ X_1$ and $ X_2$ are two images associated with labels $ l_1$ and $ l_2$, then we define the Multi-label Set Kernel (MSK) value between the images in terms of the instance-level kernel values $ K(x_i^1, x_j^2)$ between instances $ x_i^1$ and $ x_j^2$ in the two images.
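For concreteness, here is a minimal Python sketch of a set-style kernel between two images represented as bags of instance (segment) features. It uses a plain normalized sum of pairwise RBF instance kernels as a simplified stand-in for the MSK, omitting the label-dependent weighting; the feature dimensions and kernel choice are illustrative assumptions.

import numpy as np

def rbf_instance_kernel(x_i, x_j, gamma=1.0):
    # RBF kernel between two instance (segment) feature vectors.
    diff = x_i - x_j
    return np.exp(-gamma * np.dot(diff, diff))

def set_kernel(X1, X2, gamma=1.0):
    # Kernel between two images, each a (num_instances, dim) array of
    # segment features, computed as the normalized sum of pairwise
    # instance kernel values.  The actual MSK additionally weights the
    # instances according to the images' label sets.
    total = sum(rbf_instance_kernel(x_i, x_j, gamma)
                for x_i in X1 for x_j in X2)
    return total / (len(X1) * len(X2))

# Example: two images with 3 and 4 segments, 10-dimensional features each.
rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(3, 10)), rng.normal(size=(4, 10))
print(set_kernel(X1, X2))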

• Predicting manual effort. Humans (especially vision researchers) can easily glance at an image and roughly gauge the difficulty. Can we predict annotation costs directly from image features?

[Figure]

Using timing data collected from anonymous users on the Web, we show that we can reliably learn to predict the amount of time required to annotate an image, and that active selection gains actually improve when we account for the task's variable difficulty.
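As a rough illustration of such a cost predictor, the sketch below regresses annotation time onto a few simple image features; the synthetic features (edge density, number of segments, color entropy), the timing targets, and the choice of support vector regression are assumptions for illustration, not the paper's exact setup.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Toy stand-in for crowd-sourced data: each row holds simple image
# features that plausibly correlate with annotation difficulty, and the
# target is the time (in seconds) users took to annotate the image.
rng = np.random.default_rng(0)
features = rng.uniform(size=(200, 3))   # [edge_density, num_segments, color_entropy]
times = 20 + 60 * features[:, 0] + 10 * features[:, 1] + rng.normal(scale=5, size=200)

# Support vector regression as the cost predictor (an illustrative choice;
# any regressor mapping image features to seconds would fit this role).
cost_predictor = make_pipeline(StandardScaler(), SVR(C=10.0))
cost_predictor.fit(features[:150], times[:150])

predicted_seconds = cost_predictor.predict(features[150:])
print(predicted_seconds[:5])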



Multi-label Multi-level Active Selection

Putting it all together, we consider the following three types of queries that an active learner can pose:

[Figure] The three candidate queries: a label for a specified region, an image-level tag, or a complete segmentation of the image.

We measure the information content of a candidate annotation by computing the reduction in risk over the dataset once the example, along with the annotation, is added to the training set and the classifier is retrained.
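In symbols (paraphrasing rather than reproducing the paper's exact notation), the informativeness of a candidate annotation $ z$ can be written as the expected drop in total risk:

$ VOI(z) = \hat{\mathcal{R}}(\mathcal{X}_L, \mathcal{X}_U) - \mathbb{E}\big[ \hat{\mathcal{R}}(\mathcal{X}_L \cup \{z\}, \mathcal{X}_U \setminus \{z\}) \big] $

where $ \hat{\mathcal{R}}$ denotes the empirical risk of the classifier retrained on the corresponding labeled set and evaluated over both the labeled pool $ \mathcal{X}_L$ and the unlabeled pool $ \mathcal{X}_U$, and the expectation is taken over the possible answers to the query.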

We obtain the cost (manual effort) of an annotation using our cost predictor, which is trained on users' annotation timing data.

At each active learning iteration, we compute the "net worth" of each unlabeled example and its candidate annotations, choose the annotation that provides the best tradeoff, and add it to the training set.
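A minimal sketch of this selection step is given below; the hooks estimate_risk_reduction and predict_cost are hypothetical placeholders for the risk and cost estimates described above, and the linear gain-minus-cost score with a tradeoff weight is one plausible way to compute the net worth.

def select_next_annotation(candidates, classifier, labeled, unlabeled,
                           estimate_risk_reduction, predict_cost,
                           tradeoff=1.0):
    # Pick the candidate annotation with the best informativeness-vs-cost
    # tradeoff.  `candidates` holds (example, annotation_type) pairs, e.g.
    # (image, "tag"), (region, "label"), or (image, "full_segmentation").
    best_query, best_worth = None, float("-inf")
    for example, annotation_type in candidates:
        gain = estimate_risk_reduction(classifier, labeled, unlabeled,
                                       example, annotation_type)
        cost = predict_cost(example, annotation_type)   # predicted seconds
        net_worth = gain - tradeoff * cost              # "net worth" of the query
        if net_worth > best_worth:
            best_query, best_worth = (example, annotation_type), net_worth
    return best_query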



Results

To validate our method, we use the MSRC dataset, as it is a common benchmark for multi-class object segmentation. Below we show some example selections by our approach to illustrate the main idea; extensive quantitative results are provided in our publication.

[Figure]
The easiest and hardest images to annotate based on actual users' timing data (top), and the predictions of our cost function on novel images (bottom).



[Figure]
Each image (from left to right) represents the example with the largest VOI as selected by our active learner over a sequence of iterations. The active learning query (one of a region label, an image tag, or a complete segmentation) is displayed at the bottom of the image along with the oracle's answer. For a query on a region, the corresponding region is highlighted in the image; for an image tag, the text above the image shows which label would produce the best reduction in risk.


Publication

What's It Going to Cost You?: Predicting Effort vs. Informativeness for Multi-Label Image Annotations,
S. Vijayanarasimhan and K. Grauman, in CVPR 2009
[paper, annotation data]
