WhittleSearch: Image Search with Relative Attribute Feedback

Adriana Kovashka, Devi Parikh, and Kristen Grauman


Note: We have a patent pending for this work.
We also have a demo for this project.

Abstract

We propose a novel mode of feedback for image search, where a user describes which properties of exemplar images should be adjusted in order to more closely match his/her mental model of the image(s) sought. For example, perusing image results for a query "black shoes", the user might state, "Show me shoe images like these, but sportier." Offline, our approach first learns a set of ranking functions, each of which predicts the relative strength of a nameable attribute in an image ('sportiness', 'furriness', etc.). At query time, the system presents an initial set of reference images, and the user selects among them to provide relative attribute feedback. Using the resulting constraints in the multi-dimensional attribute space, our method updates its relevance function and re-ranks the pool of images. This procedure iterates using the accumulated constraints until the top-ranked images are acceptably close to the user's envisioned target. In this way, our approach allows a user to efficiently "whittle away" irrelevant portions of the visual feature space, using semantic language to precisely communicate her preferences to the system. We demonstrate the technique for refining image search for people, products, and scenes, and show it outperforms traditional binary relevance feedback in terms of search speed and accuracy.

Introduction

Existing image search methods rely either on keywords or on content-based retrieval. Keywords are not enough: we cannot pre-tag all images with keywords that match whatever query a user might come up with. On the other hand, content-based image retrieval is limited by the well-known "semantic gap" between low-level cues and higher-level user intent. User feedback can help, but current methods offer only a very narrow channel: when the user marks images as relevant or irrelevant, it is not clear what about those images is relevant or irrelevant.



Therefore, we propose to use relative attributes for more specific user feedback in image search. We allow the user to describe precisely what is missing from the current set of results. The user expresses the semantics of their search goal through relative attributes, relating their target to some pre-selected exemplar images. Note that attributes (or "concepts" in the information retrieval community) have been used for search before, but users have not been allowed to isolate individual attributes as a handle for feedback.


Our idea: Allow users to give relative attribute feedback on reference images to refine their image search.


Approach

Step 1: Predict relative attribute strengths.
Step 2: Get user statements relating their search target to exemplar images.
Step 3: Use the constraints to whittle away irrelevant regions of the multi-dimensional attribute space.

Background: Binary Relevance Feedback

  1. User marks some images as relevant or irrelevant given their search target.
  2. We learn a binary classifier using the relevant images as positives and the irrelevant images as negatives, and rank images in the dataset by the classifier outputs.
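
To make this baseline concrete, here is a minimal sketch (not the authors' exact implementation) using scikit-learn, where `features` is a hypothetical (N, D) array of image descriptors and `relevant_idx` / `irrelevant_idx` hold the indices of the images the user marked:

```python
import numpy as np
from sklearn.svm import LinearSVC

def binary_feedback_rank(features, relevant_idx, irrelevant_idx):
    """Rank all database images by an SVM trained on the user's marks:
    relevant images serve as positives, irrelevant images as negatives."""
    X = np.vstack([features[relevant_idx], features[irrelevant_idx]])
    y = np.hstack([np.ones(len(relevant_idx)), -np.ones(len(irrelevant_idx))])
    clf = LinearSVC(C=1.0).fit(X, y)
    scores = clf.decision_function(features)  # larger = predicted more relevant
    return np.argsort(-scores)                # image indices, best first
```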

Relative Attribute Feedback

Learning to predict relative attributes

We learn relative attributes as in Parikh and Grauman, "Relative Attributes", ICCV 2011.

Interface for image-level relative attribute annotations.
  1. Obtain ordered image pairs $O_m = \{(i, j)\}$ such that image $i$ has a stronger presence of attribute $m$ than image $j$, and unordered pairs $S_m = \{(i, j)\}$ such that images $i$ and $j$ have equivalent presence of attribute $m$.


  2. Learn a ranking function $r_m(\mathbf{x}_i) = \mathbf{w}_m^T \mathbf{x}_i$ such that as many of the following constraints are satisfied as possible:
     $\forall (i,j) \in O_m: \mathbf{w}_m^T \mathbf{x}_i > \mathbf{w}_m^T \mathbf{x}_j$, and $\forall (i,j) \in S_m: \mathbf{w}_m^T \mathbf{x}_i = \mathbf{w}_m^T \mathbf{x}_j$.

  3. For the latter, use the formulation due to Thorsten Joachims, "Optimizing Search Engines Using Clickthrough Data", KDD 2002:
     minimize $\frac{1}{2}\|\mathbf{w}_m\|_2^2 + C \left( \sum \xi_{ij}^2 + \sum \gamma_{ij}^2 \right)$
     subject to:
     $\mathbf{w}_m^T (\mathbf{x}_i - \mathbf{x}_j) \ge 1 - \xi_{ij}, \quad \forall (i,j) \in O_m$
     $|\mathbf{w}_m^T (\mathbf{x}_i - \mathbf{x}_j)| \le \gamma_{ij}, \quad \forall (i,j) \in S_m$
     $\xi_{ij} \ge 0, \; \gamma_{ij} \ge 0$.
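
As an illustration only (a simplification, not the exact solver used in the paper), one can approximate this ranking objective by training an off-the-shelf linear SVM on pairwise difference vectors; the equivalence constraints on $S_m$ are omitted here for brevity, and `features` / `ordered_pairs` are hypothetical inputs:

```python
import numpy as np
from sklearn.svm import LinearSVC

def learn_relative_attribute(features, ordered_pairs, C=0.1):
    """Learn a linear ranking function r_m(x) = w_m . x from ordered pairs
    (i, j), each meaning image i shows attribute m more strongly than image j.
    Approximates the ranking SVM by classifying pairwise difference vectors."""
    diffs, labels = [], []
    for i, j in ordered_pairs:
        diffs.append(features[i] - features[j]); labels.append(+1)
        diffs.append(features[j] - features[i]); labels.append(-1)
    svm = LinearSVC(C=C, fit_intercept=False, max_iter=10000)
    svm.fit(np.asarray(diffs), labels)
    return svm.coef_.ravel()  # w_m; predicted attribute strength is w_m @ x
```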

Updating the scoring function from feedback

  1. The user selects some reference images and states how they differ from the image they want, thus defining constraints: "I want [objects] that are [more/less] [attribute name] than this image."
  2. We update the relevance score for each image in the dataset using these constraints: an image's score counts how many of the accumulated constraints its predicted attribute strengths satisfy.
  3. At the top we rank images whose score equals $F$, i.e., images that satisfy all $F$ constraints given so far. Next come images that satisfy $F-1$ constraints, and so on; a sketch of this scoring step follows below.
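
A minimal sketch of this scoring step, assuming each feedback statement has already been converted into a triple (attribute index, predicted strength of the reference image, 'more' or 'less'); the function and variable names here are illustrative, not the paper's notation:

```python
import numpy as np

def whittle_rank(attr_strengths, constraints):
    """attr_strengths: (N, M) matrix of predicted attribute strengths r_m(x)
    for all N database images.  constraints: list of F triples
    (m, ref_strength, direction) derived from statements such as
    "more sporty than this reference image".  Each image's score counts how
    many of the F constraints it satisfies; images are ranked best first."""
    scores = np.zeros(attr_strengths.shape[0], dtype=int)
    for m, ref_strength, direction in constraints:
        if direction == 'more':
            scores += (attr_strengths[:, m] > ref_strength).astype(int)
        else:
            scores += (attr_strengths[:, m] < ref_strength).astype(int)
    return np.argsort(-scores), scores  # ranking (best first), per-image counts
```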


A toy example illustrating the intersection of relative constraints with M = 2 attributes. The images are plotted on the axes for both attributes. The set of images satisfying each constraint is marked in a different color. The region satisfying all constraints is marked with a black dashed line. In this case, there is only one image in it (outlined in black).

Hybrid Feedback Approach

The two forms of feedback can also be combined: the user gives both binary relevance statements and relative attribute statements on the reference images, and the system uses both sets of constraints when re-ranking the database.


Experimental Results

Experimental Design

Datasets

We evaluate on three datasets: Shoes (product images), OSR (outdoor scenes), and PubFig (faces of public figures).

Evaluation Metrics

Feedback generation


Feedback Results

Impact of iterative feedback


Our method converges faster than SVM-based binary feedback. Our advantage is stronger on datasets with more fluid categories.

Impact of amount of feedback


Our method learns faster; it achieves higher accuracy with fewer constraints.

Impact of reference images


The binary feedback baseline needs both good positive and good negative reference images; our method does well when the reference images are similar to the target, or a mix of similar and dissimilar.

Ranking accuracy with human-given feedback


Initialization with random reference images.

Initialization via an attribute-keyword query.

Our method outperforms the binary feedback baseline except on OSR, which may be due to the difficulty humans have with that dataset's attribute vocabulary. Relative attribute feedback can also refine keyword search.


Qualitative Results



Example iterative search results with relative attribute feedback.




Example search result with hybrid feedback.

Note: You can find additional results in our supplementary file.

Consistency of Relative Supervision Types

Class-level vs. instance-level

Humans agree more when asked to compare image instances (6% disagreement) than when asked to compare image categories (13% disagreement). Furthermore:


Instance-based learning is less error-prone than class-based learning for images with less strict category boundaries.

Absolute vs. relative

Humans agree more when they make relative statements (17% disagreement) vs absolute statements (22% disagreement).

Conclusion

We proposed a method that allows the user to communicate very precisely how the retrieved results compare with their mental model of the target image. Our method refines search results more effectively than binary relevance feedback, often with less human effort.


Publication and Dataset

WhittleSearch: Image Search with Relative Attribute Feedback. Adriana Kovashka, Devi Parikh, and Kristen Grauman. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, June 2012.     [pdf]     [supplementary]     [.bib file]     [poster]     [Shoes dataset]     [OSR annotations]     [PubFig annotations]

Note: Please send a blank email to [adriana AT cs DOT utexas DOT edu] with "WhittleSearch Shoes dataset" in the subject line so we know who downloaded our dataset, and also so we can let you know of any updates to this dataset. We will also post any updates on this page.

Note: The full OSR and PubFig datasets from the "Relative Attributes" paper can be found here.