WhittleSearch: Image Search with Relative Attribute Feedback
Adriana Kovashka, Devi Parikh, and Kristen Grauman
Note: We have a patent pending for this work.
We also have a demo for this project.
Abstract
We propose a novel mode of feedback for image search, where a user describes which properties of exemplar images should be adjusted in order to more closely match his/her mental model of the image(s) sought. For example, perusing image results for a query "black shoes", the user might state, "Show me shoe images like these, but sportier." Offline, our approach first learns a set of ranking functions, each of which predicts the relative strength of a nameable attribute in an image ('sportiness', 'furriness', etc.). At query time, the system presents an initial set of reference images, and the user selects among them to provide relative attribute feedback. Using the resulting constraints in the multi-dimensional attribute space, our method updates its relevance function and re-ranks the pool of images. This procedure iterates using the accumulated constraints until the top-ranked images are acceptably close to the user's envisioned target. In this way, our approach allows a user to efficiently "whittle away" irrelevant portions of the visual feature space, using semantic language to precisely communicate her preferences to the system. We demonstrate the technique for refining image search for people, products, and scenes, and show it outperforms traditional binary relevance feedback in terms of search speed and accuracy.
Introduction
Existing image search methods rely either on keywords or on content-based retrieval. Keywords are not enough: we cannot pre-tag all images with keywords that match whatever query a user might come up with. On the other hand, content-based image retrieval is limited by the well-known "semantic gap" between low-level cues and higher-level user intent. User feedback can help, but current methods provide a very narrow channel for it: when a user simply marks results as relevant or irrelevant, it is not clear what about the marked images is relevant or irrelevant.
Therefore, we propose to use relative attributes for more specific user feedback in image search. We allow the user to describe precisely what is missing from the current set of results. The user expresses the semantics of their search goal through relative attributes, relating their target to some pre-selected exemplar images. Note that attributes (or "concepts" in the information retrieval community) have been used previously for search, but users have not been allowed to isolate individual attributes as a handle for feedback.
Our idea: Allow users to give relative attribute feedback on reference images to refine their image search.
Approach
Step 1: Predict relative attribute strengths.
Step 2: Get user statements relating their search target to exemplar images.
Step 3: Use the constraints to whittle away irrelevant regions of the multi-dimensional attribute space.
Background: Binary Relevance Feedback
- User marks some images as relevant or irrelevant given their search target.
- We learn a binary classifier using the relevant images as positives and the irrelevant images as negatives, and rank images in the dataset by the classifier outputs.
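A minimal sketch of this baseline, assuming each image is represented by a fixed-length feature vector; the names features, pos_idx, and neg_idx are illustrative, and the linear SVM from scikit-learn stands in for whatever classifier a particular implementation might use.

import numpy as np
from sklearn.svm import LinearSVC

def rank_by_binary_feedback(features, pos_idx, neg_idx):
    # Train a classifier on the marked images and rank the whole pool by its scores.
    X = np.vstack([features[pos_idx], features[neg_idx]])
    y = np.concatenate([np.ones(len(pos_idx)), -np.ones(len(neg_idx))])
    clf = LinearSVC(C=1.0).fit(X, y)
    scores = clf.decision_function(features)  # higher score = predicted more relevant
    return np.argsort(-scores)                # image indices, best first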
Relative Attribute Feedback
Learning to predict relative attributes
We learn relative attributes as in Parikh and Grauman, "Relative Attributes", ICCV 2011.
Interface for image-level relative attribute annotations.
- Obtain ordered image pairs O_m = {(i, j)} in which image i has a stronger presence of attribute m than image j, and unordered pairs S_m = {(i, j)} in which the two images have an equivalent presence of attribute m.
- Learn a ranking function r_m(x) = w_m^T x such that as many of the following constraints as possible are satisfied:
  w_m^T x_i > w_m^T x_j for all (i, j) in O_m, and w_m^T x_i ≈ w_m^T x_j for all (i, j) in S_m.
- For the latter, use the formulation due to Thorsten Joachims, "Optimizing Search Engines Using Clickthrough Data", KDD 2002:
  minimize (1/2) ||w_m||^2 + C (Σ ξ_ij^2 + Σ γ_ij^2)
  subject to: w_m^T x_i ≥ w_m^T x_j + 1 − ξ_ij for all (i, j) in O_m,
              |w_m^T (x_i − x_j)| ≤ γ_ij for all (i, j) in S_m,
              ξ_ij ≥ 0, γ_ij ≥ 0.
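A minimal sketch of learning one such ranker, assuming a feature matrix features and ordered pairs O_m given as (i, j) index tuples meaning image i shows attribute m more strongly than image j. It uses the standard reduction of a ranking SVM to a linear SVM on pairwise difference vectors; handling the similarity pairs S_m, as in the full formulation above, is omitted for brevity.

import numpy as np
from sklearn.svm import LinearSVC

def learn_attribute_ranker(features, ordered_pairs, C=1.0):
    diffs, labels = [], []
    for i, j in ordered_pairs:
        diffs.append(features[i] - features[j]); labels.append(+1)  # i ranks above j
        diffs.append(features[j] - features[i]); labels.append(-1)  # mirrored pair
    clf = LinearSVC(C=C, fit_intercept=False).fit(np.array(diffs), np.array(labels))
    w_m = clf.coef_.ravel()
    return features @ w_m  # predicted strength of attribute m for every image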
Updating the scoring function from feedback
- The user selects some reference images and states how they differ from the image they want, thus defining constraints: "I want [objects] that are [more/less] [attribute name] than this image."
- We update the score of each image in the dataset by counting how many of these constraints it satisfies.
- For a constraint of the type "I want images exhibiting more of attribute m than reference image x_ref", an image x satisfies the constraint if r_m(x) > r_m(x_ref).
- For a constraint of the type "I want images exhibiting less of attribute m than reference image x_ref", an image x satisfies the constraint if r_m(x) < r_m(x_ref).
- For constraints of the type "I want images that are similar in terms of attribute m to reference image x_ref", an image x satisfies the constraint if r_m(x) lies within a small range around r_m(x_ref).
- At the top we rank images whose score equals F, i.e. images that satisfy all F constraints given so far. Next are images which satisfy F − 1 constraints, etc.
A toy example illustrating the intersection of relative constraints with M = 2 attributes. The images are plotted on the axes for both attributes. The space of images that satisfy each constraint is marked in a different color. The region satisfying all constraints is marked with a black dashed line; in this case, there is only one image in it (outlined in black).
- Note that our method allows users to refine their query in a way that a query stated in absolute attribute terms cannot.
- Our method is efficient at query time, since it only involves simple set operations (counting satisfied constraints) and no learning; a small sketch follows.
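A minimal sketch of this scoring step, assuming attr is a (num_images × M) matrix of predicted relative attribute strengths and each feedback constraint is a tuple (m, ref, direction) with direction in {"more", "less"}; the names and tuple format are illustrative.

import numpy as np

def whittle_rank(attr, constraints):
    scores = np.zeros(attr.shape[0], dtype=int)
    for m, ref, direction in constraints:
        if direction == "more":
            scores += (attr[:, m] > attr[ref, m]).astype(int)  # stronger than the reference
        elif direction == "less":
            scores += (attr[:, m] < attr[ref, m]).astype(int)  # weaker than the reference
    return np.argsort(-scores)  # images satisfying the most constraints come first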
Hybrid Feedback Approach
- Using the images marked as positive (the set P), the images marked as negative (the set N), and the sets T_k of images which satisfy k of the attribute constraints, we can define a set of ordered pairs which includes all relevance preferences:
  O = {(i, j) : i ∈ P, j ∈ N} ∪ {(i, j) : i ∈ T_k, j ∈ T_l, k > l},
  and a set which expresses equivalent relevance:
  E = {(i, j) : i, j ∈ T_k}.
- Then we can learn a relevance ranking function from these pairs, using the same ranking formulation as for the attribute predictors.
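A minimal sketch of assembling these pairs, assuming pos and neg are the binary-feedback index sets and tiers lists the index sets T_k ordered from most to fewest satisfied constraints; all names are illustrative. The resulting pairs can be fed to the same ranking-SVM reduction sketched earlier.

def build_hybrid_pairs(pos, neg, tiers):
    ordered, equivalent = [], []
    ordered += [(p, n) for p in pos for n in neg]  # relevant images outrank irrelevant ones
    for hi in range(len(tiers)):
        for lo in range(hi + 1, len(tiers)):
            ordered += [(a, b) for a in tiers[hi] for b in tiers[lo]]  # more satisfied constraints outrank fewer
    for tier in tiers:
        members = list(tier)
        equivalent += [(members[a], members[b])
                       for a in range(len(members))
                       for b in range(a + 1, len(members))]  # same tier: equally relevant
    return ordered, equivalent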
Experimental Results
Experimental Design
Datasets
- Shoes -- 14,658 shoe images from the Attribute Discovery dataset, augmented with 10 attributes: pointy at the front, open, bright in color, high at the heel, covered with ornaments, shiny, long on the leg, formal, sporty, feminine -- dataset and instance-level relative annotations can be downloaded here
- OSR -- 2,688 images from Outdoor Scene Recognition; 6 attributes -- instance-level relative annotations can be downloaded here, dataset can be found here
- PubFig -- 772 images from Public Figures dataset; 11 attributes -- instance-level relative annotations can be downloaded here, dataset can be found here
Evaluation Metrics
- Rank -- the rank assigned to the secret (target) image (low is good, since the image appears closer to the top of the results)
- NDCG@50 -- correlation between the method's ranking and a ground truth ranking (high is good); a sketch of this metric follows below
- Ground truth: images ranked by their distance to the secret image in the learned feature space
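A minimal sketch of the NDCG@k computation, assuming ranking lists image indices best-first as returned by a method and gains holds each image's ground-truth relevance (here derived from its distance to the secret image). The exact gain and discount used in the paper may differ; this follows the common rel / log2(rank + 1) form.

import numpy as np

def ndcg_at_k(ranking, gains, k=50):
    gains = np.asarray(gains, dtype=float)
    top = np.asarray(ranking[:k])
    discounts = 1.0 / np.log2(np.arange(2, len(top) + 2))           # discounts for positions 1..k
    dcg = float(np.sum(gains[top] * discounts))
    ideal = float(np.sum(np.sort(gains)[::-1][:len(top)] * discounts))  # best possible ordering
    return dcg / ideal if ideal > 0 else 0.0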
Feedback generation
- Pair each target image with 16 exemplars, show the pairs to users on Mechanical Turk and ask:
- For our method: "Is the target image more or less [attribute name] than the exemplar?"
- For binary feedback baseline: "Is the target image similar to or dissimilar from the exemplar?"
- Or, generate feedback automatically:
- For our method: randomly sample constraints using the relationship between the predicted relative attribute values of the secret image and the exemplar (see the sketch below)
- For baseline: sample positives/negatives using their image feature distance to the secret image
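A minimal sketch of the automatic constraint generation for our method, assuming attr holds predicted attribute strengths, target is the secret image's index, and exemplars are the reference images shown at this round; the number of sampled constraints and the tuple format match the earlier sketch and are illustrative assumptions.

import random

def sample_attribute_feedback(attr, target, exemplars, n_constraints=8, seed=0):
    rng = random.Random(seed)
    constraints = []
    for _ in range(n_constraints):
        ref = rng.choice(list(exemplars))
        m = rng.randrange(attr.shape[1])
        # compare predicted strengths to decide the direction of the statement
        direction = "more" if attr[target, m] > attr[ref, m] else "less"
        constraints.append((m, ref, direction))
    return constraints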
Feedback Results
Impact of iterative feedback
Our method converges faster than SVM-based binary feedback. Our advantage is stronger on datasets with more fluid categories.
Impact of amount of feedback
Our method learns faster; it achieves higher accuracy with fewer constraints.
Impact of reference images
Baseline needs good positives and negatives; our method needs similar images or a mix.
Ranking accuracy with human-given feedback
Initialization with random reference images.
Initialization via an attribute-keyword query.
Our method outperforms binary feedback, except on OSR; this may be due to human difficulty with the attribute vocabulary. Relative attribute feedback can also refine keyword search.
Qualitative Results
Example iterative search results with relative attribute feedback.
Example search result with hybrid feedback.
Note: You can find additional results in our supplementary file.
Consistency of Relative Supervision Types
Class-level vs. instance-level
Humans agree more when asked to compare image instances (6% disagreement) than when asked to compare image categories (13% disagreement). Furthermore:
Instance-based learning is less error-prone than class-based learning for images with less strict category boundaries.
Absolute vs. relative
Humans agree more when they make relative statements (17% disagreement) than when they make absolute statements (22% disagreement).
Conclusion
We proposed a method that allows users to communicate very precisely how the retrieved results compare with their mental model of the target. Our method refines search results more effectively than binary relevance feedback, often with less human effort.
Publication and Dataset
WhittleSearch: Image Search with Relative Attribute Feedback. Adriana Kovashka, Devi Parikh, and Kristen Grauman. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, June 2012. [pdf] [supplementary] [.bib file] [poster] [Shoes dataset] [OSR annotations] [PubFig annotations]
Note: Please send a blank email to [adriana AT cs DOT utexas DOT edu] with "WhittleSearch Shoes dataset" in the subject line so we know who downloaded our dataset, and also so we can let you know of any updates to this dataset. We will also post any updates on this page.
Note: The full OSR and PubFig datasets from the "Relative Attributes" paper can be found here.