Current methods learn monolithic attribute predictors, with the assumption that a single model is sufficient to reflect human understanding of a visual attribute. However, in reality, humans vary in how they perceive the association between a named property and image content. For example, two people may have slightly different internal models of what makes a shoe look "formal", or they may disagree on which of two scenes looks "more cluttered". Rather than discount these differences as noise, we propose to learn user-specific attribute models. We adapt a generic model trained with annotations from multiple users, tailoring it to satisfy user-specific labels. Furthermore, we propose novel techniques to infer user-specific labels from transitivity and contradictions in the user's search history. We demonstrate that adapted attributes improve accuracy over both existing monolithic models and models learned from scratch with user-specific data alone. In addition, we show how adapted attributes help personalize image search, whether with binary or relative attributes.
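As a rough sketch of the adaptation idea (not the exact formulation used in the paper), the snippet below assumes a linear attribute classifier and regularizes the user-specific weights toward the generic weights learned from pooled annotations; the function name, hyperparameters, and gradient-descent solver are illustrative.

```python
import numpy as np

def adapt_attribute_model(w_generic, X_user, y_user, reg=1.0, lr=0.01, epochs=200):
    """Adapt a generic linear attribute classifier to one user's labels.

    Biased regularization (a sketch): the penalty shrinks the weights toward
    the generic model rather than toward zero, so the adapted model departs
    from the crowd consensus only where the user's own labels demand it.
    X_user is an (n, d) array of features; y_user holds +/-1 labels.
    """
    w = w_generic.astype(float).copy()
    n = len(y_user)
    for _ in range(epochs):
        margins = y_user * (X_user @ w)
        violated = margins < 1.0                       # hinge-loss violations
        grad_loss = -(y_user[violated, None] * X_user[violated]).sum(axis=0) / n
        grad_reg = reg * (w - w_generic)               # pull toward generic weights
        w -= lr * (grad_loss + grad_reg)
    return w
```

With few user-specific labels the regularizer dominates and the adapted model stays near the generic one; with more labels it drifts toward the user's personal notion of the attribute.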
Existing methods (e.g. Lampert et al. CVPR 2009, Farhadi et al. CVPR 2009, Branson et al. ECCV 2010, Kumar et al. PAMI 2011, Scheirer et al. CVPR 2012, Parikh & Grauman ICCV 2011) assume that one model of an attribute is sufficient to capture all user perceptions. However, there are real perceptual differences between annotators. In the following example from our data collection, 5 users confidently declared the shoe on the left formal, while 5 others confidently declared the opposite:
These differences also stem from the imprecision of attribute terms:
We can infer labels for relative attributes implicitly, from a user's search history. If T is the user's target image (which only he or she knows), and A and B are two reference images, then the user's feedback relating T to A and to B lets us infer, by transitivity, how A and B compare to each other; that is, the right-hand side follows from the left-hand side:
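The following is a minimal sketch of this inference rule, assuming a simple hypothetical feedback representation (one attribute at a time, with each statement recorded as a reference image plus a 'more'/'less' relation to the unseen target):

```python
def pairs_from_transitivity(feedback):
    """Turn relative-attribute feedback about an unseen target into ordered pairs.

    Hypothetical representation: `feedback` is a list of (image, relation)
    tuples for one attribute, where relation is 'more' if the user said the
    target has more of the attribute than that reference image, and 'less'
    otherwise. Every image the target exceeds must, by transitivity, have
    less of the attribute than every image that exceeds the target.
    """
    below_target = [img for img, rel in feedback if rel == 'more']  # img < target
    above_target = [img for img, rel in feedback if rel == 'less']  # target < img
    # (a, b) means the user perceives a as having less of the attribute than b.
    return [(a, b) for a in below_target for b in above_target]
```

For instance, `pairs_from_transitivity([('A', 'more'), ('B', 'less')])` returns `[('A', 'B')]`: the target is more formal than A and less formal than B, so A must be less formal than B, giving a labeled pair at no extra annotation cost.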
Alternatively, we can exploit seeming contradictions in a user's search history. For example, the user's feedback might imply that no images satisfy all of the stated constraints:
We then take pairs of images (in this case, A and C) ordered opposite to how the current attribute model ranks them, and supply those as training pairs to the learner. Once the model is updated, the set of images that satisfy all constraints is no longer empty.
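A hedged sketch of this contradiction-based inference is below; the band-constraint representation, variable names, and score dictionary are assumptions for illustration, not the paper's exact notation:

```python
def pairs_from_contradictions(band_constraints, generic_scores):
    """Mine training pairs from feedback that the current model cannot satisfy.

    Hypothetical representation: each constraint (low, high) says the target
    should have more of the attribute than image `low` and less than image
    `high`; `generic_scores` maps each image to the current model's attribute
    strength. If the model ranks `low` at or above `high`, the band between
    them is empty, i.e. the feedback looks contradictory. We then emit
    (low, high) as a user-specific pair, the opposite of the model's current
    ordering, so that after adaptation the constraints become satisfiable.
    """
    flipped_pairs = []
    for low, high in band_constraints:
        if generic_scores[low] >= generic_scores[high]:
            flipped_pairs.append((low, high))   # user perceives: low < high
    return flipped_pairs
```

In the example above, if the generic model scores A above C yet the user's feedback requires the target to lie between them, the pair (A, C) is returned and the learner is trained to rank C above A.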
We use the following datasets and attributes:
We compare our User-adaptive approach against the following baselines:
We train the models and test them on a held-out set from each user. The results below are averaged over all attributes and users:
Please see our supplementary file for examples of the performance of the methods on individual users and attributes.
Below is a visualization of some learned generic and adapted spectra for four attributes.
We also show that personalized attribute models allow the user to find his/her search target more quickly. Furthermore, gathering labels for personalization implicitly saves the user time, while producing results similar to explicit labeling.