We explore the problem of predicting "just noticeable differences" in a visual attribute. While some pairs of images have a clear ordering for an attribute (e.g., A is more sporty than B), for others the difference may be indistinguishable to human observers. However, existing relative attribute models are unequipped to infer partial orders on novel data. Attempting to map relative attribute ranks to equality predictions is non-trivial, particularly since the span of indistinguishable pairs in attribute space may vary in different parts of the feature space. We develop a Bayesian local learning strategy to infer when images are indistinguishable for a given attribute. On the UT-Zap50K shoes and LFW-10 faces datasets, we outperform a variety of alternative methods. In addition, we show the practical impact on fine-grained visual search.
Main Task: Given two distinct images, determine when the difference in the strength of an attribute between them becomes indistinguishable.
Imagine you are given a pile of images of Barack Obama, and you must sort them according to how serious he looks, from most to least. Can you do it? Surely there will be some obvious ones where he is more serious or less serious. There will even be image pairs where the distinction is quite subtle, yet still perceptible. However, you are likely to conclude that forcing a total order is meaningless: while the images exhibit different degrees of the attribute seriousness, at some point the differences become indistinguishable. It's not that the pixel patterns in indistinguishable image pairs are literally the same; they just can't be characterized consistently as anything other than "equally serious".
Problem: Existing models for relative attributes assume that all images are orderable; that is, at test time, the system can and should always distinguish which image in a pair exhibits the attribute more.
We argue that this situation calls for a model of just noticeable difference (JND) in visual attributes. JND is a concept from psychophysics that refers to the amount a stimulus must change in order for the change to be detectable by human observers. We propose a Bayesian local learning approach to infer when two images are indistinguishable for a given attribute.
Challenge: Analogous to the MacAdam ellipses in the CIE x,y color space (right), relative attribute space is likely not uniform (left). That is, the regions within which attribute differences are indistinguishable may vary in size and orientation across the high-dimensional visual feature space. Here we see the faces within each "equally smiling" cluster exhibit varying qualities for differentiating smiles—such as age, gender, and visibility of the teeth—but are still difficult or impossible to order in terms of smiling-ness. Depending on where we look in the feature space, the magnitude of attribute difference required to register a perceptible change may vary.
Pipeline: (1) Learn a ranking function w using a standard relative attribute ranking model. We treat the projected ranking scores R(x) as imperfect mid-level representations. (2) Estimate the likelihood densities of the equal and ordered pairs, respectively, using the pairwise distances in the relative attribute space (i.e. rank margins). (3) Determine the local prior by counting the labels of the analogous pairs in the image descriptor space. (4) Compute the posterior using the likelihood and the prior terms to predict whether the novel pair is distinguishable (not depicted).
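Below is a minimal Python sketch of this pipeline, not the authors' implementation: the RankSVM step is approximated by a linear classifier on pairwise difference vectors, the likelihoods by 1-D kernel density estimates over rank margins, and the local prior by a nearest-neighbor vote over training pairs. All function names, the pair encoding, and the hyperparameters (kernel bandwidth, neighborhood size k) are illustrative assumptions.

# Minimal sketch of the approach under assumed data shapes (not the authors' code).
# X: (n, d) array of image descriptors. ordered_pairs: list of (i, j) with image i
# exhibiting the attribute more than image j. equal_pairs: indistinguishable (i, j).
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.neighbors import KernelDensity, NearestNeighbors

def fit_rank_svm(X, ordered_pairs):
    # Step 1: learn ranking weights w via the standard reduction of ranking to
    # classification on difference vectors (a stand-in for a RankSVM solver).
    diffs = np.array([X[i] - X[j] for i, j in ordered_pairs])
    Xd = np.vstack([diffs, -diffs])                      # both orientations
    yd = np.hstack([np.ones(len(diffs)), -np.ones(len(diffs))])
    clf = LinearSVC(C=1.0, fit_intercept=False).fit(Xd, yd)
    return clf.coef_.ravel()                             # R(x) = w . x

def fit_margin_densities(w, X, ordered_pairs, equal_pairs, bandwidth=0.1):
    # Step 2: kernel density estimates of the rank margin |R(x_i) - R(x_j)|
    # for equal vs. ordered training pairs (the two likelihood terms).
    def margins(pairs):
        return np.abs(X[[i for i, _ in pairs]] @ w
                      - X[[j for _, j in pairs]] @ w).reshape(-1, 1)
    kde_eq = KernelDensity(bandwidth=bandwidth).fit(margins(equal_pairs))
    kde_ord = KernelDensity(bandwidth=bandwidth).fit(margins(ordered_pairs))
    return kde_eq, kde_ord

def predict_indistinguishable(xi, xj, w, kde_eq, kde_ord, X, pairs, labels, k=20):
    # Steps 3-4: local prior from the k most similar training pairs in image
    # descriptor space, then the posterior probability of the "equal" label.
    # A pair is encoded here by concatenating its two descriptors (an assumption).
    pair_feats = np.array([np.hstack([X[i], X[j]]) for i, j in pairs])
    nn = NearestNeighbors(n_neighbors=k).fit(pair_feats)
    _, idx = nn.kneighbors(np.hstack([xi, xj]).reshape(1, -1))
    neighbor_labels = np.asarray(labels)[idx[0]]         # 1 = equal, 0 = ordered
    prior_eq = float((neighbor_labels == 1).mean())

    m = np.array([[abs(float(xi @ w - xj @ w))]])        # rank margin of the test pair
    like_eq = np.exp(kde_eq.score_samples(m))[0]
    like_ord = np.exp(kde_ord.score_samples(m))[0]

    post_eq = like_eq * prior_eq
    post_ord = like_ord * (1.0 - prior_eq)
    return post_eq / (post_eq + post_ord + 1e-12)        # P(indistinguishable | pair)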
We evaluate our method on two challenging datasets containing instance-level pairwise supervision: UT-Zap50K (shoes) and LFW-10 (faces). We compare our method against a variety of alternative baselines.
Qualitative: Here, we observe the subtleties of JND. Whereas past methods would be artificially forced to make a comparison for the left panel of image pairs, our method declares them indistinguishable. Pairs may look very different overall (e.g., different hair, race, headgear) yet still be indistinguishable in the context of a specific attribute. Meanwhile, those that are distinguishable (right panel) may have only subtle differences.
Quantitative: JND detection accuracy for all attributes based on F1-scores. We show the precision-recall and ROC curves (AUC values in the legend). We outperform all baselines by a sizeable margin, roughly 4-18% on UT-Zap50K and 10-15% on LFW-10. This clearly demonstrates the advantages of our local learning approach, which accounts for the non-uniformity of attribute space.
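For reference, here is a small hedged example of how such JND detection metrics could be computed with scikit-learn; the labels and posterior scores below are made up for illustration and are not results from the paper.

import numpy as np
from sklearn.metrics import f1_score, precision_recall_curve, roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 0])                    # hypothetical equality labels
scores = np.array([0.8, 0.2, 0.6, 0.7, 0.4, 0.1])        # hypothetical P(equal | pair)

print("F1 @ 0.5:", f1_score(y_true, scores >= 0.5))      # JND detection accuracy (F1)
precision, recall, _ = precision_recall_curve(y_true, scores)  # PR curve points
print("ROC AUC:", roc_auc_score(y_true, scores))          # area under the ROC curve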
Image Search: We incorporate our model into the existing WhittleSearch image search framework [Kovashka et al. 12]. WhittleSearch is an interactive method that allows a user to provide relative attribute feedback. We augment this pipeline so that the user can express not only "more/less" preferences, but also "equal" preferences. For example, the user can now say, "I want cars that are similarly streamlined as car X." We show experimentally that enriching the feedback in this manner helps the user zero in more quickly on the relevant images that match their envisioned target.
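As a sketch of the idea (not the actual WhittleSearch code), equality feedback can prune the candidate set alongside the usual more/less constraints; whittle_candidates, the feedback format, and is_indistinguishable are hypothetical names wrapping the components sketched above.

import numpy as np

def whittle_candidates(X_cand, w, feedback, is_indistinguishable):
    # X_cand: (n, d) candidate image descriptors; w: ranking weights from Step 1.
    # feedback: list of (x_ref, relation), relation in {"more", "less", "equal"},
    # meaning the target has more/less/the same amount of the attribute as x_ref.
    # is_indistinguishable(x, x_ref) -> bool, e.g. thresholding the posterior above.
    keep = np.ones(len(X_cand), dtype=bool)
    for x_ref, relation in feedback:
        for idx, x in enumerate(X_cand):
            if relation == "more":
                keep[idx] &= x @ w > x_ref @ w
            elif relation == "less":
                keep[idx] &= x @ w < x_ref @ w
            else:                                         # "equal" feedback
                keep[idx] &= is_indistinguishable(x, x_ref)
    return np.flatnonzero(keep)                           # indices of surviving candidates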
A. Yu and K. Grauman. "Just Noticeable Differences in Visual Attributes". In ICCV, 2015. [bibtex]
@InProceedings{jnd,
author = {A. Yu and K. Grauman},
title = {Just Noticeable Differences in Visual Attributes},
booktitle = {International Conference on Computer Vision (ICCV)},
month = {Dec},
year = {2015}
}