Existing methods for learning visual attributes are prone to learning the wrong thing: namely, properties that are correlated with the attribute of interest among training samples. Yet many proposed applications of attributes rely on being able to learn the correct semantic concept corresponding to each attribute. We propose to resolve such confusions by jointly learning decorrelated, discriminative attribute models. Leveraging side information about semantic relatedness, we develop a multi-task learning approach that uses structured sparsity to encourage feature competition among unrelated attributes and feature sharing among related attributes. On three challenging datasets, we show that accounting for structure in the visual attribute space is key to learning attribute models that preserve semantics, yielding improved generalizability that helps in the recognition and discovery of unseen object categories.

Motivation: The curse of correlation

Semantic visual attributes are supposed to be shareable across categories, and in many of their envisioned applications they are expected to be detected correctly in novel settings entirely different from the attribute training data. Yet the status quo pipeline, which trains each attribute classifier independently, ignores this and is content to learn properties that are merely correlated with the semantic attribute in the training data.

Figure: Given the training data above and no other information, can you figure out which concept to learn? Could it be, say, brown? What about furry? Forest animal? Or maybe some combination of these?

Figure: As an extreme case, suppose the same set of training images is used to train both the forest animal and the brown attributes. The standard learner simply learns the same concept for both, so it will be wrong on at least one of the two. In the example in the figure, the standard learner uses tree-like patterns as cues for both forest animal and brown; that is, the brown classifier is wrong.

Problem: Attributes that are correlated in the training data may easily be conflated by a learner.

Solution idea

Figure: Our idea is to encourage different attributes to use different features. By forcing the brown classifier and the forest animal classifier to compete for features, we hope to avoid conflation and learn what it truly means to be brown. In the image above, the brown classifier correctly selects the color histogram features.

Figure: The key to our approach is to jointly learn all attributes in a vocabulary, while enforcing a structured sparsity prior that aligns feature sharing patterns with semantically close attributes and feature competition with semantically distant ones.
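To make the sharing/competition idea concrete, here is a hedged sketch of one common way to write such a prior, not necessarily the exact objective in the paper: a group-lasso-style penalty that, for every feature dimension d and every semantic group of attributes S_g, takes the L2 norm of the weights that the attributes in S_g place on d, and then sums these norms over all features and groups. The weight layout, the groupings, and the function name `structured_sparsity_penalty` are illustrative assumptions.

```python
import math


def structured_sparsity_penalty(weights, groups):
    """Group-lasso-style penalty (illustrative sketch, not the paper's exact objective).

    weights: list of rows; weights[d][m] is the weight attribute m places on feature d.
    groups:  list of lists of attribute indices; each inner list is one
             hypothetical semantic group of related attributes.
    Returns the sum over features d and groups g of the L2 norm of weights[d][S_g].
    """
    total = 0.0
    for row in weights:          # one row per feature dimension d
        for group in groups:     # one block per (feature, group) pair
            # L2 norm inside the group: related attributes share this feature cheaply
            total += math.sqrt(sum(row[m] ** 2 for m in group))
    return total


# Two attributes (columns 0 and 1) and two features (rows 0 and 1).
shared_feature = [[1.0, 1.0],   # both attributes rely on feature 0
                  [0.0, 0.0]]
split_features = [[1.0, 0.0],   # attribute 0 uses feature 0
                  [0.0, 1.0]]   # attribute 1 uses feature 1

same_group = [[0, 1]]           # treated as semantically related
different_groups = [[0], [1]]   # treated as semantically unrelated

print(structured_sparsity_penalty(shared_feature, same_group))        # sqrt(2), about 1.414
print(structured_sparsity_penalty(split_features, same_group))        # 2.0
print(structured_sparsity_penalty(shared_feature, different_groups))  # 2.0
print(structured_sparsity_penalty(split_features, different_groups))  # 2.0
```

Within a group, sharing one feature is cheaper than splitting across two (about 1.414 versus 2.0), so related attributes are nudged toward common features. Across groups the plain sum gives sharing no discount, and, as in the lasso, it favors solutions in which each feature is claimed by few groups; this is the sense in which unrelated attributes compete rather than share under this sketch.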




Main result: attribute detection

Figure: By decorrelating attributes, our attribute detectors generalize to novel, unseen categories much better than previous approaches do.

See the paper for more extensive results, including attribute detection and localization examples, as well as tests of how well the attribute classifiers apply to high-level tasks such as zero-shot recognition and category discovery.


@inproceedings{Jayaraman2014,
  author = {D. Jayaraman and F. Sha and K. Grauman},
  title = {{Decorrelating Semantic Visual Attributes by Resisting the Urge to Share}},
  booktitle = {CVPR},
  year = {2014}
}



Supplementary material

CVPR 2014 oral presentation slides

CVPR 2014 poster