Foreground Focus: Finding Meaningful Features in Unlabeled Images


Yong Jae Lee and Kristen Grauman

University of Texas at Austin







We present a method to automatically discover meaningful features in unlabeled image collections. Each image is decomposed into semi-local features that describe neighborhood appearance and geometry. The goal is to determine for each image which of these parts are most relevant, given the image content in the remainder of the collection. Our method first computes an initial image-level grouping based on feature correspondences, and then iteratively refines cluster assignments based on the evolving intra-cluster pattern of local matches. As a result, the significance attributed to each feature influences an image’s cluster membership, while related images in a cluster affect the estimated significance of their features. We show that this mutual reinforcement of object-level and feature-level similarity improves unsupervised image clustering, and apply the technique to automatically discover categories and foreground regions in images from benchmark datasets.



System Overview



The images are grouped based on weighted semi-local feature matches (a), and then image-specific feature weights are adjusted based on each feature's contribution to the matches with all other intra-cluster images (b). These two processes are iterated (as denoted by the block arrows in the center) to simultaneously determine foreground features while improving cluster quality. Dotted arrows denote images with updated cluster memberships.
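The alternation between steps (a) and (b) can be sketched as follows. This is a minimal toy illustration, not the authors' implementation: the nearest-neighbor matching function, the exemplar-based grouping, and the synthetic two-class data below are simplified stand-ins for the paper's semi-local feature matching and clustering machinery.

```python
import numpy as np

def match_score(fa, wa, fb, wb):
    """Weighted match of image A's features to their nearest neighbors in image B."""
    d = np.linalg.norm(fa[:, None, :] - fb[None, :, :], axis=2)
    nn = d.argmin(axis=1)                       # nearest feature in B for each feature in A
    sims = np.exp(-d[np.arange(len(fa)), nn])   # similarity of each matched pair
    return float(np.sum(wa * wb[nn] * sims)), nn, sims

def foreground_focus(images, n_clusters, n_iters=3):
    """Alternate (a) grouping images by weighted feature matches and
    (b) re-weighting each image's features by their intra-cluster match support."""
    n = len(images)
    weights = [np.full(len(f), 1.0 / len(f)) for f in images]
    for _ in range(n_iters):
        # (a) pairwise weighted match similarity between all images
        S = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                if i != j:
                    S[i, j] = match_score(images[i], weights[i],
                                          images[j], weights[j])[0]
        # simple exemplar-based grouping (a stand-in for the paper's clustering step)
        exemplars = [0]
        while len(exemplars) < n_clusters:
            score = S[exemplars].sum(axis=0)
            score[exemplars] = np.inf           # never re-pick a chosen exemplar
            exemplars.append(int(score.argmin()))
        labels = S[:, exemplars].argmax(axis=1)
        for idx, e in enumerate(exemplars):
            labels[e] = idx
        # (b) a feature's weight grows with its contribution to intra-cluster matches
        new_weights = []
        for i in range(n):
            contrib = np.zeros(len(images[i]))
            for j in range(n):
                if j != i and labels[j] == labels[i]:
                    _, nn, sims = match_score(images[i], weights[i],
                                              images[j], weights[j])
                    contrib += weights[j][nn] * sims
            total = contrib.sum()
            new_weights.append(contrib / total if total > 0 else weights[i])
        weights = new_weights
    return labels, weights

# Toy data: each image has one repeating "foreground" feature (index 0)
# plus random background clutter that rarely matches across images.
rng = np.random.default_rng(0)
def toy_image(fg_center, n_bg=4):
    fg = fg_center + rng.normal(0.0, 0.05, size=2)
    bg = rng.uniform(20.0, 100.0, size=(n_bg, 2))
    return np.vstack([fg, bg])

images = [toy_image(np.array([0.0, 0.0])) for _ in range(4)] + \
         [toy_image(np.array([10.0, 10.0])) for _ in range(4)]
labels, weights = foreground_focus(images, n_clusters=2)
```

On this toy input the two groups separate, and within each image nearly all of the weight accumulates on the repeating feature, mirroring how the matching and re-weighting steps reinforce each other.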





We performed experiments both to analyze the mutual reinforcement of foreground and clusters, and to compare against existing unsupervised methods. We work with images from the Caltech-101 dataset, because it provides the object segmentations we need as ground truth to evaluate our foreground detection. We formed a four-class set (Faces, Dalmatians, Hedgehogs, and Okapi) and a ten-class set (the previous four plus Leopards, Car side, Cougar face, Guitar, Sunflower, and Wheelchair). For each class, we use the first 50 images.



If our algorithm correctly identifies the important features, we expect those features to lie on the foreground objects since that is what primarily re-occurs in these datasets. To evaluate this, we compare the feature weights computed by our method with the ground truth list of foreground features. We quantify accuracy by the percentage of total feature weight in an image that our method attributes to true foreground features.
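Given a per-image weight vector and a ground-truth mask marking which features lie on the object, this accuracy measure reduces to a ratio of sums. The helper below is a hypothetical illustration of the metric, not the authors' evaluation code.

```python
import numpy as np

def foreground_weight_fraction(weights, is_foreground):
    """Fraction of an image's total feature weight on ground-truth foreground features."""
    weights = np.asarray(weights, dtype=float)
    is_foreground = np.asarray(is_foreground, dtype=bool)
    return float(weights[is_foreground].sum() / weights.sum())

# e.g. 5 features, of which the first two lie on the object
frac = foreground_weight_fraction([0.4, 0.3, 0.1, 0.1, 0.1],
                                  [True, True, False, False, False])
# frac == 0.7: 70% of the weight is attributed to true foreground
```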

As our method weights foreground features more highly, we also expect a positive effect on cluster quality. Since we know the true label of each image, we can use the F-measure to quantify cluster homogeneity.
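One common definition of the clustering F-measure matches each true class to the cluster that maximizes the harmonic mean of precision and recall, then averages over classes weighted by class size; the paper's exact variant may differ in detail. A minimal sketch:

```python
from collections import Counter

def f_measure(true_labels, cluster_ids):
    """Size-weighted best-match F-measure between true classes and clusters."""
    n = len(true_labels)
    counts = Counter(zip(true_labels, cluster_ids))   # class/cluster co-occurrence
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_ids)
    total = 0.0
    for c in class_sizes:
        best = 0.0
        for k in cluster_sizes:
            n_ck = counts[(c, k)]
            if n_ck == 0:
                continue
            p = n_ck / cluster_sizes[k]               # precision of cluster k for class c
            r = n_ck / class_sizes[c]                 # recall of class c in cluster k
            best = max(best, 2 * p * r / (p + r))
        total += class_sizes[c] / n * best
    return total

perfect = f_measure(['a', 'a', 'b', 'b'], [0, 0, 1, 1])   # 1.0
mixed = f_measure(['a', 'a', 'b', 'b'], [0, 1, 0, 1])     # 0.5
```

A perfect clustering scores 1.0; the fully mixed assignment scores 0.5 because each class's best cluster has precision and recall of one half.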





Foreground Focus: Finding Meaningful Features in Unlabeled Images [pdf] [slides (ppt)]
Yong Jae Lee and Kristen Grauman
In Proceedings of the 19th British Machine Vision Conference (BMVC), Leeds, U.K., September 2008.