CS 378H Honors Machine Learning and Vision
Spring 2017

Assignment 5
Out: Tuesday April 18
Due: Tuesday May 2, 11:59 PM

Face detection with a sliding window

[Figure: HoG representation used for face detection, from Dalal and Triggs]
The sliding window model is conceptually simple: independently classify all image patches as being object or non-object. Sliding window classification is the dominant paradigm in object detection, and for one object category in particular -- faces -- it is one of the most noticeable successes of computer vision. For example, modern cameras and photo organization tools have prominent face detection capabilities. These successes of face detection (and object detection in general) can be traced back to influential works such as Rowley et al. 1998 and Viola-Jones 2001. You can look at these papers for suggestions on how to implement your detector. However, for this project you will be implementing the simpler (but still very effective!) sliding window detector of Dalal and Triggs 2005. Dalal-Triggs focuses on representation more than learning and introduces the SIFT-like Histogram of Gradients (HoG) representation (pictured to the right).

You will not be asked to implement HoG. You will be responsible for the rest of the detection pipeline: handling heterogeneous training and testing data, training a linear classifier, and using your classifier to classify millions of sliding windows at multiple scales. Fortunately, linear classifiers are compact, fast to train, and fast to execute. A linear SVM can also be trained on large amounts of data, including mined hard negatives.
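Because the classifier is linear, training it reduces to fitting a weight vector and bias on labeled HoG feature vectors, and scoring a window is then a single dot product. As a concrete illustration of this idea (not the starter code, which is MATLAB, and not the SVM solver you will actually use), here is a minimal hinge-loss subgradient trainer in Python/NumPy on stand-in feature vectors:

```python
import numpy as np

def train_linear_svm(X, y, lam=1e-2, lr=0.1, epochs=100, seed=0):
    """Fit w, b by subgradient descent on the regularized hinge loss.
    This is a toy stand-in for a real SVM solver.
    X: (n, d) rows of feature vectors (e.g. HoG features of crops);
    y: labels in {-1, +1} (face vs. non-face)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):
            margin = y[i] * (X[i] @ w + b)
            # Subgradient of lam/2 * ||w||^2 + max(0, 1 - margin).
            g_w = lam * w - (y[i] * X[i] if margin < 1 else 0)
            g_b = -y[i] if margin < 1 else 0.0
            w -= lr * g_w
            b -= lr * g_b
    return w, b

# Scoring a window is then just w . x + b; positive scores above a
# chosen confidence threshold are declared faces.
```

The key property for this project is the scoring cost: with millions of windows per test pass, anything more expensive than a dot product per window quickly becomes impractical.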

Details and Starter Code

The following is an outline of the stencil code (see the links above for the provided code and the VLFeat library download):

Creating the sliding-window, multi-scale detector is the most complex part of this project. It is recommended that you start with a single-scale detector, which classifies windows at only one scale in each test image. Such a detector will not work nearly as well (perhaps 0.3 average precision) as the full multi-scale detector. With a well-trained multi-scale detector with a small step size, you can expect to match the papers linked above in performance, with average precision above 0.9.
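One common way to go from a single-scale to a multi-scale detector is to run the fixed-size window over an image pyramid and map detections back to original-image coordinates. The sketch below (Python/NumPy rather than the MATLAB starter code; `score_fn`, the crude nearest-neighbor pyramid, and all parameter values are illustrative stand-ins for your HoG-plus-SVM scoring) shows the structure:

```python
import numpy as np

def sliding_windows(img, win=36, step=6):
    """Yield (row, col) top-left corners of all win x win windows."""
    H, W = img.shape
    for r in range(0, H - win + 1, step):
        for c in range(0, W - win + 1, step):
            yield r, c

def multiscale_detect(img, score_fn, win=36, step=6, scale=0.8, thresh=1.0):
    """Run a fixed-size window over a coarse image pyramid.
    score_fn(patch) stands in for HoG extraction + linear scoring.
    Returned boxes (x1, y1, x2, y2) are in original-image coordinates."""
    boxes = []
    s = 1.0
    cur = img
    while min(cur.shape) >= win:
        for r, c in sliding_windows(cur, win, step):
            if score_fn(cur[r:r+win, c:c+win]) > thresh:
                # Divide by the current scale to undo the shrinking.
                boxes.append((c / s, r / s, (c + win) / s, (r + win) / s))
        s *= scale
        # Crude nearest-neighbor downsampling for the next pyramid level,
        # always resampling from the original image to avoid compounding.
        H, W = img.shape
        h, w = max(int(H * s), 1), max(int(W * s), 1)
        rows = (np.arange(h) / s).astype(int).clip(0, H - 1)
        cols = (np.arange(w) / s).astype(int).clip(0, W - 1)
        cur = img[rows][:, cols]
    return boxes
```

A real implementation would use a proper (e.g. bilinear or Gaussian) resampler and follow detection with non-maximum suppression, since overlapping windows and adjacent scales fire on the same face many times.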


The choice of training data is critical for this task.  Face detection methods have traditionally been trained on heterogeneous, even proprietary, datasets. As in most of the literature, we will use three databases: (1) positive training crops, (2) non-face scenes to mine for negative training data, and (3) test scenes with ground truth face locations.

You are provided with a positive training database of 6,713 cropped 36x36 faces from the Caltech Web Faces project. This subset has already been filtered to remove faces that were not of high enough resolution, upright, or front facing. There are many additional databases available. For example, see Figure 3 in Huang et al. and the LFW database described in that paper. You are free to experiment with additional or alternative training data for extra credit.

Non-face scenes, the second source of your training data, are easy to collect. You are provided with a small database of such scenes from Wu et al. and the SUN scene database. You can add more non-face training scenes, although you are unlikely to need more negative training data unless you are doing hard negative mining for extra credit.
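The baseline way to turn non-face scenes into negative training examples is simply to sample random fixed-size crops from them, since essentially any patch of a face-free scene is a valid negative. A small illustrative sketch (Python/NumPy, with made-up inputs; your actual code will read the provided scene images):

```python
import numpy as np

def sample_negatives(scenes, n, win=36, seed=0):
    """Draw n random win x win crops from a list of non-face
    grayscale scene images to use as negative training examples."""
    rng = np.random.default_rng(seed)
    crops = []
    for _ in range(n):
        img = scenes[rng.integers(len(scenes))]
        H, W = img.shape
        r = rng.integers(H - win + 1)  # random top-left corner
        c = rng.integers(W - win + 1)
        crops.append(img[r:r+win, c:c+win])
    return np.stack(crops)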

The most common benchmark for face detection is the CMU+MIT test set. This test set contains 130 images with 511 faces. The test set is challenging because the images are highly compressed and quantized. Some of the faces are illustrated faces, not human faces. For this project, we have converted the test set's ground truth landmark points in to bounding boxes. We have inflated these bounding boxes to cover most of the head, as the provided training data does. For this reason, you are arguably training a "head detector" not a "face detector" for this project.

Copies of these data sets are provided with your starter code linked above in Resources.   Please do not include them in your submission on Canvas.

Write up

In the report, please include the following:

Face detection contest

There will be extra credit and recognition for the students who achieve the highest average precision, whether with the baseline classifier or any bells and whistles from the extra credit.   You aren't allowed to modify evaluate_all_detections.m which measures your accuracy.

Extra Credit

For all extra credit, be sure to analyze in your report cases where your extra credit implementation has improved classification accuracy. Each item is "up to" some amount of points because trivial implementations may not be worthy of full extra credit.  A maximum of 20 extra credit points are allowable.

Some ideas:

For any of the above, be sure to explain clearly in your report --- with quantitative results wherever relevant --- the outcomes via experiments.

Handing in

Creating a single zip file to submit on Canvas.  It should contain the following:

Do NOT submit the provided data within your zip file.  The TA will have a local copy.

How grades will be calculated

Advice, Tips


This project is based on the one originally created by Prof. James Hays of Georgia Tech, who kindly gave us permission to use his project description and code.   Figures in this handout are from Dalal and Triggs Thanks also to Chao-Yeh Chen for performing a trial run of the assignment.

im2 im5

We tried to make especially easy test cases with neutral, frontal faces.

im7 im10

The CS 378H class demonstrates how not to be seen by a robot.