Meets: Wednesdays 1-4 pm in GDC 2.502
Instructor: Kristen Grauman
Office: GDC 4.726
Office hours: by appointment (send email)
TA: Kai-Yang Chiang
Office: GDC 4.802D
Office hours: Thursday 10:30 am-12:30 pm
Please use Piazza for
This is a graduate seminar course in computer
vision. We will survey and discuss current
vision papers relating to visual recognition
(primarily of objects and object
categories), auto-annotation of images, and scene
understanding. The goals of the course will be to
understand current approaches to some important problems,
to actively analyze their strengths and weaknesses, and to
identify interesting open questions and possible
directions for future research.
See the syllabus for an outline
of the main topics we'll be covering.
Students will be responsible for:
- writing two paper reviews each week
- posting two short review summaries/discussion points on the course discussion board (Piazza)
- participating in discussions during class
- completing two programming assignments
- presenting ~twice in class (details depending on final enrollment)
- completing a project with a partner
Note that presentations are due one week before
the slot in which your presentation is scheduled. This means
you will need to read the papers, prepare experiments,
create slides, etc. more than
one week before the date you are signed up for.
The idea is to meet and discuss ahead of time, so that we
can iterate as needed in the week leading up to your
presentation. Please coordinate
in advance with the other student presenters on your day
to ensure that no single paper receives 2 experiments or
2 paper presentations.
More details on the requirements
and grading breakdown are here.
Courses in computer vision and/or machine learning (CS 376 Computer Vision and/or CS
391 Machine Learning, or similar); ability to understand
and analyze conference papers in this area; programming
required for experiment presentations and projects.
Talk to me if you are unsure whether the course is a good match for
your background. I generally recommend scanning through
a few papers on the syllabus to gauge what kind of background
is expected. I don't assume you are already familiar
with every single algorithm/tool/image feature a given paper
mentions, but you should feel comfortable following the key
- Instance recognition
- Category recognition
- Mid-level representations
- Object detection
- Attributes and parts
- Language and vision
- Low-supervision learning
- Great outdoors
- 3d scenes and objects
- Recognition in action
- Noticing and remembering
- Social signals
Papers and links
Topic preferences due via
email to Kai by Wed Jan 27. Write "CS381V" in the
Invariant local features, local feature matching, instance recognition, visual vocabularies and bag-of-words, large-scale mining
Image credit: Andrea Vedaldi and Andrew Zisserman
Assignment 1 out, due Friday Feb 19.
Image descriptors, classifiers, support vector machines, nearest neighbors, convolutional neural networks, large-scale image collections
Image credit: ImageNet
Intro to categorization and case studies of discriminative models
slides 2 handout
slides 2 with links
Guest lecture on CNNs, Dinesh Jayaraman
Monday Feb 15, 5-7 pm: Hands-on
CNN/Caffe tutorial, by Dinesh Jayaraman and Yu-Chuan
Su. GDC 4.302 (not the usual classroom)
Segmentation into regions, contours, grouping, video segmentation, category-independent object proposals, 3d structure
Image credit: Pablo Arbelaez et al.
Coding assignment 2 out Monday Feb 22, due Wed March 9 (with follow-up due Thurs March 10)
Localizing objects within an image, efficient search, part-based models, semantic segmentation, voting, context, objects in scenes
Image credit: Felzenszwalb et al.
Tuesday March 1, 11 am: UTCS
Distinguished Lecture by Prof.
Jim Rehg, Georgia Tech. GDC Auditorium
Visual properties, learning from natural language descriptions, intermediate shared representations
Image credit: Lampert et al.
Thursday March 4, 11 am: Talk
Khosla, MIT. GDC Auditorium
Tuesday March 8, 11 am: Talk by Philipp Krahenbuhl, UC Berkeley. GDC Auditorium
Image credit: Antol et al.
Guest speaker: Subha Venugopalan
proposal and paper guidelines
Tuesday March 22, 11 am: Talk by David Fouhey, CMU. GDC Auditorium
No class - spring break
Feature learning, semantics learning. Leveraging free or nearly free cues for supervision. Internet data, video, egomotion, context...
Image credit: X. Chen et al.
Linking and visualizing multi-view data from tourist photos, image-based geolocalization, natural scene text detection, discovering correlated non-visual properties in street-side imagery
Image credit: T-Y. Lin et al.
3d scenes and objects
3d structure (single views, panoramas, RGBD) and scene layout for visual recognition
Image credit: Y. Xiang et al.
Learning how to move for recognition, manipulation. 3D objects and the next best view.
Image credit: Malmir et al.
Predicting what gets noticed or remembered in images and video. Saliency, importance, memorability, photography biases.
Image credit: T. Liu et al.
Cues from people in images: body pose, social groups and roles, attention, gaze following, scene structure
Image credit: Khosla et al.
Note April 27/29 deadlines
for free poster printing at UTCS
See Piazza post for details
Final project presentations in class
See poster presentation
instructions on Piazza.
Final papers and poster reviews due
Friday May 6