CS381V: Visual Recognition, Spring 2016


Meets: Wednesdays 1-4 pm in GDC 2.502

Instructor: Kristen Grauman 
Office: GDC 4.726 
Office hours: by appointment (send email)

TA: Kai-Yang Chiang
Office: GDC 4.802D
Office hours: Thursday 10:30 am-12:30 pm

Please use Piazza for assignment questions.

Course overview:

Topic: This is a graduate seminar course in computer vision.   We will survey and discuss current vision papers relating to visual recognition (primarily of objects and object categories), auto-annotation of images, and scene understanding.  The goals of the course will be to understand current approaches to some important problems, to actively analyze their strengths and weaknesses, and to identify interesting open questions and possible directions for future research.

See the syllabus for an outline of the main topics we'll be covering.

Requirements: Students will be responsible for:

Note that presentations are due one week before the slot in which your presentation is scheduled.  This means you will need to read the papers, prepare experiments, create slides, etc. more than one week before the date you are signed up for.  The idea is to meet and discuss ahead of time, so that we can iterate as needed during the week leading up to your presentation.  Please coordinate in advance with the other student presenters on your day to ensure that no single paper receives 2 experiments or 2 paper presentations.

More details on the requirements and grading breakdown are here.

Prereqs:  Courses in computer vision and/or machine learning (CS 376 Computer Vision and/or CS 391 Machine Learning, or similar); ability to understand and analyze conference papers in this area; programming required for experiment presentations and projects. 

Please talk to me if you are unsure whether the course is a good match for your background.  I generally recommend scanning through a few papers on the syllabus to gauge what kind of background is expected.  I don't assume you are already familiar with every single algorithm/tool/image feature a given paper mentions, but you should feel comfortable following the key ideas.

Syllabus overview:
  1. Instance recognition
  2. Category recognition
  3. Mid-level representations
  4. Object detection
  5. Attributes and parts
  6. Language and vision
  7. Low-supervision learning
  8. Great outdoors
  9. 3d scenes and objects
  10. Recognition in action
  11. Noticing and remembering
  12. Social signals

Important dates:

Schedule and papers:

Papers and links
Items due
Jan 20
Course intro 


Jan 27
No class

Topic preferences due via email to Kai by Wed Jan 27.  Write "CS381V" in the subject line.
Feb 3
Instance recognition

Invariant local features, local feature matching, instance recognition, visual vocabularies and bag-of-words, large-scale mining 

Image credit: Andrea Vedaldi and Andrew Zisserman
  • *Object Recognition from Local Scale-Invariant Features, Lowe, ICCV 1999.  [pdf]  [code] [other implementations of SIFT] [IJCV]

  • *Local Invariant Feature Detectors: A Survey, Tuytelaars and Mikolajczyk.  Foundations and Trends in Computer Graphics and Vision, 2008. [pdf]  [Oxford code] [Selected pages -- read pp. 178-188, 216-220, 254-255]

  • *Video Google: A Text Retrieval Approach to Object Matching in Videos, Sivic and Zisserman, ICCV 2003.  [pdf]  [demo]

Coding assignment 1 out, due Friday Feb 19.
Feb 10
Category recognition

Image descriptors, classifiers, support vector machines, nearest neighbors, convolutional neural networks, large-scale image collections

Image credit: ImageNet
  • *ImageNet Large Scale Visual Recognition Challenge.  Russakovsky et al. IJCV 2015.  [pdf]
  • *ImageNet Classification with Deep Convolutional Neural Networks.  A. Krizhevsky, I. Sutskever, and G. Hinton.  NIPS 2012  [pdf]
  • *Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, Lazebnik, Schmid, and Ponce, CVPR 2006. [pdf]  [15 scenes dataset]  [libpmk] [Matlab]
  • *80 Million tiny images: a large dataset for non-parametric object and scene recognition.  A. Torralba, R. Fergus, and W. Freeman.  PAMI 2008. [pdf]
slides 1

Intro to categorization and case studies of discriminative models

slides 2 handout
slides 2 with links

Guest lecture on CNNs, Dinesh Jayaraman

Monday Feb 15, 5-7 pm: Hands-on CNN/Caffe tutorial, by Dinesh Jayaraman and Yu-Chuan Su.  GDC 4.302 (not the usual classroom)

Tutorial slides
Tutorial code
Feb 17
Mid-level representations

Segmentation into regions, contours, grouping, video segmentation, category-independent object proposals, 3d structure

Image credit: Pablo Arbelaez et al.
  • *Constrained Parametric Min-Cuts for Automatic Object Segmentation. J. Carreira and C. Sminchisescu. CVPR 2010.  [pdf] [code]

  • *Selective Search for Object Recognition.  J. Uijlings, K. van de Sande, T. Gevers, A. Smeulders.  IJCV 2013.  [pdf] [project,code]
  • *Discriminatively trained dense surface normal estimation.  L. Ladicky, B. Zeisl, M. Pollefeys.  ECCV 2014.  [pdf]
  • *Streaming hierarchical video segmentation.  C. Xu, C. Xiong, J. Corso.  ECCV 2012.  [pdf]  [code]

Paper-Chun-Chen Kuo
Paper-Andrew Sharp
Expt-Kim Houck
Expt-Chad Voegele

Coding assignment 2 out Monday Feb 22, due Wed March 9 (with follow-up due Thurs March 10)
Feb 24
Object detection

Localizing objects within an image, efficient search, part-based models, semantic segmentation, voting, context, objects in scenes

Image credit: Felzenszwalb et al.
  • *Rich feature hierarchies for accurate object detection and semantic segmentation.  R. Girshick et al.  CVPR 2014 [pdf]
  • *Contextual Priming for Object Detection, A. Torralba.  IJCV 2003.  [pdf] [web] [code]

  • *A Discriminatively Trained, Multiscale, Deformable Part Model, P. Felzenszwalb, D. McAllester, and D. Ramanan.  CVPR 2008.  [pdf]  [code]

  • *Hough Forests for Object Detection, Tracking, and Action Recognition.  J. Gall et al.  PAMI 2011.  [pdf] [code]

Paper-Richard Teammco
Paper-Huihuang Zheng
Expt-Adam Allevato
Expt-William Xie

Tuesday March 1, 11 am: UTCS Distinguished Lecture by Prof. Jim Rehg, Georgia Tech.  GDC Auditorium
Mar 2
Attributes and parts

Visual properties, learning from natural language descriptions, intermediate shared representations

Image credit: Lampert et al.
  • Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer, C. Lampert, H. Nickisch, and S. Harmeling, CVPR 2009  [pdf] [web] [data]

  • Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations.  L. Bourdev and J. Malik.  ICCV 2009.  [pdf]  [code]  [web]

  • Relative attributes.  Parikh and Grauman.  ICCV 2011.  [pdf] [code/data]
  • Discovering the spatial extent of relative attributes.  F. Xiao and Y. J. Lee.  ICCV 2015. [pdf]  [code]
Paper-Ruohan Gao
Paper-Akanksha Saran
Paper-Zhuode Liu
Expt-Aishwarya Padmakumar
Expt-Abhishek Sinha
Expt-Ashwini Venkatesh
Thursday March 4, 11 am: Talk by Aditya Khosla, MIT.  GDC Auditorium

Tuesday March 8, 11 am: Talk by Philipp Krahenbuhl, UC Berkeley.  GDC Auditorium
Mar 9
Language and vision

Image credit: Antol et al.
  • Sequence to Sequence - Video to Text.  S. Venugopalan et al.  ICCV 2015  [pdf] [web] [code]
  • Segment-Phrase Table for Semantic Segmentation, Visual Entailment and Paraphrasing.  Izadinia, Sadeghi, Divvala, Hajishirzi, Choi,  Farhadi.  ICCV 2015 [pdf]
  • Ask Your Neurons: A Neural-Based Approach to Answering Questions About Images.  Malinowski, Rohrbach, Fritz.  ICCV 2015.  [pdf]  [video] [code/data]

Paper-Tyler Folkman
Paper-Edward Banner
Paper-Surbhi Goel
Expt-Huihuang Zheng
Expt-Kunal Lad

Guest speaker: Subha Venugopalan
Project proposal and paper guidelines

Tuesday March 22, 11 am: Talk by David Fouhey, CMU.  GDC Auditorium

Mar 16
No class - spring break
Mar 23
Low-supervision learning

Feature learning, semantics learning. Leveraging free or nearly free cues for supervision.  Internet data, video, egomotion, context...

Image credit: X. Chen et al. 
  • Learning image representations equivariant to ego-motion.  Jayaraman and Grauman.  ICCV 2015.  [pdf]  [web] [slides] [data]
  • NEIL: Extracting Visual Knowledge from Web Data, Chen, Shrivastava, and Gupta, ICCV 2013 [pdf]
  • Learning temporal embeddings for complex video analysis.  Ramanathan, Tang, Mori, Fei-Fei. ICCV 2015  [pdf]
  • Unsupervised learning of visual representations using videos.  X. Wang and A. Gupta.  ICCV 2015.  [pdf]  [code]  [web]
Paper-Hilgad Montelo
Paper-Chad Voegele
Paper-Bo Xiong
Expt-Ashish Bora
Expt-Ruohan Gao

Mar 30
Great outdoors

Linking and visualizing multi-view data from tourist photos, image-based geolocalization, natural scene text detection, discovering correlated non-visual properties in street-side imagery

Image credit: T-Y. Lin et al.
  • *Building Rome in a Day, Agarwal et al. CACM 2011.  [pdf]  [web]  [code]
  • *Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition.  Jaderberg, Simonyan, Vedaldi, Zisserman. NIPS Deep Learning Workshop, 2014.  [pdf]  [journal paper]
  • *Learning Deep Representations for Ground-to-Aerial Geolocalization.  T. Lin, Y. Cui, S. Belongie, and J. Hays.  CVPR 2015.  [pdf]  [poster] [slides]
  • *City Forensics: Using Visual Elements to Predict Non-Visual City Attributes.  Arietta, Efros, Ramamoorthi, Agrawala.  IEEE Trans. on Visualization and Computer Graphics, 2014.  [pdf]  [web]

Paper-Manu Agarwal
Paper-Kunal Lad
Expt-Zhuode Liu
Expt-Ruohan Zhang
Expt-Richard Teammco

April 6
3d scenes and objects

3d structure (single views, panoramas, RGBD) and scene layout for visual recognition

Image credit: Y. Xiang et al.
  • PanoContext: A Whole-room 3D Context Model for Panoramic Scene Understanding.  Y. Zhang, S. Song, P. Tan, J. Xiao.  ECCV 2014.  [pdf] [data/code]  [slides]
  • Data-Driven 3D Voxel Patterns for Object Category Recognition, Y. Xiang, W. Choi, Y. Lin and S. Savarese, CVPR 2015.  [pdf]  [web/data] [slides]
  • Indoor Segmentation and Support Inference from RGBD Images.  N. Silberman, D. Hoiem, P. Kohli, and R. Fergus.  ECCV 2012.  [pdf] [code/data]  [NYU depth dataset] [slides]

Paper-Adam Allevato
Paper-William Xie
Expt-Hilgad Montelo
Expt-Chun-Chen Kuo
Expt-Andrew Sharp

April 13
Recognition in action

Learning how to move for recognition, manipulation.  3D objects and the next best view.

Image credit: Malmir et al.
  • Deep Q-learning for active recognition of GERMS: Baseline performance on a standardized dataset for active learning.  Malmir et al. BMVC 2015.  [pdf] [data]
  • Active Object Recognition using Vocabulary Trees.  N. Govender, J. Claassens, P. Torr, J. Warrell.  Workshop on Robot Vision, 2013.  [pdf]
  • 3D ShapeNets: A Deep Representation for Volumetric Shape Modeling.  Wu et al. CVPR 2015.  [pdf] [code/data] [slides]
Paper-Aishwarya Padmakumar
Paper-Ruohan Zhang
Paper-Abhishek Sinha
Expt-Manu Agarwal
Expt-Yinan Zhao

April 20
Noticing and remembering

Predicting what gets noticed or remembered in images and video.  Saliency, importance, memorability, photography biases.

Image credit: T. Liu et al.
  • Understanding and Predicting Image Memorability at a Large Scale.  A. Khosla, S. Raju, A. Torralba, and A. Oliva.  ICCV 2015.  [pdf]  [web] [code/data]
  • Learning video saliency from human gaze using candidate selection.  D. Rudoy et al. CVPR 2013 [pdf] [web] [video] [code]
  • Learning to Detect a Salient Object.  T. Liu et al. CVPR 2007.  [pdf]  [results]  [data]  [code]

Paper-Kim Houck
Paper-Ashish Bora
Expt-Bo Xiong
Expt-Akanksha Saran
Expt-Tyler Folkman

April 27
Social signals

Cues from people in images: body pose, social groups and roles, attention, gaze following, scene structure

Image credit: Khosla et al.
  • Where are they looking?  Khosla, Recasens, Vondrick, Torralba.  NIPS 2015.  [pdf] [demo] [web]
  • People Watching: Human Actions as a Cue for Single View Geometry.  Fouhey, Delaitre, Gupta, Efros, Laptev, Sivic.  ECCV 2012  [pdf] [journal] [web] [slides] [video]
  • Discovering Groups of People in Images.  Choi, Chao, Pantofaru, Savarese.  ECCV 2014  [pdf] [web]

Paper-Yinan Zhao
Paper-Ashwini Venkatesh
Expt-Surbhi Goel
Expt-Edward Banner

Note April 27/29 deadlines for free poster printing at UTCS

See Piazza post for details

May 4
Final project presentations in class

See poster presentation instructions on Piazza.

Final papers and poster reviews due Friday May 6

LDV Vision Challenges

Index of computer vision datasets