CS381V: Visual Recognition, Spring 2025

Meets: Thurs 3:05-6:00 pm in GDC 4.304

Unique#: 51280

Instructor: Kristen Grauman 
Office: GDC 4.726
Office hours: by appointment (send email)

TAs: Ashutosh Kumar (kumar.ashutosh@utexas.edu) and Sagnik Majumder (sagnik@cs.utexas.edu)
Office: GDC 4S vision lab
Office hours: by appointment (send email)

Schedule

Piazza for accessing assignments, submitting reviews, and asking assignment questions (please enroll yourself).

Canvas for grades.

Course overview

This is an advanced graduate seminar course in computer vision. We will survey and discuss current papers relating to high-level visual understanding (objects, scenes, activities, and multimodal learning), with an emphasis on new problems in video.

The goals of the course are to understand which problems are important, how they are being approached, and how well things work today. We will actively analyze strengths and weaknesses, and strive to identify interesting open questions and directions for future research.

The class meets in person and will consist of student presentations about papers, discussion, and intermittent implementation working sessions.  Outside class sessions, students will gain hands-on experience via assignments and a final project.

Auditing the course: Due to the format of the course and classroom, unfortunately we are not able to accommodate auditing.  The class sessions are for registered students only.

Requirements 

Students will be responsible for:


Important details on all the requirements and grading breakdown can be seen here.

Prereqs 

Courses in computer vision and machine learning (CS 376 / CS 378H Computer Vision and/or CS 391 Machine Learning and/or CS 395T Deep Learning, or similar); ability to understand and analyze conference papers in this area; programming required for assignments and final project.

Please talk to me if you are unsure if the course is a good match for your background.  I generally recommend scanning through a few papers on the syllabus to gauge what kind of background is expected.  I don't assume you are already familiar with every single algorithm/tool/feature a given paper mentions, but you should feel comfortable following the key ideas.

Topics

  1. Objects and scenes
  2. Activity and video
     1. Video representations and activity recognition
     2. Egocentric video and first-person perception
     3. Procedural activity and long video understanding
     4. Video and language
     5. People: bodies, hands, clothes
     6. Skill assessment and AI coaching
  3. Multisensory
     1. Vision and sound
     2. Vision and touch
  4. Beyond AI perception
     1. Vision and cog sci
     2. Ethics

 [reading list]

 

Important dates