CS381V: Visual Recognition, Spring 2025

Meets: Thurs 3:05-6:00 pm in GDC 4.304

Unique#: 51280

Instructor: Kristen Grauman
Office: GDC 4.726
Office hours: by appointment (send email)

TAs: Ashutosh Kumar (kumar.ashutosh@utexas.edu) and
Sagnik Majumder (sagnik@cs.utexas.edu)
Office: GDC 4S vision lab
Office hours: by appointment (send email)


Piazza for accessing assignments, submitting reviews, and asking assignment questions (please enroll yourselves).

Canvas for grades.

Course overview

This is an advanced graduate seminar course in computer vision. We will survey and discuss current papers relating to high-level visual understanding (objects, scenes, activities, and multimodal learning), with an emphasis on new problems in video.

The goals of the course are to understand what the important problems are, how they are being approached, and how well current methods work today. We will actively analyze strengths and weaknesses, and strive to identify interesting open questions and directions for future research.

The class meets in person and will consist of student presentations about papers, discussion, and intermittent implementation working sessions. Outside class sessions, students will gain hands-on experience via assignments and a final project.

Auditing the course: Due to the format of the course and classroom, unfortunately we are not able to accommodate auditing. The class sessions are for registered students only.

Requirements

Students will be responsible for:

writing two paper reviews each week, due before class
participating in discussions during class
completing three programming assignments in pairs
presenting papers and background material with a small group in class (group size and frequency depend on final enrollment)
completing a research-oriented final project with a small group

Important details on all the requirements and grading breakdown can be seen here.

Prereqs

Courses in computer vision and machine learning (CS 376 / CS 378H Computer Vision and/or CS 391 Machine Learning and/or CS 395T Deep Learning, or similar); ability to understand and analyze conference papers in this area; programming required for assignments and final project.

Please talk to me if you are unsure whether the course is a good match for your background. I generally recommend scanning through a few papers on the syllabus to gauge what kind of background is expected. I don't assume you are already familiar with every single algorithm/tool/feature a given paper mentions, but you should feel comfortable following the key ideas.

Topics

Objects and scenes
Activity and video

Video representations and activity recognition
Egocentric video and first-person perception
Procedural activity and long video understanding
Video and language
People: bodies, hands, clothes
Skill assessment and AI coaching

Multisensory

Vision and sound
Vision and touch

Beyond AI perception

Vision and cog sci
Ethics

[reading list]

Important dates

Mon Jan 20: paper topic preferences due
Every Wed: paper reviews due
Thurs Feb 6: hw1 due
Thurs Feb 20: hw2 due
Thurs Mar 6 (tentative): project proposals due
Thurs Mar 27: hw3 due
Thurs Apr 24: final project presentations
Fri May 2: final papers due