In the only context in which we have observed intelligence, namely the biological context, the emergence of intelligence and of superior visual abilities in different families of animals has in each case been closely tied to the emergence of the ability to move and act in the environment [Moravec'84]. Cognitive scientists have also empirically verified that self-generated motion is critical to the development of visual perceptual skills in animals [Held'63], and a sizeable research program in cognitive science studies "embodied cognition", the hypothesis that cognition is strongly influenced by aspects of an agent's body beyond the brain itself [Wilson'02].
Progress on standard visual recognition tasks in the last few years has been largely fueled by access to today's largest painstakingly curated and hand-labeled datasets. To build these datasets, images sampled independently from the web are manually assigned to one of several categories by thousands of human workers. This cumbersome process may be replaceable. Specifically, an agent that continuously acts in, moves through and monitors its environment has many avenues of knowledge available to it, going well beyond what can be learned from the orderless, i.i.d. "bags of images" with category labels that make up today's standard datasets. For instance, such an agent may exploit ordered image sequences, i.e., its observed video stream, together with freely available image ↔ image or image ↔ other-sensor relationships. It can also act upon its environment and use the observed results of those actions as a form of self-acquired supervision: it may tap an object to determine its material properties ("action"), walk around it to observe a less ambiguous view of it ("motion"), or learn natural-world physics by dropping, pushing or throwing objects and learning to anticipate their behavior ("anticipation"). These forms of supervision may allow agents to discover knowledge that is not available through the standard supervised paradigm.
Moreover, alleviating the non-scalable curation and labeling requirements involved in compiling today's standard datasets is a worthwhile goal in itself, so that all or most visual learning may happen without manual supervision. Replacing manual supervision with alternative forms of supervision in this manner would have many advantages: (1) it would open up the possibility of exploiting much larger datasets for visual learning, which could in turn drive better-performing computer vision systems for conventional tasks, since evidence from the last few years suggests that visual learning benefits from ever-higher-capacity models trained on ever-larger datasets; (2) it would enable easy development of visual applications for narrow, non-standard domains for which large labeled datasets neither currently exist nor are likely to be curated in the future, such as, say, vision for an interplanetary rover; and (3) compared to standard supervised learning, it more closely resembles what we know about the avenues of learning available to the biological visual systems we ultimately hope to emulate in performance, even if not in design.
In this workshop, we aim to focus on how action, motion and anticipation may all offer viable and important means for visual learning. Several closely intertwined emerging research directions touching on our theme are being concurrently and largely independently explored by researchers in the vision, machine learning, and robotics communities around the world. A major goal of our workshop will be to bring these researchers together, provide a forum to foster collaborations and exchange of ideas, and ultimately, to help advance research along these directions.
Call for Abstracts
We invited 2-page abstracts describing relevant work that has been recently published, is in progress, or is to be presented at ECCV. Review of submissions was double-blind. While there will be no formal proceedings, accepted abstracts are posted here. Authors of accepted abstracts will present their work at the workshop in a poster session and as short spotlight talks. We encouraged submissions not only from the vision community, but also from machine learning, robotics, cognitive science and other related disciplines. A representative, but not exhaustive, list of topics of interest is shown below:
- Learning from observer motions accompanying video
- Models of dynamics of object trajectories/pose changes in video
- Unsupervised representation learning from video
- Predictive/anticipatory models of future video frames given past frames
- Learning from other sensor streams accompanying video
- Generative modeling of unobserved object/scene views conditional on observed views
- Reinforcement learning approaches for robotic control from pixels
- Temporal coherence modeling for learning from video sequences
- Sensorimotor learning with visual sensors
- Embodied cognition and learning applied to machines
- Inferring/modeling physics and physical properties in video
- Active vision
The LaTeX template for submission is posted here. Abstracts must be no longer than 2 pages (including references) and must be submitted on or before August 3, 11.59 p.m. US Central Time, via the workshop CMT. (Submission is now closed.)
Important Dates
August 3 | Abstracts due (closed)
August 19 | Reviews due
August 29 | Decision notifications to authors
September 12 | Final abstracts due
October 9 (tentatively 9 a.m. to 5 p.m.) | Workshop date
Speakers
We have invited researchers from across different disciplines to bring their perspectives to the workshop. Here is our speaker list:
- Abhinav Gupta (CMU)
- Jitendra Malik (UC Berkeley)
- Honglak Lee (U Michigan)
- Ali Farhadi (UW)
- Jeanette Bohg (MPI Tübingen)
- Nando de Freitas (Oxford)
Program
Recorded videos: morning session, panel discussion (we forgot the afternoon session, sorry!).
The workshop will be held at Oudemanhuispoort. Here is our program.
09.00 a.m. | Opening remarks
09.15 a.m. | Invited Speaker 1: Jitendra Malik: "Acquiring mental models through perception, simulation and action."
09.45 a.m. | Invited Speaker 2: Nando de Freitas: "Make learning off-policy again! Sample efficient actor-critic with experience replay."
10.15 a.m. | Invited Speaker 3: Jeanette Bohg: "Interactive Perception for Perceptive Manipulation - or, how putting perception on a physical system changes everything."
10.45 a.m. | Coffee break
11.00 a.m. | Poster spotlights
11.45 a.m. | Lunch break
01.00 p.m. | Poster session
02.00 p.m. | Invited Speaker 4: Abhinav Gupta: "Scaling Self-supervision: From one task, one robot to multiple tasks and robots"
02.30 p.m. | Invited Speaker 5: Honglak Lee: "Learning Disentangled Representations for Prediction and Anticipation"
03.00 p.m. | Invited Speaker 6: Ali Farhadi: "Towards Crowifying Vision"
03.30 p.m. | Coffee break
03.45 p.m. | Panel discussion with invited speakers
Accepted Papers
The following papers will be presented at the workshop as spotlights and posters:
- Mohamed Daoudi, Maxime Devanne, Francois Quesque, Yann Coello. Computational Modeling of Human Social Intention.
- Carolina Redondo-Cabrera, Roberto Lopez-Sastre. Unsupervised Feature Learning from Videos for Discovering and Recognizing Actions.
- Bert De Brabandere, Xu Jia, Tinne Tuytelaars, Luc Van Gool. Dynamic Filter Networks for Predicting Unobserved Views.
- Timo Lüddecke, Florentin Wörgötter. Scene Affordance: Inferring Actions from Household Scenes.
- Senthil Purushwalkam, Abhinav Gupta. Pose from Action: Unsupervised Learning of Pose Features based on Motion.
- Georgios Mastorakis, Xavier Hildenbrand, Kevin Grand, Dimitrios Makris. Customisable Fall Detection: A hybrid approach using physics based simulation and machine learning.
- Sven Bambach, David Crandall, Linda Smith, Chen Yu. Active Vision: Learning Visual Objects through Egocentric Views of Children and Parents.
- Tomas Petricek, Vojtech Salansky, Karel Zimmermann, Tomas Svoboda. Simultaneous Exploration and Segmentation with Incomplete Data.
- Ilker Yildirim, Jiajun Wu, Yilun Du, Joshua Tenenbaum. Interpreting Dynamic Scenes by a Physics Engine and Bottom-Up Visual Cues.
- Jacob Walker, Carl Doersch, Abhinav Gupta, Martial Hebert. An Uncertain Future: Forecasting from Static Images using Variational Autoencoders.
- Tianfan Xue, Jiajun Wu, Katherine Bouman, Bill Freeman. Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks.
- Rui Shu, James Brofos, Frank Zhang, Mohammad Ghavamzadeh, Hung Bui, Mykel Kochenderfer. Stochastic Video Prediction with Conditional Density Estimation.
- Atabak Dehban, Lorenzo Jamone, Adam Kampff, Jose Santos-Victor. A Moderately Large Size Dataset to Learn Visual Affordances of Objects and Tools Using iCub Humanoid Robot.
- Jeff Donahue, Philipp Krähenbühl, Trevor Darrell. Adversarial Feature Learning.
- Roeland de Geest, Efstratios Gavves, Amir Ghodrati, Zhenyang Li, Cees Snoek, Tinne Tuytelaars. Online Action Detection.
Program committee
We thank our program committee for kindly donating their time and expertise to reviewing workshop submissions.
Organizers
- Dinesh Jayaraman (UT Austin)
- Kristen Grauman (UT Austin)
- Sergey Levine (UW)
Sponsors
Please email Dinesh (dineshj [at] cs [dot] utexas [dot] edu) for information about sponsorship.