Switch-a-View: View Selection Learned from Unlabeled In-the-wild Videos

Sagnik Majumder¹, Tushar Nagarajan¹, Ziad Al-Halah², Kristen Grauman¹

¹UT Austin, ²U. Utah
Accepted to ICCV 2025

[arXiv]

We introduce Switch-a-View, a model that learns to automatically select the viewpoint to display at each timepoint when creating a how-to video. The key insight of our approach is how to train such a model from unlabeled (but human-edited) video samples. We pose a pretext task that pseudo-labels segments in the training videos for their primary viewpoint (egocentric or exocentric), and then discovers the patterns between the visual and spoken content of a how-to video and its view-switch moments. Armed with this predictor, our model can be applied to new multi-view video settings to orchestrate which viewpoint should be displayed when, even when such settings come with limited labels. We demonstrate our idea on a variety of real-world videos from HowTo100M and Ego-Exo4D, and rigorously validate its advantages.
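To make the pretext task concrete, the sketch below shows one plausible way to turn a sequence of per-segment view pseudo-labels into view-switch targets. It is an illustration only, not the released implementation: the function name `view_switch_targets`, the `"ego"`/`"exo"` string labels, and the rule that a switch is any change between consecutive segments are all assumptions made for this example.

```python
from typing import List, Tuple


def view_switch_targets(views: List[str]) -> List[Tuple[int, str, bool]]:
    """Derive per-segment targets from pseudo-labeled views (hypothetical helper).

    For every segment t >= 1, return (t, view at t, whether the view
    differs from segment t - 1). Segment 0 has no predecessor and is skipped.
    """
    targets = []
    for t in range(1, len(views)):
        switched = views[t] != views[t - 1]
        targets.append((t, views[t], switched))
    return targets


# Assumed pseudo-labels for five consecutive segments of one edited video.
pseudo_labels = ["exo", "exo", "ego", "ego", "exo"]
targets = view_switch_targets(pseudo_labels)

# Indices of segments where the displayed view changes.
switch_indices = [t for t, _, switched in targets if switched]
print(switch_indices)
```

Under these assumptions, a predictor could be trained to output the view (or the switch flag) for segment t from the visual and spoken content up to that point; how the actual model consumes these signals is described in the paper.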

Qualitative Results

Task and model description, prediction examples and failure cases.

Citation


@article{majumder2024switch,
  author       = {Sagnik Majumder and Tushar Nagarajan and Ziad Al-Halah and Kristen Grauman},
  title        = {Switch-a-View: Few-Shot View Selection Learned from Edited Videos},
  year         = {2024},
  eprint       = {2412.18386},
  archivePrefix = {arXiv},
  primaryClass = {cs.CV},
}