| Sagnik Majumder1, Tushar Nagarajan1, Ziad Al-Halah2, Kristen Grauman1 |
|
1UT Austin, 2U. Utah
Accepted to ICCV 2025 |
|
| We introduce Switch-a-View, a model that learns to automatically select the viewpoint to display at each timepoint when creating a how-to video. The key insight of our approach is how to train such a model from unlabeled but human-edited video samples. We pose a pretext task that pseudo-labels segments in the training videos by their primary viewpoint (egocentric or exocentric), and then discovers the patterns relating the resulting view-switch moments to the visual and spoken content of the how-to video. Armed with this predictor, our model then takes an unseen multi-view video as input and orchestrates which viewpoint should be displayed when, even when labels for such multi-view settings are limited. We demonstrate our idea on a variety of real-world videos from HowTo100M and Ego-Exo4D and rigorously validate its advantages. |
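To make the notion of a view-switch moment concrete, here is a minimal sketch that assumes each video segment already carries a pseudo-label for its primary viewpoint. The function name `find_view_switches`, the string labels `"ego"` and `"exo"`, and the example sequence are all hypothetical, chosen for illustration only; this is not the authors' implementation, and it does not cover the pseudo-labeler or the predictor itself.

```python
from typing import List, Tuple

# Hypothetical label names for the two viewpoints mentioned in the abstract.
EGO, EXO = "ego", "exo"


def find_view_switches(segment_views: List[str]) -> List[Tuple[int, str, str]]:
    """Return (segment_index, previous_view, new_view) for every segment
    whose pseudo-label differs from that of the segment before it."""
    switches = []
    for i in range(1, len(segment_views)):
        prev, curr = segment_views[i - 1], segment_views[i]
        if curr != prev:
            switches.append((i, prev, curr))
    return switches


# Made-up pseudo-labels for six consecutive segments of one edited video.
pseudo_labels = [EXO, EXO, EGO, EGO, EGO, EXO]
print(find_view_switches(pseudo_labels))
# prints [(2, 'exo', 'ego'), (5, 'ego', 'exo')]
```

In this toy setup, the switch positions are the targets a predictor would learn to relate to the visual and spoken content around them.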
|
Task and model description, prediction examples and failure cases.
|
|
|
@article{majumder2024switch,
author = {Sagnik Majumder and Tushar Nagarajan and Ziad Al-Halah and Kristen Grauman},
title = {Switch-a-View: View Selection Learned from Unlabeled In-the-wild Videos},
year = {2024},
eprint = {2412.18386},
archivePrefix = {arXiv},
primaryClass = {cs.CV},
}
|
| Copyright © 2024 University of Texas at Austin |