Learning Object State Changes in Videos: An Open-World Perspective

CVPR, 2024


1 UT Austin   2 FAIR, Meta

We propose a novel open-world formulation of the video OSC problem.



We present HowToChange---the first open-world benchmark for video OSC localization.


  • HowToChange (Training): 36,075 videos, automatically collected by LLAMA2 and pseudo-labeled with VideoCLIP
  • HowToChange (Evaluation): 5,424 videos, fine-grained temporal OSC state annotations done by 30 professional annotators

We introduce VidOSC, a holistic open-world learning framework

Model predictions




Known OSCs

Novel OSCs

Video

A 9-minute video designed to supplement the paper

A shorter version (5-minute video)

BibTeX

@article{xue2023learning,
      title={Learning Object State Changes in Videos: An Open-World Perspective},
      author={Xue, Zihui and Ashutosh, Kumar and Grauman, Kristen},
      journal={CVPR},
      year={2024}
  }