UT Egocentric (UT Ego) Dataset
[Download (1.4GB)]

Information

The Univ. of Texas at Austin Egocentric (UT Ego) Dataset contains 4 videos captured from head-mounted cameras.  Each video is about 3-5 hours long, captured in a natural, uncontrolled setting.

We used the Looxcie wearable camera, which captures video at 15 fps at 320 x 480 resolution.  Four subjects wore the camera for us: one undergraduate student, two graduate students, and one office worker.  The videos capture a variety of activities such as eating, shopping, attending a lecture, driving, and cooking.



Data

  • The UT Ego Dataset can be downloaded here.  It is 1.4 GB.  Note that the human faces in the videos are artificially blurred for privacy reasons.
  • Ground-truth annotations for important regions can be downloaded here.

Evaluation


  • Important region prediction
Can be evaluated with the ground-truth annotations provided above.  Training/testing should be conducted in a leave-one-out fashion (i.e., train on 3 videos, test on the 1 remaining video).  A region whose overlap score (intersection over union) with any ground-truth region is greater than 0.5 should be considered a true positive (i.e., an important object).  See "Important region prediction accuracy" in Sec. 4 of CVPR 2012 for guidance on prior studies.

  • Summarization
Requires human subject studies for evaluation.  See "User studies to evaluate summaries" in Sec. 4 of CVPR 2012 and "Evaluating summary quality" in Sec. 4 of CVPR 2013 for guidance on prior studies.
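The overlap criterion for important region prediction can be sketched as below.  This is a minimal illustration, not the official evaluation code: it assumes regions are given as axis-aligned bounding boxes `(x1, y1, x2, y2)`, whereas the actual ground-truth annotations may use a different region representation.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Intersection rectangle (empty if the boxes do not overlap).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_true_positive(predicted, gt_regions, threshold=0.5):
    """A predicted region is a true positive if its IoU with any
    ground-truth region exceeds the threshold (0.5 per the protocol above)."""
    return any(iou(predicted, gt) > threshold for gt in gt_regions)
```

Under the leave-one-out protocol, this check would be applied to each predicted region on the held-out test video, with the remaining three videos used for training.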


Publications

Y. J. Lee, J. Ghosh, and K. Grauman.  Discovering Important People and Objects for Egocentric Video Summarization.  Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012. [project page]

Z. Lu and K. Grauman.  Story-Driven Summarization for Egocentric Video.  Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013. [project page]


People