Predicting the Location of "Interactees" in Novel Human-Object Interactions
Chao-Yeh Chen and Kristen Grauman
The University of Texas at Austin
Understanding images with people often entails understanding their interactions with other objects or people. As such, given a novel image, a vision system ought to infer which other objects/people play an important role in a given person's activity. However, while recent work learns about action-specific interactions (e.g., how the pose of a tennis player relates to the position of his racquet when serving the ball) for improved recognition, these methods are not equipped to reason about novel interactions that contain actions or objects not observed in the training data. We propose an approach to predict the localization parameters for "interactee" objects in novel images. Having learned the generic, action-independent connections between (1) a person's pose, gaze, and scene cues and (2) the interactee object's position and scale, our method estimates a probability distribution over likely places for an interactee in novel images. The result is a human interaction-informed saliency metric, which we show is valuable for both improved object detection and image retargeting applications.
Problem: how to model human-object interaction
Human-object interactions are critical to understanding activities, but existing methods:
(1) require pairwise training of class-specific models;
(2) cannot reason about novel interactions.
Intuition
Can we predict where the object each person is interacting with is located?
Yes: humans can infer an interactee's position and size:
- without knowing what the object is;
- by relying on patterns in pose, gaze, and scene layout.
Our goal
Predict the localization parameters for "interactee" objects in novel images:
- independent of the action/object type;
- providing applications beyond recognition.
Approach Overview
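At its core, the approach trains a mixture density network (MDN) to map person cues (pose, gaze, scene) to a probability distribution over the interactee's normalized position and scale. A minimal sketch of the downstream inference step in Python, assuming the network head has already produced mixture parameters (the weights, means, and sigmas below are made-up values for illustration, not learned ones):

```python
import math

def gmm_density(params, target):
    """Evaluate a diagonal-Gaussian mixture density at `target`.
    `params` is a list of (weight, mean_vec, sigma_vec) tuples, the
    kind of output an MDN head produces for interactee (x, y, scale)."""
    total = 0.0
    for w, mu, sigma in params:
        norm, expo = 1.0, 0.0
        for t, m, s in zip(target, mu, sigma):
            norm *= 1.0 / (math.sqrt(2 * math.pi) * s)
            expo += ((t - m) / s) ** 2
        total += w * norm * math.exp(-0.5 * expo)
    return total

# Hypothetical MDN output: two modes over normalized (x, y, scale).
mixture = [
    (0.7, [0.6, 0.5, 0.2], [0.1, 0.1, 0.05]),   # dominant mode
    (0.3, [0.3, 0.4, 0.3], [0.2, 0.2, 0.10]),   # secondary mode
]

# Most likely interactee placement: argmax over a coarse grid.
best = max(
    ((x / 10, y / 10, s / 10)
     for x in range(11) for y in range(11) for s in range(1, 11)),
    key=lambda t: gmm_density(mixture, t),
)
print(best)  # the dominant mode's mean, (0.6, 0.5, 0.2)
```

In practice one would keep the full distribution rather than a single argmax, since the density itself serves as the interaction-informed saliency map.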
Results
Datasets: images with interactions from SUN and PASCAL Actions.
Accuracy of interactee localization
- Our MDN method accurately infers the location of the interactee.
Human subject experiment
- Human performance provides an upper bound for the task.
Interactee-aware image retargeting
- Use the interactee prediction to preserve content related to both the person and the interactee.
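For retargeting, the predicted interactee distribution can be folded into a seam-carving energy map so seams route around both the person and the predicted interactee region. A toy sketch (the function and parameter names are hypothetical; a real pipeline would compute `gradient_energy` from image gradients):

```python
def interactee_aware_energy(gradient_energy, person_mask,
                            interactee_prob, boost=5.0):
    """Add high energy where the person is and where the interactee
    is predicted, so retargeting seams avoid both regions."""
    h, w = len(gradient_energy), len(gradient_energy[0])
    return [[gradient_energy[y][x]
             + boost * person_mask[y][x]
             + boost * interactee_prob[y][x]
             for x in range(w)] for y in range(h)]

# 1x3 toy row: middle pixel lies on the person, right pixel is the
# predicted interactee location; both end up with high energy.
energy = interactee_aware_energy([[0.1, 0.1, 0.1]],
                                 [[0.0, 1.0, 0.0]],
                                 [[0.0, 0.0, 0.8]])
```

The `boost` weight trades off content preservation against how aggressively the image can be resized.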
Interactee-aware contextual priming for object detection
- Use the interactee prediction to guide the detector toward where to expect an object.
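One simple way to use the predicted distribution as contextual priming: combine each candidate window's detector score with the interactee prior evaluated at that window. The combination rule, `alpha`, and the toy prior below are illustrative assumptions, not necessarily the paper's exact scheme:

```python
import math

def prime_detections(detections, interactee_prob, alpha=0.5):
    """Re-score (score, box) candidates by the predicted interactee
    probability at each box, then rank by the combined score."""
    rescored = [(score * interactee_prob(box) ** alpha, box)
                for score, box in detections]
    return sorted(rescored, reverse=True)

# Toy prior: interactee expected near normalized center (0.6, 0.5).
def prob(box):
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    return math.exp(-((cx - 0.6) ** 2 + (cy - 0.5) ** 2) / 0.02)

dets = [(0.8, (0.0, 0.0, 0.2, 0.2)),   # strong score, far from prior
        (0.6, (0.5, 0.4, 0.7, 0.6))]   # weaker score, where prior expects it
top_box = prime_detections(dets, prob)[0][1]
# the weaker detection wins once the prior is applied
```

This product-of-experts style combination suppresses high-scoring false positives in regions where no interactee is expected.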
Conclusion
(1) Our method predicts where an interactee will appear, given cues from a person's pose and gaze.
(2) Our method predicts interactees in an action/object-type independent manner.
(3) Our method provides applications for contextual object detection and image retargeting.
Download
- Paper,
Supp
- Data
- Bibtex