Predicting the location of “interactees”
in novel human-object interactions

Chao-Yeh Chen and Kristen Grauman
The University of Texas at Austin


Understanding images with people often entails understanding their interactions with other objects or people. As such, given a novel image, a vision system ought to infer which other objects/people play an important role in a  given person’s activity. However, while recent work learns about action-specific interactions (e.g., how the pose of a tennis player relates to the position of his racquet when serving the ball) for improved recognition, they are  not equipped to reason about novel interactions that contain actions or objects not observed in the training data. We propose an approach to predict the localization parameters for “interactee” objects in novel images. Having learned the generic, action independent connections between (1) a person’s pose, gaze, and scene cues and (2) the interactee object’s position and scale, our method estimates a probability distribution over likely places for an  interactee in novel images. The result is a human interaction-informed saliency metric, which we show is valuable for both improved object detection and image retargeting applications.


Problem: how to model human-object interaction



Human-object interactions are critical to understanding activities, but existing methods:

(1) Require pairwise training of class-specific models.
(2) Cannot reason about novel interactions.



Intuition
Can we predict where the object with which each person is interacting is located?


Yes, humans can infer an interactee's position and size:
- Without knowing what the object is.
- By relying on patterns in pose/gaze/scene layout.


Our goal


Predict the localization parameters for "interactee" objects in novel images:
- Independent of the action/object.
- Providing applications beyond recognition.
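The localization parameters can be encoded relative to the person, so they transfer across actions and object categories. A minimal sketch, assuming a normalized-displacement-plus-relative-scale encoding (the paper predicts position and scale; this exact parameterization is an assumption for illustration):

```python
import numpy as np

def interactee_params(person_box, interactee_box):
    """Encode an interactee box relative to the person box.

    Boxes are (x, y, w, h) in pixels. Returns the displacement of the
    interactee center from the person center, normalized by person
    height, plus the interactee's size relative to the person height.
    Hypothetical encoding for illustration only.
    """
    px, py, pw, ph = person_box
    ix, iy, iw, ih = interactee_box
    pcx, pcy = px + pw / 2, py + ph / 2
    icx, icy = ix + iw / 2, iy + ih / 2
    dx = (icx - pcx) / ph          # horizontal offset, person-height units
    dy = (icy - pcy) / ph          # vertical offset, person-height units
    scale = np.sqrt(iw * ih) / ph  # interactee size relative to person
    return np.array([dx, dy, scale])
```

Because the encoding is relative and normalized, the same representation applies whether the interactee is a tennis racquet or an unseen object class.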


Approach Overview





Results

Datasets: Images with interactions from SUN and PASCAL Action.


Accuracy of interactee localization

- Our MDN method accurately infers the location of the interactee.
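The mixture density network (MDN) maps person cues to the parameters of a Gaussian mixture over interactee location, yielding a full probability map rather than a single point. A simplified sketch of that output stage, assuming isotropic components over (x, y) only (the paper's MDN also covers scale):

```python
import numpy as np

def mdn_density(raw, grid_xy, k=3):
    """Convert raw network outputs into a K-component isotropic
    Gaussian mixture over interactee location, evaluated on a grid.

    `raw` is a flat vector of length 4*k: k mixing logits, k x-means,
    k y-means, k log-sigmas. `grid_xy` is an (N, 2) array of image
    coordinates. Hypothetical layout; a sketch of the MDN idea only.
    """
    logits = raw[:k]
    mu = raw[k:3 * k].reshape(k, 2)
    sigma = np.exp(raw[3 * k:4 * k])
    pi = np.exp(logits - logits.max())
    pi /= pi.sum()                      # softmax mixing weights
    # Squared distance of every grid point to every component mean.
    d2 = ((grid_xy[:, None, :] - mu[None, :, :]) ** 2).sum(-1)  # (N, K)
    comp = np.exp(-d2 / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)
    return comp @ pi                    # (N,) mixture density
```

Taking the argmax (or expected value) of this map gives a predicted interactee position; the spread of the mixture conveys uncertainty.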


Human subject experiment


- Human performance provides an upper bound for the task.

Interactee-aware image retargeting


- Use interactee prediction to preserve content related to both person and interactee.
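One simple way to realize this is to add the interactee probability map into the retargeting energy, so seams avoid both the person and the predicted interactee. A minimal sketch, assuming a seam-carving-style energy map and a hypothetical `boost` weight:

```python
import numpy as np

def interactee_aware_energy(base_energy, person_mask, interactee_prob, boost=5.0):
    """Boost a retargeting energy map to protect person + interactee.

    `base_energy`: HxW energy (e.g. gradient magnitude, as in seam carving).
    `person_mask`: HxW binary mask of the detected person.
    `interactee_prob`: HxW predicted interactee probability map.
    `boost` is a hypothetical weighting constant, not from the paper.
    """
    prob = interactee_prob / (interactee_prob.max() + 1e-12)  # normalize to [0, 1]
    return base_energy + boost * (person_mask.astype(float) + prob)
```

Regions with high combined energy are expensive to remove, so the retargeted image keeps the content tied to the interaction.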

Interactee-aware object detector contextual priming

- Use interactee prediction to guide the detector for where to expect an object.
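A straightforward way to apply such priming is to re-weight each candidate detection by the interactee prior at its location. A sketch under that assumption (multiplying score and prior is one simple combination rule, not necessarily the paper's):

```python
import numpy as np

def prime_detections(boxes, scores, prior_map):
    """Re-rank detector outputs with the interactee location prior.

    Each box (x, y, w, h) has its detector score multiplied by the
    prior probability at its center, promoting detections that fall
    in likely interactee regions. Hypothetical combination rule.
    """
    primed = []
    for (x, y, w, h), s in zip(boxes, scores):
        cy, cx = int(y + h / 2), int(x + w / 2)  # box center, pixel indices
        primed.append(s * prior_map[cy, cx])
    return np.array(primed)
```

This lets a weak detection in a highly probable interactee region outrank a slightly stronger detection in an implausible one.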


Conclusion

(1) Our method predicts where an interactee will appear, given cues from the person's pose and gaze.
(2) Our method predicts interactees in an action/object-type-independent manner.
(3) It provides applications for contextual object detection and image retargeting.


Download
- Paper, Supp
- Data
- Bibtex