Predicting the location of “interactees” in novel human-object interactions
Chao-Yeh Chen and Kristen Grauman
The University of Texas at Austin
          
Understanding images with people often entails understanding their interactions with other objects or people. As such, given a novel image, a vision system ought to infer which other objects/people play an important role in a given person’s activity. However, while recent work learns about action-specific interactions (e.g., how the pose of a tennis player relates to the position of his racquet when serving the ball) for improved recognition, such methods are not equipped to reason about novel interactions that contain actions or objects not observed in the training data. We propose an approach to predict the localization parameters for “interactee” objects in novel images. Having learned the generic, action-independent connections between (1) a person’s pose, gaze, and scene cues and (2) the interactee object’s position and scale, our method estimates a probability distribution over likely places for an interactee in novel images. The result is a human interaction-informed saliency metric, which we show is valuable for both improved object detection and image retargeting applications.
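To make the idea of "a probability distribution over likely places for an interactee" concrete, here is a minimal sketch that evaluates a Gaussian mixture density over an interactee's position and size. It assumes that the "MDN" named in the results refers to a mixture density network, and it only shows how such a mixture would be evaluated, not how it is learned. The mixture weights, means, and standard deviations below are made-up illustrative values; in the actual method they would be predicted from the person's pose, gaze, and scene cues.

```python
import math

def gaussian_diag_pdf(point, mean, sigma):
    """Density of an axis-aligned (diagonal-covariance) Gaussian at `point`."""
    density = 1.0
    for p, m, s in zip(point, mean, sigma):
        z = (p - m) / s
        density *= math.exp(-0.5 * z * z) / (s * math.sqrt(2.0 * math.pi))
    return density

def mixture_pdf(point, weights, means, sigmas):
    """Density of a Gaussian mixture: sum over k of w_k * N(point; mean_k, sigma_k)."""
    return sum(w * gaussian_diag_pdf(point, m, s)
               for w, m, s in zip(weights, means, sigmas))

# Hypothetical two-component mixture over (x, y, area) in normalized image units.
weights = [0.7, 0.3]
means = [(0.6, 0.5, 0.05), (0.3, 0.7, 0.10)]
sigmas = [(0.1, 0.1, 0.02), (0.2, 0.2, 0.05)]

# A candidate at a mixture mean scores higher than one far from both components.
near = mixture_pdf((0.6, 0.5, 0.05), weights, means, sigmas)
far = mixture_pdf((0.05, 0.05, 0.5), weights, means, sigmas)
```

Evaluating this density at every candidate location would give a map of likely interactee placements, which is the kind of saliency signal the abstract describes.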
Problem: how to model human-object interaction
            
          
      
Human-object interactions are critical to understanding activities, but existing methods:
(1) Require pairwise training of class-specific models.
(2) Cannot reason about novel interactions.
        
        
Intuition
          Can we predict the location of the object with which
          each person is interacting?
          Yes, humans can infer an interactee’s position and size:
          - Without knowing what it is.
          - By relying on patterns in pose, gaze, and scene layout.
Our goal
            
        
        Predict the localization parameters for “interactee” objects
          in novel images:
          - Independent of the action/object.
          - Useful for applications beyond recognition.
      
Approach Overview
            
        
        
Results
Datasets: Images with
            interactions from SUN and PASCAL Action.
          
        
Accuracy of interactee localization
            
- Our MDN method accurately infers the location of the interactee.
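This page does not state how localization accuracy is scored. As a hedged illustration, the sketch below computes two generic measures that could be applied to a predicted interactee box: intersection-over-union with the ground-truth box, and the distance between box centers. Both are assumptions chosen for the example, not necessarily the metrics used in the paper.

```python
import math

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def center_distance(a, b):
    """Euclidean distance between the centers of two boxes."""
    acx, acy = (a[0] + a[2]) / 2.0, (a[1] + a[3]) / 2.0
    bcx, bcy = (b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0
    return math.hypot(acx - bcx, acy - bcy)

# Hypothetical predicted and ground-truth interactee boxes.
predicted = (0, 0, 2, 2)
ground_truth = (1, 1, 3, 3)
overlap = iou(predicted, ground_truth)
offset = center_distance(predicted, ground_truth)
```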
            
          
              Human subject experiment
            
        
      
- Human performance provides an upper bound for the task.
          
Interactee-aware image retargeting
            
        
        - Use interactee prediction to preserve content related to
          both the person and the interactee.
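A rough sketch of how an interactee prediction could steer retargeting: mark the person box and the predicted interactee box as important, then drop the least important columns when narrowing the image. Plain column removal is used here as a deliberately simplified stand-in for a real retargeting algorithm such as seam carving; the boxes, the grid size, and the `base` importance value are all illustrative assumptions.

```python
def importance_map(width, height, boxes, base=0.1):
    """Grid with importance 1.0 inside any box (x1, y1, x2, y2) and `base` elsewhere."""
    grid = [[base] * width for _ in range(height)]
    for x1, y1, x2, y2 in boxes:
        for y in range(max(0, y1), min(height, y2)):
            for x in range(max(0, x1), min(width, x2)):
                grid[y][x] = 1.0
    return grid

def columns_to_keep(grid, target_width):
    """Keep the `target_width` columns with the highest total importance."""
    width = len(grid[0])
    energy = [sum(row[x] for row in grid) for x in range(width)]
    ranked = sorted(range(width), key=lambda x: energy[x], reverse=True)
    return sorted(ranked[:target_width])

# Hypothetical person box and predicted interactee box on a 10x4 grid.
person = (1, 0, 3, 4)
interactee = (6, 1, 8, 3)
grid = importance_map(10, 4, [person, interactee])
kept = columns_to_keep(grid, 6)
```

Without the interactee box in the importance map, the columns covering the interactee would be as likely to be removed as background columns.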
      
Interactee-aware object detector contextual priming
              
            
        
- Use interactee prediction to guide the detector
          toward where to expect an object.
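One simple way to picture contextual priming is to blend each detection's score with a location prior centered on the predicted interactee position. The sketch below assumes an isotropic Gaussian prior and a linear blend controlled by `alpha`; both choices, and all numeric values, are illustrative assumptions rather than the scheme used in the paper.

```python
import math

def location_prior(center, predicted_center, sigma):
    """Unnormalized Gaussian prior: 1.0 at the predicted center, decaying with distance."""
    dx = center[0] - predicted_center[0]
    dy = center[1] - predicted_center[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))

def prime_detections(detections, predicted_center, sigma, alpha=0.5):
    """Blend detector scores with the location prior and sort best-first.

    `detections` is a list of ((x1, y1, x2, y2), score) pairs.
    """
    rescored = []
    for box, score in detections:
        center = ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)
        prior = location_prior(center, predicted_center, sigma)
        rescored.append((box, (1.0 - alpha) * score + alpha * prior))
    return sorted(rescored, key=lambda d: d[1], reverse=True)

# Hypothetical detections: the first has the higher raw score but lies far
# from where the interactee is expected.
detections = [((0, 0, 20, 20), 0.60), ((90, 90, 110, 110), 0.55)]
ranked = prime_detections(detections, predicted_center=(100, 100), sigma=30.0)
```

In this example the detection near the predicted interactee location moves ahead of the one with the higher raw score.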
        
Conclusion
        
(1) Our method predicts where an interactee will
          appear, given cues from the person’s pose and gaze.
          (2) Our method predicts interactees in an action- and
          object-type-independent manner.
          (3) It enables applications in contextual object detection and
          image retargeting.
        
Download
          - Paper,
              Supp
            - Data
            - Bibtex