Smash and Spread! Teaching Robots to
Transform Objects via Spatial Progress

Preprint, 2025


UT Austin

Video

A 3-minute silent video supplementing the paper



Overview


The status quo in robotic manipulation focuses heavily on rigid-body motion (e.g., pick-and-place, open-and-close, pour-and-rotate). However, a wide range of real-world human manipulation involves object state changes, such as smashing or spreading, where an object's visual state evolves gradually over time, often irreversibly.

We introduce a novel vision-based RL approach that captures these fine-grained, spatially progressing transformations, and we demonstrate how it can guide real robot manipulation for this family of tasks.






SPARTA Framework



At each episode step, our policy takes the current and past SPOC visual-affordance (segmentation) maps as input, along with the robot arm's proprioception, and predicts a displacement action for the arm's end-effector. We train the policy with RL, using a novel reward function that incentivizes the robot to keep transforming the actionable object regions as efficiently as possible. Using visual-affordance maps as input facilitates zero-shot transfer to novel objects of vastly different shape, texture, and color (e.g., tortilla or cheese vs. bread) and to novel tasks (e.g., smashing vs. spreading).
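
To make the interface concrete, below is a minimal PyTorch sketch of such a policy: a small CNN encodes a stack of current and past affordance maps, the resulting features are concatenated with proprioception, and an MLP head outputs an end-effector displacement. The class name, layer sizes, map-stack depth, and action dimensionality are illustrative assumptions, not the paper's actual architecture.

import torch
import torch.nn as nn

class AffordancePolicy(nn.Module):
    # Hypothetical policy sketch: a CNN over stacked affordance maps,
    # fused with proprioception, producing a delta end-effector action.
    def __init__(self, n_maps=4, proprio_dim=7, action_dim=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(n_maps, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # -> (B, 32)
        )
        self.head = nn.Sequential(
            nn.Linear(32 + proprio_dim, 64), nn.ReLU(),
            nn.Linear(64, action_dim),  # e.g., (dx, dy, dz) for the end-effector
        )

    def forward(self, maps, proprio):
        # maps: (B, n_maps, H, W) stack of current + past affordance maps
        # proprio: (B, proprio_dim) arm state
        return self.head(torch.cat([self.encoder(maps), proprio], dim=-1))

For example, AffordancePolicy()(torch.randn(1, 4, 96, 96), torch.randn(1, 7)) returns a (1, 3) displacement.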


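The reward idea can be sketched just as briefly: reward the per-step shrinkage of the actionable (still-untransformed) region, and charge a small constant penalty so that finishing quickly scores higher. This is a hypothetical reading, assuming binary actionable-pixel masks; the function name and penalty value are made up, and the paper's exact formulation may differ.

import numpy as np

def progress_reward(prev_map, curr_map, step_penalty=0.01):
    # prev_map, curr_map: H x W boolean masks of actionable (untransformed)
    # pixels, e.g., thresholded visual-affordance maps.
    prev = np.asarray(prev_map, dtype=bool)
    curr = np.asarray(curr_map, dtype=bool)
    # Fraction of the image newly transformed this step.
    newly_transformed = np.logical_and(prev, ~curr).mean()
    # The step penalty makes dawdling costly, pushing toward efficiency.
    return float(newly_transformed) - step_penalty
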

Tasks




SPARTA is evaluated on two object transformation tasks, smashing and spreading, across six diverse real-world objects.



Results









Cumulative Reward Curves




BibTeX

@inproceedings{mandikal2025sparta,
  title={Smash and Spread! Teaching Robots to Transform Objects via Spatial Progress},
  author={Mandikal, Priyanka and Hu, Jiaheng and Dass, Shivin and Majumder, Sagnik and Martin-Martin, Roberto and Grauman, Kristen},
  booktitle={ArXiv},
  year={2025}
}