Mash, Spread, Slice! Learning to Manipulate
Object States via Visual Spatial Progress

Preprint, 2025


Demo videos 1–8

Video

A 5-minute video (with audio) supplementing the paper.



Overview


The dominant paradigm in robotic manipulation today focuses heavily on rigid-body motion (e.g., pick-and-place, open-and-close, pour-and-rotate). However, a wide range of real-world human manipulation involves object state changes—such as mashing, spreading, or slicing—where an object's physical and visual state evolves progressively over time, often irreversibly.

We introduce a unified vision-based approach that captures these fine-grained, spatially progressing transformations, and we demonstrate how it guides real-robot manipulation for this family of tasks.




SPARTA Framework



At each episode step, our policy takes the current and past SPOC visual-affordance (segmentation) maps as input, along with the robot arm's proprioception, and predicts a displacement action for the arm's end-effector.
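
To make this interface concrete, below is a minimal PyTorch sketch of a policy with the described inputs and outputs. The module name (SpartaPolicy), layer sizes, number of stacked past maps, and the proprioception/action dimensions are illustrative assumptions, not the paper's actual architecture.

# Minimal sketch of the policy interface described above (PyTorch).
# All names, sizes, and the number of stacked maps are assumptions.
import torch
import torch.nn as nn

class SpartaPolicy(nn.Module):  # hypothetical name
    def __init__(self, n_maps=4, proprio_dim=7, action_dim=3):
        super().__init__()
        # Small CNN encoder over the stacked current + past affordance maps.
        self.encoder = nn.Sequential(
            nn.Conv2d(n_maps, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Fuse visual features with arm proprioception; output an
        # end-effector displacement.
        self.head = nn.Sequential(
            nn.Linear(32 + proprio_dim, 64), nn.ReLU(),
            nn.Linear(64, action_dim),
        )

    def forward(self, maps, proprio):
        # maps: (B, n_maps, H, W) current + past segmentation maps
        # proprio: (B, proprio_dim) arm state
        feat = self.encoder(maps)
        return self.head(torch.cat([feat, proprio], dim=-1))

# Example usage with dummy tensors:
policy = SpartaPolicy()
action = policy(torch.rand(1, 4, 64, 64), torch.rand(1, 7))  # (1, 3) displacement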

SPARTA supports two robot policy variants:

(a) SPARTA-L (Learning): a reinforcement learning agent trained using a dense reward that measures the progressive change of object regions from actionable (red) to transformed (green);

(b) SPARTA-G (Greedy): selects among 8 discrete movement directions based on the local density of actionable pixels, producing a fast, greedy policy guided by visual progress (both variants are sketched below).
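
To illustrate both variants, here is a rough NumPy sketch: a dense reward computed as the per-step increase in the fraction of transformed pixels, and a greedy action that picks whichever of the 8 directions has the most actionable pixels near the end-effector. The label encoding, window size, and neighborhood logic are assumptions for illustration, not SPARTA's exact formulation.

# Illustrative sketch of the dense reward (a) and greedy policy (b).
# Label encoding and neighborhood logic are assumptions.
import numpy as np

def dense_reward(prev_map, curr_map):
    # Maps are integer label images: 1 = actionable (red),
    # 2 = transformed (green). Reward = increase in transformed fraction.
    return float((curr_map == 2).mean() - (prev_map == 2).mean())

# The 8 discrete movement directions (unit steps in image coordinates).
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1),
              ( 0, -1),          ( 0, 1),
              ( 1, -1), ( 1, 0), ( 1, 1)]

def greedy_action(curr_map, ee_pixel, radius=10):
    # Pick the direction whose local window (offset from the
    # end-effector pixel) contains the most actionable pixels.
    h, w = curr_map.shape
    best_dir, best_count = (0, 0), -1
    for dy, dx in DIRECTIONS:
        cy = int(np.clip(ee_pixel[0] + dy * radius, 0, h - 1))
        cx = int(np.clip(ee_pixel[1] + dx * radius, 0, w - 1))
        window = curr_map[max(0, cy - radius): cy + radius + 1,
                          max(0, cx - radius): cx + radius + 1]
        count = int((window == 1).sum())
        if count > best_count:
            best_dir, best_count = (dy, dx), count
    return best_dir  # e.g., (-1, 0) means move "up" in the image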



Tasks


SPARTA is tested on two different object-transformation tasks across 10 diverse real-world objects.




Results




SPARTA decisively outperforms sparse and dense goal-conditioned baselines, with the trained RL policies surpassing greedy control on complex, fine-precision tasks.



Reward Curves


Below we show reward curves for the bread-spreading task.



(a) Cumulative episode reward curves: SPARTA produces smooth, incremental rewards aligned with visual progress, while LIV rewards remain unstable throughout the episode, offering poor guidance.

(b) Training curves: stable, dense feedback drives sample-efficient learning, with SPARTA rapidly improving while SPARSE and LIV stagnate.

BibTeX

@article{mandikal2025sparta,
  title={Mash, Spread, Slice! Learning to Manipulate Object States via Visual Spatial Progress},
  author={Mandikal, Priyanka and Hu, Jiaheng and Dass, Shivin and Majumder, Sagnik and Martín-Martín, Roberto and Grauman, Kristen},
  journal={arXiv preprint},
  year={2025}
}