Sim2real for robotic manipulation is difficult because complex contacts are hard to simulate and realistic task distributions are hard to generate. To tackle the latter problem, we introduce ManipGen, which leverages a new class of policies for sim2real transfer: local policies. Locality enables a variety of appealing properties, including invariance to absolute robot and object pose, skill ordering, and global scene configuration. We combine these policies with foundation models for vision, language, and motion planning, and our method achieves SOTA zero-shot performance on Robosuite benchmark tasks in simulation (97%). We transfer our local policies from simulation to reality and observe that they can solve unseen long-horizon manipulation tasks with up to 8 stages under significant variation in pose, objects, and scene configuration. Across 50 real-world manipulation tasks, ManipGen outperforms SOTA approaches such as SayCan, OpenVLA, LLMTrajGen, and VoxPoser by 36%, 76%, 62%, and 60%, respectively.
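To make the invariance claim concrete, here is a minimal sketch (our illustration, not ManipGen's code) of one way to build a local observation: express the end effector's pose in the frame of the target object. It assumes planar SE(2) poses for brevity; `Pose2D`, `wrap`, and `local_observation` are hypothetical names. Any global rigid transform applied to the whole scene cancels out, so the observation is unchanged when robot and object move together.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose2D:
    """Hypothetical planar pose: position (x, y) and heading theta in radians."""
    x: float
    y: float
    theta: float

    def compose(self, other: "Pose2D") -> "Pose2D":
        """Return self * other, i.e. `other` expressed in this pose's frame, mapped out."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        return Pose2D(
            self.x + c * other.x - s * other.y,
            self.y + s * other.x + c * other.y,
            self.theta + other.theta,
        )

    def inverse(self) -> "Pose2D":
        """Inverse transform: rotation R^T and translation -R^T p."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        return Pose2D(-(c * self.x + s * self.y), s * self.x - c * self.y, -self.theta)


def wrap(angle: float) -> float:
    """Map an angle into (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def local_observation(ee_world: Pose2D, obj_world: Pose2D) -> tuple:
    """End-effector pose relative to the target object: obj^-1 * ee."""
    rel = obj_world.inverse().compose(ee_world)
    return (rel.x, rel.y, wrap(rel.theta))


ee = Pose2D(0.5, 0.2, 0.3)
obj = Pose2D(0.6, 0.1, -0.4)
shift = Pose2D(2.0, -1.0, 1.2)  # arbitrary global move of the whole scene
before = local_observation(ee, obj)
after = local_observation(shift.compose(ee), shift.compose(obj))
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(before, after)))
```

Under these assumptions, (g·obj)⁻¹(g·ee) = obj⁻¹·ee for every global transform g, so a policy that only sees the local observation never has to cover the space of absolute poses.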
We present ManipGen, which consists of three main components. Left: we train thousands of single-task RL experts in simulation using PPO. Middle: we distill the single-task RL experts into generalist visuomotor policies via DAgger. Right: we perform text-conditioned long-horizon manipulation via task decomposition (VLM), pose estimation and goal reaching (motion planning), and sim2real transfer of the local policies.
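The distillation step in the middle panel uses DAgger: the student acts, the expert labels the states the student actually visits, and the student is refit on the aggregated dataset. The toy sketch below assumes a 1-D state, simple dynamics x' = x + a, a hypothetical linear `expert` standing in for a trained PPO expert, and a linear student fit by least squares. ManipGen's real students are visuomotor networks; this only illustrates the data-collection loop.

```python
import random


def expert(x: float) -> float:
    """Hypothetical stand-in for a single-task RL expert: drives x toward 0."""
    return -0.5 * x


def fit_linear(data):
    """Least-squares fit of a = w * x on aggregated (state, expert_action) pairs."""
    num = sum(x * a for x, a in data)
    den = sum(x * x for x, _ in data)
    return num / den if den > 0 else 0.0


def rollout(policy, x0: float, steps: int = 10):
    """Return the states visited while following `policy` from x0."""
    states, x = [], x0
    for _ in range(steps):
        states.append(x)
        x = x + policy(x)
    return states


def dagger(iterations: int = 5, episodes: int = 8, seed: int = 0):
    rng = random.Random(seed)
    dataset = []
    w = 0.0  # student gain, refit every iteration
    for i in range(iterations):
        # Iteration 0 rolls out the expert; later iterations roll out the current student.
        act = expert if i == 0 else (lambda x, w=w: w * x)
        for _ in range(episodes):
            for x in rollout(act, rng.uniform(-1.0, 1.0)):
                dataset.append((x, expert(x)))  # expert relabels every visited state
        w = fit_linear(dataset)  # supervised refit on the aggregated dataset
    return w, len(dataset)


w, n = dagger()
print(round(w, 6), n)
```

Because this toy student can represent the expert exactly, the fit recovers the expert's gain; with a real network and partial observations, the point of relabeling student-visited states is to correct the compounding errors of plain behavior cloning.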
CabinetStore (4 stages): 80%
DrawerStore (6 stages): 60%
Cook (2 stages): 90%
Replace (4 stages): 80%
Tidy (8 stages): 60%
Shelf object manipulation
Picking pepper from clutter
Picking the screwdriver around the books
Placing the large sunscreen bottle in the drawer
Orange/Black clip
HDMI cable
Pliers
Scotch tape
Paper towel
Soft toy
Trashbag
Coffee and biscuit
Horizontal dishrack bars under bowl
Spirals on the stove
Picking from clutter and circular bowl surface
Horizontal bars under razor box
Pick
Place
Grasp Handle
Open
Close
UnidexGrasp (pick/place)
PartNet (grasp handle/open/close)
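The five skills above (Pick, Place, Grasp Handle, Open, Close) form the vocabulary that task decomposition composes into long-horizon tasks. The sketch below shows one way such a plan could be represented and executed: parse a list of `skill: target` stages, move near each target, then run the matching local policy. The text format, `parse_plan`, `execute`, and the `reach`/`run_skill` callbacks are all hypothetical stand-ins for the VLM, motion planner, and local policies, and the example plan is illustrative, not one of the benchmark task definitions.

```python
from dataclasses import dataclass

# Skill vocabulary taken from the list above.
SKILLS = {"pick", "place", "grasp_handle", "open", "close"}


@dataclass(frozen=True)
class Stage:
    skill: str
    target: str


def parse_plan(lines):
    """Parse hypothetical decomposition output, one 'Skill: target' stage per line."""
    stages = []
    for line in lines:
        skill, _, target = line.partition(":")
        skill = skill.strip().lower().replace(" ", "_")
        if skill not in SKILLS:
            raise ValueError(f"unknown skill {skill!r}")
        stages.append(Stage(skill, target.strip()))
    return stages


def execute(stages, reach, run_skill):
    """Global step (reach near target), then local step (skill policy); stop on failure."""
    log = []
    for stage in stages:
        reach(stage.target)
        ok = run_skill(stage.skill, stage.target)
        log.append((stage.skill, stage.target, ok))
        if not ok:
            break
    return log


# Illustrative decomposition of "put the pepper in the drawer and close it".
plan = parse_plan([
    "Grasp Handle: drawer", "Open: drawer", "Pick: pepper",
    "Place: drawer", "Grasp Handle: drawer", "Close: drawer",
])
visited = []
log = execute(plan, reach=visited.append, run_skill=lambda skill, target: True)
print(len(log), visited)
```

Separating the global reach from the local skill is what lets each policy stay agnostic to where in the scene, and in which order, it is invoked.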
@article{dalal2024manipgen,
title={Local Policies Enable Zero-shot Long-horizon Manipulation},
author={Murtaza Dalal and Min Liu and Walter Talbott and Chen Chen and Deepak Pathak and Jian Zhang and Ruslan Salakhutdinov},
journal={arXiv preprint arXiv:2410.22332},
year={2024},
}