Abstract

Generating collision-free motion in dynamic, partially observable environments is a fundamental challenge for robotic manipulators. Classical motion planners can compute globally optimal trajectories but require full environment knowledge and are typically too slow for dynamic scenes. Neural motion policies offer a promising alternative by operating in closed-loop directly on raw sensory inputs but often struggle to generalize in complex or dynamic settings. We propose Deep Reactive Policy (DRP), a visuo-motor neural motion policy designed for reactive motion generation in diverse dynamic environments, operating directly on point cloud sensory input. At its core is IMPACT, a transformer-based neural motion policy pretrained on 10 million generated expert trajectories across diverse simulation scenarios. We further improve IMPACT's static obstacle avoidance through iterative student-teacher finetuning. We additionally enhance the policy's dynamic obstacle avoidance at inference time using DCP-RMP, a locally reactive goal-proposal module. We evaluate DRP on challenging tasks featuring cluttered scenes, dynamic moving obstacles, and goal obstructions. DRP achieves strong generalization, outperforming prior classical and neural methods in success rate across both simulated and real-world settings.



All videos play at 1x speed


Results Highlights

Our policy operates on point cloud observations to reach desired goal poses, with goals visualized as RGB frame axes in the videos below.

Cabinet Rearrangement

Collaborative Cooking

Fridge Rearrangement

Drawer Rearrangement

Kitchen Cleanup

Safe Human-Robot Interaction

Garbage Cleanup

Kitchen Sink


Method Overview


Deep Reactive Policy (DRP) is a visuo-motor neural motion policy designed for dynamic, real-world environments. First, the locally reactive DCP-RMP module adjusts joint goals to handle fast-moving dynamic obstacles in the local scene. Then, IMPACT, a transformer-based closed-loop motion planning policy, takes as input the scene point cloud, the modified joint goal, and the current robot joint position to output action sequences for real-time execution on the robot.
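Below is a minimal sketch of one DRP control step under these stated assumptions; the dcp_rmp and impact_policy objects and their method names are illustrative stand-ins, not the released API.

# Minimal sketch of one DRP control step (illustrative interfaces only).
import numpy as np

def drp_step(point_cloud: np.ndarray,     # (N, 3) scene point cloud
             goal_joints: np.ndarray,     # (7,) commanded joint goal
             current_joints: np.ndarray,  # (7,) current joint positions
             dcp_rmp, impact_policy) -> np.ndarray:
    # 1. DCP-RMP reactively adjusts the joint goal to steer the arm away
    #    from fast-moving obstacles detected in the local point cloud.
    modified_goal = dcp_rmp.adjust_goal(point_cloud, goal_joints, current_joints)

    # 2. IMPACT consumes the scene point cloud, the modified joint goal, and
    #    the current joint positions, and predicts a short action sequence.
    actions = impact_policy.predict(point_cloud, modified_goal, current_joints)

    # Execute the first action; the loop repeats at the next control step,
    # keeping the policy closed-loop and reactive.
    return actions[0]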


Simulation Evaluations

We evaluate DRP on over 4,000 environments spanning five task categories, featuring complex static scenes and dynamic obstacles.

Static Environments

Suddenly Appearing Obstacle

Goal Blocking

Dynamic Goal Blocking

Floating Dynamic Obstacle

Real World Evaluations

In addition to simulation, we evaluate DRP in real-world environments across the same five categories, comparing it to NeuralMP, a state-of-the-art learning-based motion policy, and cuRobo, a state-of-the-art optimization-based motion planner.

DRP on Static Environments

These scenarios feature challenging fixed obstacles, evaluating policy performance in predictable, unchanging settings.

Microwave

Tall Drawer

Front Cabinet

Side Cabinet

Slanted Shelf

Kitchen Shelf

Success Rate: DRP 90% | NeuralMP 30% | cuRobo-Voxels 60%

DRP on Suddenly Appearing Obstacle

Obstacles appear suddenly ahead of the robot, directly blocking its path and requiring dynamic trajectory adaptation. This tests the policy's ability to react to unexpected changes in the environment.

Cluttered — Large Blocker

Cluttered — Small Blocker

Tabletop — Large Blocker

Tabletop — Medium Blocker

Tabletop — Small Blocker

Success Rate: DRP 100% | NeuralMP 6.67% | cuRobo-Voxels 3.33%

DRP on Goal Blocking

The goal is temporarily obstructed by an obstacle, and the robot must approach as closely as possible without colliding.

Cluttered — Large Blocker

Cluttered — Small Blocker

Tabletop — Large Blocker

Tabletop — Medium Blocker

Tabletop — Small Blocker

Success Rate: DRP 92.86% | NeuralMP 0% | cuRobo-Voxels 0%

DRP on Dynamic Goal Blocking

An obstacle dynamically moves to block the goal, challenging the robot's reactivity; the robot must adapt in real time and approach as closely as possible without colliding.

Cluttered — Side Blocker

Cluttered — Front Blocker

Tabletop — Large Blocker

Tabletop — Medium Blocker

Tabletop — Small Blocker

Success Rate: DRP 93.33% | NeuralMP 0% | cuRobo-Voxels 0%

DRP on Floating Dynamic Obstacle

Obstacles move randomly throughout the environment, challenging the robot's reactivity and its ability to avoid collisions in real time. This task demonstrates DRP's ability to navigate dynamic environments, a capability absent in all prior baselines. Note: during all dynamic evaluations, the testers are blindfolded so they cannot see the scene, ensuring an unbiased performance assessment.

DRP

NeuralMP

cuRobo-Voxels

Success Rate: DRP 70% | NeuralMP 0% | cuRobo-Voxels 0%


DRP Applications

Language Conditioned Pick-and-Place

We use GroundedDINO+SAM to extract the object's point cloud based on the user-provided prompt. A grasp generation module then proposes a grasp pose. Finally, DRP navigates to the grasp pose while safely avoiding collisions, even in the presence of dynamic obstacles.
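A rough sketch of this pipeline is given below; the segmenter, grasp_generator, and drp arguments and their methods are hypothetical stand-ins for GroundedDINO+SAM, the grasp generation module, and the DRP policy, not the actual implementation.

# Hedged sketch of the language-conditioned pick-and-place pipeline.
def language_conditioned_pick(prompt, rgbd_frame, segmenter, grasp_generator, drp):
    # 1. Ground the text prompt to the target object and lift its mask to a
    #    3D point cloud (GroundedDINO + SAM in the real system).
    object_points = segmenter.segment(rgbd_frame, prompt)

    # 2. Propose a grasp pose on the segmented object point cloud.
    grasp_pose = grasp_generator.propose(object_points)

    # 3. DRP moves the arm to the grasp pose in closed loop, avoiding
    #    collisions even with dynamic obstacles.
    drp.move_to(grasp_pose)
    return grasp_pose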

Collision-Free Teleoperation

The user teleoperates the robot using a space mouse, with goal configurations visualized in green. DRP tracks these goals while ensuring collision-free motion, even when the goal is obstructed by obstacles. This allows the user to control the robot without concern for potential collisions.
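The control loop can be sketched roughly as follows, assuming illustrative space_mouse, robot, and drp interfaces (names and signatures are assumptions for this sketch only).

# Minimal sketch of the collision-free teleoperation loop.
import time

def teleop_loop(space_mouse, robot, drp, rate_hz=30.0):
    goal_joints = robot.get_joint_positions()
    while True:
        # The user freely shifts the goal configuration (visualized in green),
        # possibly into or behind obstacles.
        goal_joints = goal_joints + space_mouse.read_joint_delta()

        # DRP tracks the goal from point cloud observations; when the goal is
        # obstructed, it approaches as closely as possible without colliding.
        action = drp.act(robot.get_point_cloud(), goal_joints,
                         robot.get_joint_positions())
        robot.apply_action(action)
        time.sleep(1.0 / rate_hz)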


DRP Failure Cases

The obstacle geometry is significantly outside DRP's training distribution, resulting in a minor collision.

Small goal-blocking obstacles are challenging to avoid. Nevertheless, DRP attempts to slow down the robot in response.

When dynamic obstacles are large and fast-moving, DRP's collision-avoidance performance can degrade.

BibTeX


@inproceedings{yang2025deep,
  title={Deep Reactive Policy: Learning Reactive Manipulator Motion Planning for Dynamic Environments},
  author={Jiahui Yang and Jason Jingzhou Liu and Yulong Li and Youssef Khaky and Deepak Pathak},
  booktitle={9th Annual Conference on Robot Learning},
  year={2025},
}

Acknowledgements

We thank Murtaza Dalal, Ritvik Singh, Arthur Allshire, Tal Daniel, Zheyuan Hu, Mohan Kumar Srirama, and Ruslan Salakhutdinov for their valuable discussions on this work. We are grateful to Karl Van Wyk and Nathan Ratliff for contributing ideas and implementations of Geometric Fabrics used in this project. We also thank Murtaza Dalal for his feedback on the early ideations of this paper. In addition, we thank Andrew Wang, Tony Tao, Hengkai Pan, Tiffany Tse, Sheqi Zhang, and Sungjae Park for their assistance with experiments. This work is supported in part by ONR MURI N00014-22-1-2773, ONR MURI N00014-24-1-2748, and AFOSR FA9550-23-1-0747.