Generating collision-free motion in dynamic, partially observable environments is a fundamental challenge for robotic manipulators. Classical motion planners can compute globally optimal trajectories but require full environment knowledge and are typically too slow for dynamic scenes. Neural motion policies offer a promising alternative by operating in closed-loop directly on raw sensory inputs but often struggle to generalize in complex or dynamic settings. We propose Deep Reactive Policy (DRP), a visuo-motor neural motion policy designed for reactive motion generation in diverse dynamic environments, operating directly on point cloud sensory input. At its core is IMPACT, a transformer-based neural motion policy pretrained on 10 million generated expert trajectories across diverse simulation scenarios. We further improve IMPACT's static obstacle avoidance through iterative student-teacher finetuning. We additionally enhance the policy's dynamic obstacle avoidance at inference time using DCP-RMP, a locally reactive goal-proposal module. We evaluate DRP on challenging tasks featuring cluttered scenes, dynamic moving obstacles, and goal obstructions. DRP achieves strong generalization, outperforming prior classical and neural methods in success rate across both simulated and real-world settings.
Demonstration tasks: Cabinet Rearrangement, Collaborative Cooking, Fridge Rearrangement, Drawer Rearrangement, Kitchen Cleanup, Safe Human-Robot Interaction, Garbage Cleanup, Kitchen Sink.
Deep Reactive Policy (DRP) is a visuo-motor neural motion policy designed for dynamic, real-world environments. First, the locally reactive DCP-RMP module adjusts joint goals to handle fast-moving dynamic obstacles in the local scene. Then, IMPACT, a transformer-based closed-loop motion planning policy, takes as input the scene point cloud, the modified joint goal, and the current robot joint position to output action sequences for real-time execution on the robot.
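For concreteness, here is a minimal sketch of this control loop in Python. It is illustrative only: DCPRMP, IMPACTPolicy, and drp_step are assumed names rather than the released API, and both modules are reduced to stubs so the example runs end to end.

```python
# Minimal sketch of the DRP control loop described above (illustrative only).
# DCPRMP, IMPACTPolicy, and drp_step are assumed names, not the released API.
import numpy as np

class DCPRMP:
    """Locally reactive goal proposal (placeholder for the DCP-RMP module)."""
    def propose_goal(self, joint_goal, scene_points, joint_pos):
        # The real module deflects the goal away from nearby fast-moving
        # obstacles; this stub simply passes the goal through unchanged.
        return joint_goal

class IMPACTPolicy:
    """Closed-loop motion policy (placeholder for the IMPACT transformer)."""
    def act(self, scene_points, joint_goal, joint_pos, horizon=8):
        # The real policy attends over the point cloud, goal, and current
        # configuration; this stub simply interpolates toward the goal.
        alphas = np.linspace(1.0 / horizon, 1.0, horizon)[:, None]
        return joint_pos + alphas * (joint_goal - joint_pos)   # (horizon, 7)

def drp_step(rmp, policy, scene_points, joint_goal, joint_pos):
    """One closed-loop step: adjust the goal locally, then query the policy."""
    local_goal = rmp.propose_goal(joint_goal, scene_points, joint_pos)
    return policy.act(scene_points, local_goal, joint_pos)

# Example usage with random stand-in data (7-DoF arm, 2048 scene points).
scene = np.random.rand(2048, 3)
q_now, q_goal = np.zeros(7), np.full(7, 0.5)
actions = drp_step(DCPRMP(), IMPACTPolicy(), scene, q_goal, q_now)
print(actions.shape)  # (8, 7): a short action sequence executed on the robot
```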
Evaluation settings and per-method success rates:

Static Environments (Microwave, Tall Drawer, Front Cabinet, Side Cabinet, Slanted Shelf, Kitchen Shelf)
Success Rate: DRP 90% | NeuralMP 30% | cuRobo-Voxels 60%

Suddenly Appearing Obstacle (Cluttered: Large Blocker, Small Blocker; Tabletop: Large, Medium, Small Blocker)
Success Rate: DRP 100% | NeuralMP 6.67% | cuRobo-Voxels 3.33%

Goal Blocking (Cluttered: Large Blocker, Small Blocker; Tabletop: Large, Medium, Small Blocker)
Success Rate: DRP 92.86% | NeuralMP 0% | cuRobo-Voxels 0%

Dynamic Goal Blocking (Cluttered: Side Blocker, Front Blocker; Tabletop: Large, Medium, Small Blocker)
Success Rate: DRP 93.33% | NeuralMP 0% | cuRobo-Voxels 0%

Floating Dynamic Obstacle (DRP, NeuralMP, and cuRobo-Voxels comparison)
Success Rate: DRP 70% | NeuralMP 0% | cuRobo-Voxels 0%
Language-Conditioned Pick-and-Place
We use GroundedDINO+SAM to extract the object's point cloud based on the user-provided prompt. A grasp generation module then proposes a grasp pose. Finally, DRP navigates to the grasp pose while safely avoiding collisions, even in the presence of dynamic obstacles.
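A rough sketch of this pipeline is below. The perception and grasping stages are reduced to stand-ins: segment_object, propose_grasp, and reach_with_drp are hypothetical placeholders for GroundedDINO+SAM, the grasp generator, and the DRP call, not the actual interfaces.

```python
# Hypothetical sketch of the language-conditioned pick-and-place pipeline.
# segment_object, propose_grasp, and reach_with_drp are stand-in names only.
import numpy as np

def segment_object(prompt, points):
    """Stand-in for GroundedDINO+SAM: return a boolean mask over the scene
    point cloud for the object named in the prompt (dummy mask here)."""
    return np.linalg.norm(points - points.mean(axis=0), axis=1) < 0.3

def propose_grasp(object_points):
    """Stand-in grasp generator: a pose 5 cm above the object centroid."""
    grasp = np.eye(4)
    grasp[:3, 3] = object_points.mean(axis=0) + np.array([0.0, 0.0, 0.05])
    return grasp  # 4x4 end-effector pose in the robot base frame

def pick(prompt, scene_points, reach_with_drp):
    """Segment the prompted object, propose a grasp, and reach it with DRP."""
    object_points = scene_points[segment_object(prompt, scene_points)]
    grasp_pose = propose_grasp(object_points)
    # DRP moves to the grasp pose while avoiding static and dynamic obstacles;
    # converting the pose to a joint-space goal (e.g., via IK) happens inside.
    reach_with_drp(grasp_pose, scene_points)

# Example usage with random stand-in data and a no-op DRP call.
pick("red mug", np.random.rand(4096, 3), lambda pose, pts: None)
```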
Collision-Free Teleoperation
The user teleoperates the robot using a space mouse, with goal configurations visualized in green. DRP tracks these goals while ensuring collision-free motion, even when the goal is obstructed by obstacles. This allows the user to control the robot without concern for potential collisions.
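One possible shape of this teleoperation loop is sketched below. All names here are illustrative assumptions, not the released interface: read_spacemouse_twist stands in for the space-mouse driver, twist_to_joint_delta for the Cartesian-to-joint mapping, and drp_track for the DRP goal-tracking call.

```python
# Illustrative teleoperation loop: integrate space-mouse input into a goal
# configuration and let DRP track it collision-free. All names are stand-ins.
import numpy as np

def read_spacemouse_twist():
    """Stand-in for the space-mouse driver: a 6-DoF velocity command
    (here a fixed small twist for illustration)."""
    return np.array([0.01, 0.0, 0.0, 0.0, 0.0, 0.0])

def twist_to_joint_delta(jacobian_pinv, twist):
    """Map a Cartesian twist to a joint-velocity command via a Jacobian
    pseudoinverse (the real mapping depends on the robot model)."""
    return jacobian_pinv @ twist

def teleop_step(drp_track, joint_goal, jacobian_pinv, scene_points, dt=0.02):
    """One teleoperation step: update the visualized goal configuration from
    user input, then let DRP track it while avoiding collisions."""
    twist = read_spacemouse_twist()
    joint_goal = joint_goal + dt * twist_to_joint_delta(jacobian_pinv, twist)
    drp_track(joint_goal, scene_points)   # collision-free tracking of the goal
    return joint_goal

# Example usage with stand-in data (7-DoF arm, random Jacobian pseudoinverse).
goal = teleop_step(lambda q, pts: None, np.zeros(7),
                   np.random.rand(7, 6), np.random.rand(2048, 3))
```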
The obstacle geometry is significantly outside of DRP's training distribution, resulting in a minor collision.
Small goal-blocking obstacles are challenging to avoid entirely; nevertheless, DRP slows the robot down in response.
When dynamic obstacles are large and fast-moving, DRP's collision-avoidance performance can degrade.
@inproceedings{yang2025deep,
  title={Deep Reactive Policy: Learning Reactive Manipulator Motion Planning for Dynamic Environments},
  author={Jiahui Yang and Jason Jingzhou Liu and Yulong Li and Youssef Khaky and Deepak Pathak},
  booktitle={9th Annual Conference on Robot Learning},
  year={2025},
}
Website template borrowed from NeRFies and UMI on Legs under a Creative Commons Attribution-ShareAlike 4.0 International License.