Abstract

Generating collision-free motion in dynamic, partially observable environments is a fundamental challenge for robotic manipulators. Classical motion planners can compute globally optimal trajectories but require full environment knowledge and are typically too slow for dynamic scenes. Neural motion policies offer a promising alternative by operating in closed-loop directly on raw sensory inputs but often struggle to generalize in complex or dynamic settings. We propose Deep Reactive Policy (DRP), a visuo-motor neural motion policy designed for reactive motion generation in diverse dynamic environments, operating directly on point cloud sensory input. At its core is IMPACT, a transformer-based neural motion policy pretrained on 10 million generated expert trajectories across diverse simulation scenarios. We further improve IMPACT's static obstacle avoidance through iterative student-teacher finetuning. We additionally enhance the policy's dynamic obstacle avoidance at inference time using DCP-RMP, a locally reactive goal-proposal module. We evaluate DRP on challenging tasks featuring cluttered scenes, dynamic moving obstacles, and goal obstructions. DRP achieves strong generalization, outperforming prior classical and neural methods in success rate across both simulated and real-world settings.



All videos play at 1x speed


Results Highlights

Our policy operates on point cloud observations to reach desired goal poses, with goals visualized as RGB frame axes in the videos below.

Cabinet Rearrangement

Collaborative Cooking

Fridge Rearrangement

Drawer Rearrangement

Kitchen Cleanup

Safe Human-Robot Interaction

Garbage Cleanup

Kitchen Sink


Method Overview


Deep Reactive Policy (DRP) is a visuo-motor neural motion policy designed for dynamic, real-world environments. First, the locally reactive DCP-RMP module adjusts joint goals to handle fast-moving dynamic obstacles in the local scene. Then, IMPACT, a transformer-based closed-loop motion planning policy, takes as input the scene point cloud, the modified joint goal, and the current robot joint position to output action sequences for real-time execution on the robot.
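Below is a minimal sketch of one DRP control step under these stated assumptions; the dcp_rmp and impact_policy objects and their method names are illustrative stand-ins, not the released API.

# Minimal sketch of one DRP control step (illustrative interfaces only).
import numpy as np

def drp_step(point_cloud: np.ndarray,     # (N, 3) scene point cloud
             goal_joints: np.ndarray,     # (7,) commanded joint goal
             current_joints: np.ndarray,  # (7,) current joint positions
             dcp_rmp, impact_policy) -> np.ndarray:
    # 1. DCP-RMP reactively adjusts the joint goal to steer the arm away
    #    from fast-moving obstacles detected in the local point cloud.
    modified_goal = dcp_rmp.adjust_goal(point_cloud, goal_joints, current_joints)

    # 2. IMPACT consumes the scene point cloud, the modified joint goal, and
    #    the current joint positions, and predicts a short action sequence.
    actions = impact_policy.predict(point_cloud, modified_goal, current_joints)

    # Execute the first action; the loop repeats at the next control step,
    # keeping the policy closed-loop and reactive.
    return actions[0]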


Simulation Evaluations

We evaluate DRP on over 4,000 environments spanning five task categories, featuring complex static scenes and dynamic obstacles.

Static Environments

Suddenly Appearing Obstacle

Goal Blocking

Dynamic Goal Blocking

Floating Dynamic Obstacle

Real World Evaluations

In addition to simulation, we evaluate DRP in real-world environments across the same five categories, comparing it to NeuralMP, a state-of-the-art learning-based motion policy, and cuRobo, a state-of-the-art optimization-based motion planner.

DRP on Static Environments

These scenarios feature challenging fixed obstacles, evaluating policy performance in predictable, unchanging settings.

Microwave

Tall Drawer

Front Cabinet

Side Cabinet

Slanted Shelf

Kitchen Shelf

Success Rate: DRP 90% | NeuralMP 30% | cuRobo-Voxels 60%

DRP on Suddenly Appearing Obstacle

Obstacles appear suddenly ahead of the robot, directly blocking its path and requiring dynamic trajectory adaptation. This tests the policy's ability to react to unexpected changes in the environment.

Cluttered — Large Blocker

Cluttered — Small Blocker

Tabletop — Large Blocker

Tabletop — Medium Blocker

Tabletop — Small Blocker

Success Rate: DRP 100% | NeuralMP 6.67% | cuRobo-Voxels 3.33%

DRP on Goal Blocking

The goal is temporarily obstructed by an obstacle, and the robot must approach as closely as possible without colliding.

Cluttered — Large Blocker

Cluttered — Small Blocker

Tabletop — Large Blocker

Tabletop — Medium Blocker

Tabletop — Small Blocker

Success Rate: DRP 92.86% | NeuralMP 0% | cuRobo-Voxels 0%

DRP on Dynamic Goal Blocking

An obstacle dynamically moves to block the goal, challenging the robot's reactivity; the robot must adapt in real time and approach as closely as possible without colliding.

Cluttered — Side Blocker

Cluttered — Front Blocker

Tabletop — Large Blocker

Tabletop — Medium Blocker

Tabletop — Small Blocker

Success Rate: DRP 93.33% | NeuralMP 0% | cuRobo-Voxels 0%

DRP on Floating Dynamic Obstacle

Obstacles move randomly throughout the environment, challenging the robot's reactivity and its ability to avoid collisions in real time. This task demonstrates DRP's ability to navigate dynamic environments, a capability absent in all prior baselines. Note: during all dynamic evaluations, the testers are blindfolded so they cannot see the scene, ensuring an unbiased performance assessment.

DRP

NeuralMP

cuRobo-Voxels

Success Rate: DRP 70% | NeuralMP 0% | cuRobo-Voxels 0%


DRP Applications

Language Conditioned Pick-and-Place

We use GroundedDINO+SAM to extract the object's point cloud based on the user-provided prompt. A grasp generation module then proposes a grasp pose. Finally, DRP navigates to the grasp pose while safely avoiding collisions, even in the presence of dynamic obstacles.
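A rough sketch of this pipeline is given below; the segmenter, grasp_generator, and drp arguments and their methods are hypothetical stand-ins for GroundedDINO+SAM, the grasp generation module, and the DRP policy, not the actual implementation.

# Hedged sketch of the language-conditioned pick-and-place pipeline.
def language_conditioned_pick(prompt, rgbd_frame, segmenter, grasp_generator, drp):
    # 1. Ground the text prompt to the target object and lift its mask to a
    #    3D point cloud (GroundedDINO + SAM in the real system).
    object_points = segmenter.segment(rgbd_frame, prompt)

    # 2. Propose a grasp pose on the segmented object point cloud.
    grasp_pose = grasp_generator.propose(object_points)

    # 3. DRP moves the arm to the grasp pose in closed loop, avoiding
    #    collisions even with dynamic obstacles.
    drp.move_to(grasp_pose)
    return grasp_pose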

Collision-Free Teleoperation

The user teleoperates the robot using a space mouse, with goal configurations visualized in green. DRP tracks these goals while ensuring collision-free motion, even when the goal is obstructed by obstacles. This allows the user to control the robot without concern for potential collisions.
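The control loop can be sketched roughly as follows, assuming illustrative space_mouse, robot, and drp interfaces (names and signatures are assumptions for this sketch only).

# Minimal sketch of the collision-free teleoperation loop.
import time

def teleop_loop(space_mouse, robot, drp, rate_hz=30.0):
    goal_joints = robot.get_joint_positions()
    while True:
        # The user freely shifts the goal configuration (visualized in green),
        # possibly into or behind obstacles.
        goal_joints = goal_joints + space_mouse.read_joint_delta()

        # DRP tracks the goal from point cloud observations; when the goal is
        # obstructed, it approaches as closely as possible without colliding.
        action = drp.act(robot.get_point_cloud(), goal_joints,
                         robot.get_joint_positions())
        robot.apply_action(action)
        time.sleep(1.0 / rate_hz)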


DRP Failure Cases

The obstacle geometry is significantly outside DRP's training distribution, resulting in a minor collision.

Small goal-blocking obstacles are challenging to avoid. Nevertheless, DRP attempts to slow down the robot in response.

When dynamic obstacles are large and fast-moving, DRP's collision-avoidance performance can degrade.

BibTeX


@inproceedings{yang2025deep,
  title={Deep Reactive Policy: Learning Reactive Manipulator Motion Planning for Dynamic Environments},
  author={Jiahui Yang and Jason Jingzhou Liu and Yulong Li and Youssef Khaky and Deepak Pathak},
  booktitle={9th Annual Conference on Robot Learning},
  year={2025},
}

Acknowledgements

We thank Murtaza Dalal, Ritvik Singh, Arthur Allshire, Tal Daniel, Zheyuan Hu, Mohan Kumar Srirama, and Ruslan Salakhutdinov for their valuable discussions on this work. We are grateful to Karl Van Wyk and Nathan Ratliff for contributing ideas and implementations of Geometric Fabrics used in this project. We also thank Murtaza Dalal for his feedback on the early ideations of this paper. In addition, we thank Andrew Wang, Tony Tao, Hengkai Pan, Tiffany Tse, Sheqi Zhang, and Sungjae Park for their assistance with experiments. This work is supported in part by ONR MURI N00014-22-1-2773, ONR MURI N00014-24-1-2748, and AFOSR FA9550-23-1-0747.