Contact-Aware Diffusion Sampling for RRT-Based Manipulation
Abstract
1. Introduction
- Toggle–subgoal prediction for contact-aware planning. We introduce a ResNet-based predictor that maps a single RGB image and the current robot state (joints and gripper mode) to a toggle configuration, its toggle mode, and an intermediate joint-space subgoal. This exposes grasp/release timing and provides task-aligned guidance for where the RRT tree should expand within each short planning phase on the way to the next contact event.
- Diffusion over joint-space extension segments. We learn a conditional Denoising Diffusion Probabilistic Model (DDPM) [6] that samples joint-space directions and step lengths for RRT tree extensions instead of absolute configurations. The sampler is conditioned on the scene image, current configuration, and predicted toggle/subgoal, and plugs into RRT/RRT-Connect without modifying nearest-neighbor search, steering, or collision checking.
- Mixture-based completeness with receding-horizon targets. We mix diffusion-based segment proposals with a nonzero fraction of uniform segments, preserving probabilistic completeness while biasing the search toward contact- and task-relevant regions. Using the predicted subgoal as the current target in each phase yields a receding-horizon planner that aligns local tree growth with task progress.
- Simulation evidence on contact-rich mug pick-and-place. For a contact-rich mug pick-and-place benchmark with multiple grasp/release events, our planner achieves higher success rates than imitation-only baselines and requires fewer RRT expansions than uniform/goal-biased RRT and prior learned samplers under identical collision checking and iteration limits.
2. Related Work
3. Preliminaries
3.1. System Model and Notation
- $\mathcal{X}$: generic state space (for SBMP discussion).
- $\mathcal{X}_{\mathrm{obs}}$, $\mathcal{X}_{\mathrm{free}} = \mathcal{X} \setminus \mathcal{X}_{\mathrm{obs}}$ (obstacle and collision-free subsets).
- $\mathcal{Q} \subset \mathbb{R}^{n}$: joint-limit-feasible set (per-joint bounds enforced).
- $\mathrm{FK} : \mathcal{Q} \to SE(3)$: forward kinematics to the end-effector pose (used only if task-space goals are considered).
- $q_{\mathrm{start}} \in \mathcal{Q}$, $\mathcal{Q}_{\mathrm{goal}} \subseteq \mathcal{Q}$ (start configuration and terminal goal region in joint space).
- State and observations: the current joint configuration is $q \in \mathcal{Q}$ and the current gripper mode is $g \in \{\mathrm{open}, \mathrm{closed}\}$. The observation consists of an RGB image $I$ and $(q, g)$.
- Toggle and subgoal: the upper network predicts a toggle configuration $q_{\mathrm{tog}} \in \mathcal{Q}$, a toggle mode $g_{\mathrm{tog}} \in \{\mathrm{open}, \mathrm{closed}\}$, and a joint-space subgoal $q_{\mathrm{sub}} \in \mathcal{Q}$ used as a phase-wise receding-horizon target between consecutive toggle events.
- Distance and tolerance: unless stated otherwise, the tree metric is a (possibly weighted) Euclidean distance $d_W(q, q') = \sqrt{(q - q')^{\top} W (q - q')}$, where $W$ is a symmetric positive-definite (typically diagonal) weight matrix over joints.
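For concreteness, a minimal NumPy sketch of the weighted metric $d_W$ and the corresponding Nearest primitive (the helper names and weight values below are illustrative, not taken from our implementation):

```python
import numpy as np

def weighted_distance(q1, q2, W):
    """Weighted Euclidean distance d_W(q1, q2) = sqrt((q1 - q2)^T W (q1 - q2))."""
    d = np.asarray(q1, dtype=float) - np.asarray(q2, dtype=float)
    return float(np.sqrt(d @ W @ d))

def nearest(tree_nodes, q, W):
    """Index of the tree node closest to q under d_W (the RRT Nearest primitive)."""
    return min(range(len(tree_nodes)),
               key=lambda i: weighted_distance(tree_nodes[i], q, W))

# A diagonal W down-weights distal (wrist) joints relative to proximal ones.
W = np.diag([2.0, 2.0, 1.0, 1.0, 0.5, 0.5])
q_a = np.zeros(6)
q_b = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
print(weighted_distance(q_a, q_b, W))  # sqrt(2) ≈ 1.4142
```

A diagonal weight matrix like this makes motions of the base joints count more toward the metric than equal motions of the wrist, which is the usual reason for weighting the tree metric in manipulator planning.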
3.2. Geometric Motion-Planning Problem
- Collision-free: a path $\tau : [0, 1] \to \mathcal{Q}$ is collision-free if $\tau(s) \in \mathcal{X}_{\mathrm{free}}$ for all $s \in [0, 1]$.
- Feasible path: a collision-free path from $q_{\mathrm{start}}$ to $\mathcal{Q}_{\mathrm{goal}}$.
3.3. RRT Primitives
- $\mathrm{Nearest}(T, q)$: returns $\arg\min_{v \in T} d_W(v, q)$.
- $\mathrm{Steer}(q, q_{\mathrm{target}}, \eta)$: local extension from $q$ toward $q_{\mathrm{target}}$ with step size (or cap) $\eta$.
- $\mathrm{CollisionFree}(q, q')$: true if the straight-line (or local steering) motion from $q$ to $q'$ lies in $\mathcal{X}_{\mathrm{free}}$ (checked continuously).
- Tree-extension segment: an ordered pair $(u, \eta)$ with unit direction $u$ ($\|u\|_2 = 1$) and step length $\eta > 0$, producing the proposal $q_{\mathrm{new}} = \Pi_{\mathcal{Q}}(q_{\mathrm{near}} + \eta u)$, where $\Pi_{\mathcal{Q}}$ enforces joint limits prior to collision checking along the induced local motion.
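The conversion of a segment $(u, \eta)$ into a clamped proposal can be sketched as follows (a minimal example; the function name and joint bounds are illustrative):

```python
import numpy as np

def propose(q_near, u, eta, q_min, q_max):
    """Convert a segment (u, eta) into the proposal q_new = Pi_Q(q_near + eta * u),
    where Pi_Q clamps each joint to its limits before collision checking."""
    u = np.asarray(u, dtype=float)
    u = u / np.linalg.norm(u)            # enforce the unit-direction convention
    q_new = np.asarray(q_near, dtype=float) + eta * u
    return np.clip(q_new, q_min, q_max)  # per-joint bounds (joint-limit projection)

q_min, q_max = -np.pi * np.ones(2), np.pi * np.ones(2)
q = propose(np.array([3.0, 0.0]), np.array([1.0, 0.0]), 0.5, q_min, q_max)
print(q)  # first joint clamped to the upper limit pi
```

Note that clamping happens before collision checking, so a segment that would leave the joint-limit box still yields a valid (possibly shortened) proposal along the boundary.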
3.4. Sensing and Task Assumptions
- Observation: a scene image $I$ and the current robot state $(q, g)$.
- Gripper mode: $g \in \{\mathrm{open}, \mathrm{closed}\}$.
- Toggle state: a joint configuration $q_{\mathrm{tog}}$ at which the gripper should switch between modes for the task at hand (used as a contact-aware cue for sampling).
4. Proposed Method
4.1. Overview
4.2. Upper Network: Toggle–Subgoal Predictor
4.3. Lower Planner: Diffusion-Guided Segment Sampling
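The reverse process of the segment sampler follows standard DDPM ancestral sampling [6], applied to a raw vector that is post-processed into a unit direction and a positive step length. The sketch below uses a zero-valued stand-in for the learned denoiser so that it is self-contained; the schedule length, the `eps_model`/`sample_segment` names, and the step-length squashing range are illustrative assumptions, not the paper's trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear beta schedule for a tiny DDPM (Ho et al., 2020); values are illustrative.
T = 50
betas = np.linspace(1e-4, 0.05, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def eps_model(x, t, cond):
    """Stand-in denoiser. In the paper this is a learned network conditioned on
    image features, the current configuration, and the predicted toggle/subgoal;
    a zero predictor keeps this sketch runnable without trained weights."""
    return np.zeros_like(x)

def sample_segment(cond, dim):
    """Ancestral DDPM sampling over a raw vector x in R^{dim+1}: the first dim
    entries become a unit direction u, the last entry a positive step length eta."""
    x = rng.standard_normal(dim + 1)
    for t in reversed(range(T)):
        eps = eps_model(x, t, cond)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(dim + 1) if t > 0 else np.zeros(dim + 1)
        x = mean + np.sqrt(betas[t]) * noise
    u = x[:dim] / np.linalg.norm(x[:dim])   # unit direction
    eta = 0.05 / (1.0 + np.exp(-x[dim]))    # sigmoid squash: eta in (0, 0.05)
    return u, eta

u, eta = sample_segment(cond=None, dim=6)
print(np.linalg.norm(u), eta)  # unit norm and a small positive step length
```

Because the sampler outputs a direction/step pair rather than an absolute configuration, it can be dropped into any RRT variant whose extension step accepts a segment proposal.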
```
Algorithm 1 Toggle–subgoal conditioned diffusion-guided RRT (phase-wise planning and execution)

Require: access to the current RGB image I (camera/rendering); initial robot state (q_start, g);
         terminal region Q_goal; RRT iteration limit N_max; uniform-proposal probability ε;
         goal tolerance δ; upper network f_θ; diffusion sampler p_φ

 1: q_cur ← q_start
 2: while q_cur ∉ Q_goal do
 3:     obtain the current RGB image I
 4:     (q_tog, g_tog, q_sub) ← f_θ(I, q_cur, g)      ▹ predict next toggle and subgoal for the current phase
 5:     initialize tree T with root q_cur
 6:     q_reach ← q_cur                               ▹ best node reached so far toward q_sub
 7:     for i = 1 to N_max do
 8:         sample ξ ~ Uniform[0, 1] and select a node q_near ∈ T
 9:         if ξ > ε then
10:             c ← (I, q_cur, g, q_tog, g_tog, q_sub)    ▹ q_cur is the phase start state
11:             (u, η) ← p_φ(· | c)                       ▹ learned segment proposal
12:         else
13:             (u, η) ← UniformSegment()
14:         end if
15:         q_new ← Π_Q(q_near + η u)                 ▹ joint-limit projection, Equation (4)
16:         if CollisionFree(q_near, q_new) then
17:             add vertex q_new and edge (q_near, q_new) to T
18:             if d_W(q_new, q_sub) < d_W(q_reach, q_sub) then
19:                 q_reach ← q_new
20:             end if
21:         end if
22:         if d_W(q_reach, q_sub) ≤ δ then           ▹ phase goal reached
23:             break
24:         end if
25:     end for
26:     path_phase ← ExtractPath(T, q_reach)
27:     path_phase ← Smooth(path_phase)
28:     execute path_phase on the robot/simulator, toggling the gripper to g_tog when passing near q_tog
29:     update (q_cur, g) from the final state of the executed trajectory
30: end while
31: return the concatenation of all executed phase trajectories
```
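The inner planning loop of Algorithm 1 can be sketched compactly in Python. This is a minimal 2-D illustration under stated assumptions: the node-selection rule (extend the node nearest to the phase target, a Connect-style heuristic), the greedy stand-in for the learned sampler, and all numeric constants are ours, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

def plan_phase(q_root, q_sub, sample_learned, n_max=2000, eps=0.2,
               delta=0.05, step=0.1, collision_free=lambda a, b: True):
    """One phase: grow a segment-based RRT from q_root toward the predicted
    subgoal q_sub, mixing learned proposals with a fraction eps of uniform
    segments (the mixture that preserves probabilistic completeness)."""
    dim = len(q_root)
    q_sub = np.asarray(q_sub, dtype=float)
    nodes, parents = [np.asarray(q_root, dtype=float)], [-1]
    best = 0                                   # index of best node toward q_sub
    for _ in range(n_max):
        # Extend the node nearest to the phase target (assumption of this sketch).
        i = min(range(len(nodes)), key=lambda j: np.linalg.norm(nodes[j] - q_sub))
        if rng.random() > eps:
            u, eta = sample_learned(nodes[i])  # learned (diffusion) segment
        else:
            u = rng.standard_normal(dim)       # uniform segment
            u /= np.linalg.norm(u)
            eta = step * rng.random()
        q_new = nodes[i] + eta * u
        if not collision_free(nodes[i], q_new):
            continue
        nodes.append(q_new)
        parents.append(i)
        if np.linalg.norm(q_new - q_sub) < np.linalg.norm(nodes[best] - q_sub):
            best = len(nodes) - 1
        if np.linalg.norm(nodes[best] - q_sub) <= delta:
            break                              # phase goal reached
    path, j = [], best                         # ExtractPath via parent pointers
    while j != -1:
        path.append(nodes[j])
        j = parents[j]
    return path[::-1]

# Stand-in learned sampler: a segment pointing straight at the subgoal.
q_goal = np.array([1.0, 1.0])
greedy = lambda q: ((q_goal - q) / (np.linalg.norm(q_goal - q) + 1e-9), 0.1)
path = plan_phase(np.zeros(2), q_goal, greedy)
print(len(path), np.linalg.norm(path[-1] - q_goal))
```

Because the uniform branch fires with probability `eps` regardless of what the learned sampler proposes, the tree retains a nonzero chance of expanding anywhere, which is the mechanism behind the completeness argument in Section 1.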
4.4. Inference Procedure
- Initialize the current state as $q_{\mathrm{cur}} \leftarrow q_{\mathrm{start}}$ (with gripper mode $g$).
- While $q_{\mathrm{cur}}$ has not reached the terminal region $\mathcal{Q}_{\mathrm{goal}}$:
  - (a) Upper-level prediction (current phase). Acquire the current RGB image $I$ and query the upper network on the current observation to obtain $(q_{\mathrm{tog}}, g_{\mathrm{tog}}, q_{\mathrm{sub}}) = f_{\theta}(I, q_{\mathrm{cur}}, g)$, where $q_{\mathrm{sub}}$ is a joint-space subgoal (typically near the next toggle) that serves as the target for the upcoming phase.
  - (b) Diffusion-guided RRT/RRT-Connect. Initialize an RRT/RRT-Connect tree $T$ with root $q_{\mathrm{cur}}$ and grow it toward $q_{\mathrm{sub}}$ for up to $N_{\max}$ expansions. At each expansion, obtain a learned segment $(u, \eta)$ from the diffusion sampler conditioned on $(I, q_{\mathrm{cur}}, g, q_{\mathrm{tog}}, g_{\mathrm{tog}}, q_{\mathrm{sub}})$ (and an optional tree summary), and with probability $\varepsilon$ replace it with a UniformSegment(). Each proposal is converted to $q_{\mathrm{new}}$ via (4) and accepted only if CollisionFree holds. The phase terminates when a node $v$ in the tree satisfies $d_W(v, q_{\mathrm{sub}}) \le \delta$ or the iteration limit is reached.
  - (c) Path extraction and execution. Extract a path from $q_{\mathrm{cur}}$ to the best-reached node near $q_{\mathrm{sub}}$, apply shortcut smoothing and spline-based time parameterization under joint limits, and validate the resulting trajectory with dense collision checks. Execute this trajectory on the robot (or in simulation), toggling the gripper to $g_{\mathrm{tog}}$ when the trajectory passes near $q_{\mathrm{tog}}$. The final state of the executed trajectory becomes the new $q_{\mathrm{cur}}$ for the next phase.
- The overall motion is obtained by concatenating the phase trajectories until $q_{\mathrm{cur}} \in \mathcal{Q}_{\mathrm{goal}}$ or a global time/iteration limit is exceeded.
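The shortcut-smoothing step in (c) can be sketched as follows (a generic shortcut routine under our own naming; the iteration budget is an illustrative assumption):

```python
import random

def shortcut_smooth(path, collision_free, iters=100, seed=0):
    """Shortcut smoothing: repeatedly pick two waypoint indices and splice out
    the waypoints between them when the direct connection between the two
    configurations is collision-free."""
    rng = random.Random(seed)
    path = list(path)
    for _ in range(iters):
        if len(path) < 3:
            break                              # nothing left to shortcut
        i, j = sorted(rng.sample(range(len(path)), 2))
        if j - i < 2:
            continue                           # adjacent waypoints: no shortcut
        if collision_free(path[i], path[j]):
            path = path[:i + 1] + path[j:]     # drop the intermediate waypoints
    return path

# In a free workspace every shortcut succeeds, so the zigzag collapses.
zigzag = [(0, 0), (1, 2), (2, -1), (3, 3), (4, 0)]
print(shortcut_smooth(zigzag, lambda a, b: True))
```

In practice the `collision_free` predicate here is the same densely sampled edge check used during tree growth, so the smoothed path inherits the planner's collision guarantees.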
5. Experimental Results
5.1. Simulation Setup and Data Collection
5.2. Qualitative Overview of the Proposed Method
5.3. Evaluation Protocol and Metrics
- Comparison 1 (Primary): Success rate. End-to-end imitation policies (no explicit search or collision checking) versus our search-based planner.
- Comparison 2 (Efficiency within the RRT family): Normalized RRT expansions. RRT-style planners that share the same collision checker are compared by the number of tree expansions relative to ours, together with the associated runtime metrics (per-proposal sampling time and episode-level time to first solution, TTFS).
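As a small worked example of the normalization in Comparison 2 (the raw expansion counts below are hypothetical, chosen only to illustrate the computation; each count is divided by ours, so the proposed method reads 1.00 by construction):

```python
# Hypothetical raw expansion counts; only the ratios are meaningful.
expansions = {"Ours": 412, "Uniform RRT": 1133, "Goal-biased RRT": 713}

# Normalize by the proposed method's count.
normalized = {name: count / expansions["Ours"] for name, count in expansions.items()}
print({name: round(value, 2) for name, value in normalized.items()})
# → {'Ours': 1.0, 'Uniform RRT': 2.75, 'Goal-biased RRT': 1.73}
```

Normalizing per scene-and-seed pair before averaging keeps the comparison insensitive to the absolute difficulty of individual episodes.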
5.4. Comparison 1: Success Rate vs. Imitation Policies
5.5. Comparison 2: Normalized RRT Expansions Within the RRT Family
5.6. Qualitative Comparison of Learned Sample Distributions
5.7. Robustness to Viewpoint and Illumination
5.8. Limitations and Practical Considerations
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- LaValle, S.M.; Kuffner, J.J., Jr. Randomized kinodynamic planning. Int. J. Robot. Res. 2001, 20, 378–400. [Google Scholar] [CrossRef]
- Kuffner, J.J.; LaValle, S.M. RRT-connect: An efficient approach to single-query path planning. In Proceedings of the 2000 IEEE International Conference on Robotics and Automation (ICRA), San Francisco, CA, USA, 24–28 April 2000; Volume 2, pp. 995–1001. [Google Scholar]
- Karaman, S.; Frazzoli, E. Sampling-based algorithms for optimal motion planning. Int. J. Robot. Res. 2011, 30, 846–894. [Google Scholar] [CrossRef]
- Ichter, B.; Harrison, J.; Pavone, M. Learning sampling distributions for robot motion planning. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018; pp. 7087–7094. [Google Scholar]
- Qureshi, A.H.; Simeonov, A.; Bency, M.J.; Yip, M.C. Motion planning networks. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 2118–2124. [Google Scholar]
- Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 2020, 33, 6840–6851. [Google Scholar]
- Gammell, J.D.; Srinivasa, S.S.; Barfoot, T.D. Informed RRT*: Optimal sampling-based path planning focused via direct sampling of an admissible ellipsoidal heuristic. In Proceedings of the 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, Chicago, IL, USA, 14–18 September 2014; pp. 2997–3004. [Google Scholar]
- Huang, Z.; Chen, H.; Pohovey, J.; Driggs-Campbell, K. Neural informed rrt*: Learning-based path planning with point cloud state representations under admissible ellipsoidal constraints. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 13–17 May 2024; pp. 8742–8748. [Google Scholar]
- Sohn, K.; Lee, H.; Yan, X. Learning structured output representation using deep conditional generative models. Adv. Neural Inf. Process. Syst. 2015, 28. [Google Scholar]
- Mirza, M.; Osindero, S. Conditional generative adversarial nets. arXiv 2014, arXiv:1411.1784. [Google Scholar] [CrossRef]
- Chi, C.; Xu, Z.; Feng, S.; Cousineau, E.; Du, Y.; Burchfiel, B.; Tedrake, R.; Song, S. Diffusion policy: Visuomotor policy learning via action diffusion. Int. J. Robot. Res. 2025, 44, 1684–1704. [Google Scholar] [CrossRef]
- Janner, M.; Du, Y.; Tenenbaum, J.; Levine, S. Planning with Diffusion for Flexible Behavior Synthesis. In Proceedings of the International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022. [Google Scholar]
- Carvalho, J.; Le, A.T.; Baierl, M.; Koert, D.; Peters, J. Motion planning diffusion: Learning and planning of robot motions with diffusion models. In Proceedings of the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 1–5 October 2023; pp. 1916–1923. [Google Scholar]
- Pan, C.; Yi, Z.; Shi, G.; Qu, G. Model-based diffusion for trajectory optimization. Adv. Neural Inf. Process. Syst. 2024, 37, 57914–57943. [Google Scholar]
- Mahler, J.; Liang, J.; Niyaz, S.; Laskey, M.; Doan, R.; Liu, X.; Ojea, J.A.; Goldberg, K. Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics. In Proceedings of the Robotics: Science and Systems (RSS), Cambridge, MA, USA, 12–14 July 2017. [Google Scholar]
- Ten Pas, A.; Gualtieri, M.; Saenko, K.; Platt, R. Grasp pose detection in point clouds. Int. J. Robot. Res. 2017, 36, 1455–1473. [Google Scholar] [CrossRef]
- Sundermeyer, M.; Mousavian, A.; Triebel, R.; Fox, D. Contact-graspnet: Efficient 6-dof grasp generation in cluttered scenes. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 13438–13444. [Google Scholar]
- Kaelbling, L.P.; Lozano-Pérez, T. Hierarchical task and motion planning in the now. In Proceedings of the 2011 IEEE International Conference on Robotics and Automation, Shanghai, China, 9–13 May 2011; pp. 1470–1477. [Google Scholar]
- Garrett, C.R.; Lozano-Pérez, T.; Kaelbling, L.P. Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. In Proceedings of the International Conference on Automated Planning and Scheduling, Nancy, France, 14–19 June 2020; Volume 30, pp. 440–448. [Google Scholar]
- Wirnshofer, F.; Schmitt, P.S.; Meister, P.; Wichert, G.v.; Burgard, W. State estimation in contact-rich manipulation. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 3790–3796. [Google Scholar]
- Beltran-Hernandez, C.C.; Petit, D.; Ramirez-Alpizar, I.G.; Nishi, T.; Kikuchi, S.; Matsubara, T.; Harada, K. Learning force control for contact-rich manipulation tasks with rigid position-controlled robots. IEEE Robot. Autom. Lett. 2020, 5, 5709–5716. [Google Scholar] [CrossRef]
- Zhou, Y.; Barnes, C.; Lu, J.; Yang, J.; Li, H. On the continuity of rotation representations in neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5745–5753. [Google Scholar]
- Song, J.; Meng, C.; Ermon, S. Denoising Diffusion Implicit Models. In Proceedings of the International Conference on Learning Representations, Virtual, 3–7 May 2021. [Google Scholar]
- Mandlekar, A.; Xu, D.; Wong, J.; Nasiriany, S.; Wang, C.; Kulkarni, R.; Li, F.; Savarese, S.; Zhu, Y.; Martín-Martín, R. What Matters in Learning from Offline Human Demonstrations for Robot Manipulation. In Proceedings of the 5th Conference on Robot Learning (CoRL) PMLR, London, UK, 8–11 November 2021. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]





| Method | Success (%) |
|---|---|
| Diffusion Policy (IL) | 89.33 |
| LSTM–GMM (IL) | 69.67 |
| Transformer (IL) | 60.33 |
| Ours (Diffusion Sampler + Toggle/Subgoal) | 95.67 |
| RRT Family Method | Normalized RRT Expansions |
|---|---|
| Ours (Diffusion Sampler + Toggle/Subgoal) | 1.00 |
| Uniform RRT | 2.75 |
| Goal-biased RRT (bias 0.5 toward $q_{\mathrm{sub}}$) | 1.73 |
| CVAE Sampler + RRT | 1.36 |
| Cond. GAN Sampler + RRT | 1.67 |
| Method | Normalized Time |
|---|---|
| Ours (Diffusion, DDIM ) | 1.00 |
| Ours (Diffusion, DDIM ) | 0.42 |
| CVAE sampler | 0.09 |
| Conditional-GAN sampler | 0.11 |
| Method | Normalized TTFS (per Setting) | |
|---|---|---|
| Ours (Diffusion, DDIM ) | 1.00 | 1.00 |
| Ours (Diffusion, DDIM ) | 0.45 | 0.49 |
| CVAE sampler | 0.16 | 0.24 |
| Conditional-GAN sampler | 0.20 | 0.27 |
| Perturbation | Rotation (deg) | Translation (m) | Photometric | Success (%) | Toggle Err (rad) |
|---|---|---|---|---|---|
| Nominal | 0 | 0.00 | 0.000 | 96 | 0.0112 |
| Viewpoint (mild) | 5 | 0.01 | 0.000 | 88 | 0.0201 |
| Viewpoint (strong) | 10 | 0.02 | 0.000 | 82 | 0.0285 |
| Photometric | 0 | 0.00 | 0.020 | 90 | 0.0197 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Lee, K.; Cho, K. Contact-Aware Diffusion Sampling for RRT-Based Manipulation. Electronics 2025, 14, 4837. https://doi.org/10.3390/electronics14244837

