Rapidly Exploring Random Trees Reinforcement Learning (RRT-RL): A New Era in Training Sample Diversity
Abstract
1. Introduction
1.1. Motivation
1.2. Related Work
1.3. Contribution
- The component that turns the earlier lane-keeping solution into a general method is the cosine-similarity-based distance metric. This study therefore introduces a new distance function over the set of samples, which the algorithm uses while constructing the RRT. For each state, the softmax function converts the Q-vector estimated by the DQN into a probability distribution, and the cosine distance between two such distributions, derived from their cosine similarity, serves as the metric.
- While the referenced article uses the softmax-Q vector only in the rollout phase of tree construction to select the next action, we also use it in the nearest-neighbor search phase, where it replaces the Euclidean distance.
- Although the method is implemented for the DQN algorithm for the sake of the experiments, the concept has no component that relies on this specific algorithm. It can be used with any model-free RL method that utilizes a memory buffer for storing the training samples during the learning process.
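The distance described in the first contribution can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the softmax temperature is assumed to be 1, and the distance is assumed to be one minus the cosine similarity.

```python
import math


def softmax(q_values):
    """Turn a Q-vector into a probability distribution over actions."""
    m = max(q_values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def cosine_distance(p, q):
    """Assumed definition: 1 - cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / (norm_p * norm_q)


def state_distance(q_a, q_b):
    """Distance between two states, given their estimated Q-vectors."""
    return cosine_distance(softmax(q_a), softmax(q_b))


if __name__ == "__main__":
    # Two states that both strongly prefer action 0 are close;
    # states with opposite preferences are far apart.
    print(state_distance([5.0, 0.0], [4.0, 0.0]))
    print(state_distance([5.0, 0.0], [0.0, 5.0]))
```

Because softmax is invariant to adding a constant to every Q-value, two states whose Q-vectors differ only by an offset are at distance zero under this sketch. The metric compares action preferences, not positions in the raw state space.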
2. Methodology
2.1. Reinforcement Learning
2.2. Deep Q Learning
2.3. Efficient Sample Tree Construction: RRT
2.4. Optimal Training Sample Collection with the RRT Algorithm
- The distance metric between two tree nodes (two points of the state space);
- The policy of the local planner.
Algorithm 1. RRT-based sample collection adapted for this study.
Input: vector-valued function with softmax, MDP environment E, N no. of samples, K rollout steps
Output: T tree of samples
distance ← cosine similarity
Initialize T with seed states from E
while less than N samples in T do
    RandomState(E)
    distance()
    for K steps do
        Set state in E
        RandomSample()
        Execute action a, observe reward r and next state in E
        if episode is terminated then
            break
Return T
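One possible reading of Algorithm 1 is sketched below. Several details are assumptions, not taken from the article: `ToyEnv` is a hypothetical one-dimensional environment whose state can be set directly, `toy_q` is a stand-in for a Q-network, the nearest node is the one whose softmax-Q distribution has the smallest cosine distance to that of the random state, rollout actions are drawn from the softmax-Q distribution, and N counts collected transitions.

```python
import math
import random


def softmax(q):
    m = max(q)
    exps = [math.exp(v - m) for v in q]
    total = sum(exps)
    return [e / total for e in exps]


def cosine_distance(p, q):
    dot = sum(a * b for a, b in zip(p, q))
    norms = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norms


class ToyEnv:
    """Hypothetical environment: position x, actions 0 (left) and 1 (right)."""

    def __init__(self, rng):
        self.rng = rng
        self.x = 0.0

    def random_state(self):
        return self.rng.uniform(-1.0, 1.0)

    def set_state(self, x):
        self.x = x

    def step(self, action):
        self.x += 0.1 if action == 1 else -0.1
        done = abs(self.x) > 1.0  # leaving [-1, 1] ends the episode
        return self.x, -abs(self.x), done


def toy_q(x):
    """Stand-in for a Q-network: one value per action."""
    return [-2.0 * x, 2.0 * x]


def collect_samples(env, q_fn, n_samples, k_steps, rng, seed_states=(0.0,)):
    tree = [(s, None) for s in seed_states]  # (state, parent index)
    samples = []  # (state, action, reward, next_state, done)
    while len(samples) < n_samples:
        target = softmax(q_fn(env.random_state()))
        # Nearest-neighbor search with the softmax-Q cosine distance.
        parent = min(
            range(len(tree)),
            key=lambda i: cosine_distance(softmax(q_fn(tree[i][0])), target),
        )
        state = tree[parent][0]
        for _ in range(k_steps):
            env.set_state(state)
            probs = softmax(q_fn(state))
            action = rng.choices(range(len(probs)), weights=probs)[0]
            next_state, reward, done = env.step(action)
            samples.append((state, action, reward, next_state, done))
            if done:
                break  # terminal states are not expanded further
            tree.append((next_state, parent))
            parent = len(tree) - 1
            state = next_state
            if len(samples) >= n_samples:
                break
    return tree, samples


if __name__ == "__main__":
    rng = random.Random(0)
    tree, samples = collect_samples(ToyEnv(rng), toy_q, 50, 5, rng)
    print(len(tree), len(samples))
```

The sketch relies on the environment exposing a way to set its state, which Algorithm 1 requires in the "Set state in E" step.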
2.5. Limitations
3. Description of Environments
3.1. Cartpole
3.2. Acrobot
3.3. Mountain Car
4. Experiment Details and Rationale
- The standard RL approach, in which training experience is gathered by trial and error and the agent is trained while the experience is being collected, so the contents of the circular train buffer change over time.
- Trial-and-error experience gathering in which the agent is trained only after collection has finished. The agent therefore works with a fixed pool of the required number of training samples (a fixed train buffer) that does not change as training progresses.
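The difference between the two baseline buffers can be illustrated with a short sketch. Everything here is a hypothetical stand-in: `gather` produces dummy transitions instead of real environment interaction, and `train_step` is a placeholder for an agent update.

```python
import random
from collections import deque


def gather(n, rng):
    """Hypothetical stand-in for trial-and-error experience: n dummy transitions."""
    return [(rng.random(), rng.randrange(2), rng.random()) for _ in range(n)]


def train_step(batch):
    """Placeholder for one agent update; only reports the batch size."""
    return len(batch)


def online_training(capacity, n_steps, batch_size, rng):
    """Baseline 1: collect and train in the same loop with a circular buffer."""
    buffer = deque(maxlen=capacity)  # oldest samples are overwritten when full
    for _ in range(n_steps):
        buffer.extend(gather(1, rng))
        if len(buffer) >= batch_size:
            train_step(rng.sample(list(buffer), batch_size))
    return buffer


def fixed_buffer_training(capacity, n_updates, batch_size, rng):
    """Baseline 2: collect everything first, then train on the frozen pool."""
    buffer = gather(capacity, rng)
    for _ in range(n_updates):
        train_step(rng.sample(buffer, batch_size))
    return buffer


if __name__ == "__main__":
    rng = random.Random(0)
    print(len(online_training(100, 250, 16, rng)))
    print(len(fixed_buffer_training(100, 250, 16, rng)))
```

In the first function the buffer contents drift as new experience replaces old entries, whereas in the second every update samples from the same pool.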
5. Results
5.1. State Space Coverage
5.2. Impact of Hyperparameters
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Silver, D.; Schrittwieser, J.; Simonyan, K.; Antonoglou, I.; Huang, A.; Guez, A.; Hubert, T.; Baker, L.; Lai, M.; Bolton, A.; et al. Mastering the game of Go without human knowledge. Nature 2017, 550, 354–359.
- Fawzi, A.; Balog, M.; Huang, A.; Hubert, T.; Romera-Paredes, B.; Barekatain, M.; Novikov, A.; Ruiz, F.J.R.; Schrittwieser, J.; Swirszcz, G.; et al. Discovering faster matrix multiplication algorithms with Reinforcement Learning. Nature 2022, 610, 47–53.
- Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Žídek, A.; Potapenko, A.; et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589.
- Bai, Y.; Jones, A.; Ndousse, K.; Askell, A.; Chen, A.; DasSarma, N.; Drain, D.; Fort, S.; Ganguli, D.; Henighan, T.; et al. Training a helpful and harmless assistant with Reinforcement Learning from Human Feedback. arXiv 2022, arXiv:2204.05862.
- LaValle, S. Rapidly-Exploring Random Trees: A New Tool for Path Planning; Research Report 9811; Iowa State University: Ames, IA, USA, 1998.
- Bécsi, T. RRT-guided experience generation for Reinforcement Learning in autonomous lane keeping. Sci. Rep. 2024, 14, 24059.
- Nasir, J.; Islam, F.; Ayaz, Y. Adaptive Rapidly-Exploring-Random-Tree-Star (RRT*)-Smart: Algorithm Characteristics and Behavior Analysis in Complex Environments. Asia-Pac. J. Inf. Technol. Multimed. 2013, 2, 39–51.
- Balint, K.; Gergo, A.B.; Tamas, B. Deep Reinforcement Learning combined with RRT for trajectory tracking of autonomous vehicles. Transp. Res. Procedia 2024, 78, 246–253.
- Hollenstein, J.J.; Renaudo, E.; Saveriano, M.; Piater, J. Improving the exploration of deep Reinforcement Learning in continuous domains using Planning for Policy Search. arXiv 2020, arXiv:2010.12974.
- Chiang, H.T.L.; Hsu, J.; Fiser, M.; Tapia, L.; Faust, A. RL-RRT: Kinodynamic Motion Planning via Learning Reachability Estimators from RL Policies. arXiv 2019, arXiv:1907.04799.
- Faust, A.; Ramirez, O.; Fiser, M.; Oslund, K.; Francis, A.; Davidson, J.; Tapia, L. PRM-RL: Long-Range Robotic Navigation Tasks by Combining Reinforcement Learning and Sampling-Based Planning. arXiv 2018, arXiv:1710.03937.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep Reinforcement Learning. Nature 2015, 518, 529–533.
- Gong, Z.; Zhong, P.; Hu, W. Diversity in Machine Learning. IEEE Access 2019, 7, 64323–64350.
- Schaul, T.; Quan, J.; Antonoglou, I.; Silver, D. Prioritized Experience Replay. arXiv 2016, arXiv:1511.05952.
- Lillicrap, T. Continuous control with deep Reinforcement Learning. arXiv 2015, arXiv:1509.02971.
- Lövétei, I.; Kővári, B.; Bécsi, T.; Aradi, S. Environment representations of railway infrastructure for Reinforcement Learning-based traffic control. Appl. Sci. 2022, 12, 4465.
- Mihály, A.; Do, T.T.; Thinh, K.D.; Van Vinh, N.; Gáspár, P. Linear Parameter Varying and Reinforcement Learning Approaches for Trajectory Tracking Controller of Autonomous Vehicles. Period. Polytech. Transp. Eng. 2025, 53, 94–102.
- Mankowitz, D.J.; Michi, A.; Zhernov, A.; Gelmi, M.; Selvi, M.; Paduraru, C.; Leurent, E.; Iqbal, S.; Lespiau, J.B.; Ahern, A.; et al. Faster sorting algorithms discovered using deep reinforcement learning. Nature 2023, 618, 257–263.
- Vinyals, O.; Babuschkin, I.; Czarnecki, W.M.; Mathieu, M.; Dudzik, A.; Chung, J.; Choi, D.H.; Powell, R.; Ewalds, T.; Georgiev, P.; et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 2019, 575, 350–354.
- Gutiérrez-Moreno, R.; Barea, R.; López-Guillén, E.; Araluce, J.; Bergasa, L.M. Reinforcement Learning-based autonomous driving at intersections in CARLA simulator. Sensors 2022, 22, 8373.
- Vieira, M.; Vieira, M.A.; Galvão, G.; Louro, P.; Véstias, M.; Vieira, P. Enhancing urban intersection efficiency: Utilizing visible light communication and learning-driven control for improved traffic signal performance. Vehicles 2024, 6, 666–692.
- Sivamayil, K.; Rajasekar, E.; Aljafari, B.; Nikolovski, S.; Vairavasundaram, S.; Vairavasundaram, I. A systematic study on Reinforcement Learning based applications. Energies 2023, 16, 1512.
- Ando, T.; Iino, H.; Mori, H.; Torishima, R.; Takahashi, K.; Yamaguchi, S.; Okanohara, D.; Ogata, T. Learning-based collision-free planning on arbitrary optimization criteria in the latent space through cGANs. Adv. Robot. 2023, 37, 621–633.
- Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In Proceedings of the ICML’20, 37th International Conference on Machine Learning, Virtual, 13–18 July 2020.
- Reimers, N.; Gurevych, I. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv 2019, arXiv:1908.10084.
- Keogh, E.; Mueen, A. Curse of Dimensionality. In Encyclopedia of Machine Learning and Data Mining; Sammut, C., Webb, G.I., Eds.; Springer: Boston, MA, USA, 2017; pp. 314–315.
- Barto, A.G.; Sutton, R.S.; Anderson, C.W. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Trans. Syst. Man Cybern. 1983, SMC-13, 834–846.
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018.
- Moore, A.W. Efficient Memory-Based Learning for Robot Control; Technical report; University of Cambridge: Cambridge, UK, 1990.
- Christiano, P.; Leike, J.; Brown, T.; Martic, M.; Legg, S.; Amodei, D. Deep Reinforcement Learning from human preferences. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30.
- Chen, M.; Tworek, J.; Jun, H.; Yuan, Q.; de Oliveira Pinto, H.P.; Kaplan, J.; Edwards, H.; Burda, Y.; Joseph, N.; Brockman, G.; et al. Evaluating Large Language Models Trained on Code. arXiv 2021, arXiv:2107.03374.




| Method | 1000 Samples | 5000 Samples | 10,000 Samples | 
|---|---|---|---|
| RRT-RL | 245 | 231 | 198 | 
| DQN | 309 | 282 | 234 | 
| DQN with fixed train buffer | 367 | 338 | 294 |

| Method | Acrobot 10k | Acrobot 50k | Mountain Car 50k | Mountain Car 100k |
|---|---|---|---|---|
| RRT-RL | | | | |
| DQN | | | | |
| DQN with fixed train buffer | | | | |

| Environment | n = 1 | n = 5 | n = 10 | n = 20 | n = 40 |
|---|---|---|---|---|---|
| Cartpole 5k | ± | ± | ± | | |
| Acrobot 10k | | | | | |
| Mountain Car 50k | ± | ± | | | |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Péter, I.; Kővári, B.; Bécsi, T. Rapidly Exploring Random Trees Reinforcement Learning (RRT-RL): A New Era in Training Sample Diversity. Electronics 2025, 14, 443. https://doi.org/10.3390/electronics14030443