Application of an Improved Double Q-Learning Algorithm in Ground Mobile Robots
Abstract
Highlights
- An improved double Q-learning algorithm was developed for unmanned ground robots, offering stronger resistance to interference and noise.
- The algorithm can quickly find the shortest path and precisely avoid all obstacles.
- The algorithm performs well in complex environments.
- It enhances the task-execution capability and efficiency of unmanned ground robots.
1. Introduction
- (1) Analyze the limitations of existing path planning algorithms and introduce a dueling (competitive) network and a curiosity mechanism to overcome these limitations.
- (2) Combine double Q-learning with the dueling network and the curiosity module to improve its stability and path planning ability, and to address convergence that is either premature or too slow (see the sketch following this list).
- (3) Establish a suitable environment model so that the simulation experiments run smoothly, and provide a broad set of comparison results to demonstrate the advantages of the algorithm. In addition, the Sparrow Search Algorithm (SSA), Dung Beetle Optimization (DBO), and Particle Swarm Optimization (PSO) were simulated in the same environment to verify the feasibility of the algorithm.
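As a reference for contribution (2), the sketch below shows the tabular double Q-learning update that Section 2 builds on: action selection and action evaluation are decoupled across two Q-tables to reduce overestimation. This is a generic illustration rather than the paper's implementation; the NumPy Q-tables and the values of `alpha` and `gamma` are assumptions.

```python
import numpy as np

def double_q_update(Q_A, Q_B, s, a, r, s_next, alpha=0.1, gamma=0.9, rng=np.random):
    """One tabular double Q-learning step (generic sketch, not the paper's code).

    With probability 0.5, update Q_A using Q_B's estimate of the greedy action
    selected by Q_A, and vice versa, which reduces maximization bias.
    """
    if rng.random() < 0.5:
        a_star = int(np.argmax(Q_A[s_next]))            # select action with Q_A
        td_target = r + gamma * Q_B[s_next, a_star]     # evaluate it with Q_B
        Q_A[s, a] += alpha * (td_target - Q_A[s, a])
    else:
        b_star = int(np.argmax(Q_B[s_next]))            # select action with Q_B
        td_target = r + gamma * Q_A[s_next, b_star]     # evaluate it with Q_A
        Q_B[s, a] += alpha * (td_target - Q_B[s, a])

# Behavior actions would typically be chosen epsilon-greedily from Q_A + Q_B.
```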
2. Construction of Double Q Algorithm
2.1. Double Q-Learning
2.2. Dueling Network
2.2.1. Optimal Advantage Function
2.2.2. Dueling Network Architecture
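For reference, a dueling architecture of this kind is usually aggregated with the standard value/advantage decomposition (the paper's exact parameterization is not reproduced here):

$$Q(s,a;\theta,\alpha,\beta) = V(s;\theta,\beta) + \left(A(s,a;\theta,\alpha) - \frac{1}{|\mathcal{A}|}\sum_{a'} A(s,a';\theta,\alpha)\right),$$

where $V$ is the state-value stream, $A$ is the advantage stream, and subtracting the mean advantage keeps the decomposition identifiable.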
3. Path Planning Combined with Intrinsic Rewards
3.1. Reinforcement Learning Exploration
3.1.1. Classical Exploration Methods
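As a reference point for this subsection (the abbreviations list suggests ε-greedy and UCB are among the classical methods discussed), the standard UCB1 action-selection rule is:

$$a_t = \arg\max_{a}\left[\,Q_t(a) + c\,\sqrt{\frac{\ln t}{N_t(a)}}\,\right],$$

where $N_t(a)$ is the number of times action $a$ has been selected up to step $t$ and $c$ balances exploration against exploitation. This is the generic UCB1 form; the subsection's exact treatment may differ.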
3.1.2. Exploration Based on Internal Rewards
3.2. The Curiosity Mechanism of Self-Monitoring Prediction
- Things that the GMR can control.
- Things that the GMR cannot control but that may affect it.
- Things that the GMR cannot control and that have no effect on it (see the intrinsic reward formulation below).
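The three categories above follow the intuition behind the intrinsic curiosity module (ICM): the learned feature encoding should capture only the first two. In the standard ICM formulation (Pathak et al.), the intrinsic reward is the forward-model prediction error in that feature space; the pseudocode below additionally adds a count-based visit bonus:

$$r_t^{i} = \frac{\eta}{2}\,\left\lVert \hat{\phi}(s_{t+1}) - \phi(s_{t+1}) \right\rVert_2^2,$$

where $\phi$ is the learned feature encoder, $\hat{\phi}(s_{t+1})$ is the forward model's prediction from $(\phi(s_t), a_t)$, and $\eta$ is a scaling factor.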
Pseudocode of the training loop with curiosity-driven intrinsic rewards:

    # Initialize the forward model, the policy network, the inverse prediction model
    # (optional), and the feature encoder.
    forward_model = initialize_forward_model()
    policy = initialize_policy_network()
    inverse_model = initialize_inverse_model()      # optional
    feature_encoder = initialize_feature_encoder()
    state_visit_count = {}                          # visit counts for the count-based bonus

    for episode in range(num_episodes):
        s = env.reset()
        for step in range(max_steps_per_episode):
            # 1. Select action a according to the current state s (using the policy)
            a = policy.select_action(s)
            # 2. Perform action a, get the next state and the external reward
            s_next, r_ext, done, _ = env.step(a)
            # 3. Feature extraction (for forward prediction and reward evaluation)
            phi = feature_encoder(s)
            phi_next = feature_encoder(s_next)
            # 4. The forward model predicts the next state feature
            phi_pred = forward_model.predict(phi, a)
            # 5. Intrinsic reward calculation (prediction error plus visit-count bonus)
            prediction_error = compute_error(phi_pred, phi_next)
            state_visit_count[s_next] = state_visit_count.get(s_next, 0) + 1
            visit_bonus = 1 / sqrt(state_visit_count[s_next])
            r_int = prediction_error + visit_bonus
            # 6. Calculate the total reward (weighted fusion of the reward terms)
            R = r_ext + r_int + exploration_reward + path_reward
            # 7. Train the policy network
            policy.update(s, a, R, s_next)
            # 8. Update the forward model (supervised learning)
            forward_model.update(phi, a, phi_next)
            # Optional: train the inverse action prediction model
            inverse_model.update(phi, phi_next, a)
            # State update
            s = s_next
            if done:
                break
4. Algorithm Simulation and Verification
4.1. Environment Model Building
4.2. Simulation Parameter Setting
4.3. Simulation Experiment
5. Conclusions
- (1) After introducing the curiosity mechanism and training, the GMR obtains a higher positive reward. Using the internal reward to guide GMR exploration optimizes the reward setting, prioritized experience replay is added to improve training efficiency (a minimal sketch of such a replay buffer follows this list), and navigation accuracy and stability in complex environments are improved.
- (2) The grid-environment simulation shows that adding curiosity improves the exploration mechanism of reinforcement learning. The GMR not only explores the unknown environment and reaches the goal of the environment model, but also demonstrates that, once combined with a curiosity mechanism, double Q-learning overcomes its convergence problem in complex path planning environments. The GMR finds the shortest path to the goal and obtains higher positive rewards. In the 20 × 20, 25 × 25, and 30 × 30 grid maps, the path found by the improved algorithm is shorter than those of the SSA, DBO, and PSO algorithms, with maximum reductions of 6.53%, 11.42%, and 18.07%, respectively. In the 25 × 25 environment, the improved algorithm also outperforms the traditional DQN, A*, and Dijkstra algorithms, with maximum reductions of 12%, 5.05%, and 7.2%, respectively. The results show that the improved algorithm, combined with the curiosity module, performs better in larger environments with more obstacles. Although the overall reward fluctuates across environments, it remains positive in the end.
- (3) Taken together, the simulation experiments and data show that the proposed algorithm has stronger noise suppression, better stability, and higher accuracy than the conventional PSO, SSA, DBO, Dijkstra, and other algorithms, which is conducive to the exploration of GMRs in unknown environments.
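Conclusion (1) mentions prioritized experience replay. As a rough illustration only (the paper's own implementation is not reproduced here), a minimal proportional prioritized replay buffer could look as follows; the capacity and batch size match the simulation settings (2000 and 64), while the priority exponent `alpha = 0.6` and the TD-error-based priority update are assumptions.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized experience replay (illustrative sketch)."""

    def __init__(self, capacity=2000, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.priorities, self.pos = [], [], 0

    def push(self, transition, priority=1.0):
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(priority)
        else:                                    # overwrite the oldest entry
            self.data[self.pos] = transition
            self.priorities[self.pos] = priority
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size=64):
        p = np.array(self.priorities) ** self.alpha
        p /= p.sum()                             # sampling probability proportional to priority^alpha
        idx = np.random.choice(len(self.data), batch_size, p=p)
        return idx, [self.data[i] for i in idx]

    def update_priorities(self, idx, td_errors, eps=1e-6):
        for i, e in zip(idx, td_errors):         # larger TD error -> sampled more often
            self.priorities[i] = abs(float(e)) + eps
```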
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Full Form
---|---
SSA | Sparrow Search Algorithm
DBO | Dung Beetle Optimization
PSO | Particle Swarm Optimization
GMR | Ground mobile robot
RRT | Rapidly-exploring Random Trees
DQN | Deep Q-network
ICM | Intrinsic curiosity module
UCB | Upper confidence bound
References
Simulation parameter settings (cf. Section 4.2):

Parameter | Meaning | Value
---|---|---
ε | ε-greedy search strategy parameter | 0.9
β | Weight of the forward-model loss | 0.8
α | Learning rate | 0.0001
γ | Discount factor | 0.9
i_scale | Intrinsic (internal) reward weighting coefficient | 0.6
e_scale | Extrinsic (external) reward weighting coefficient | 0.4
TOTAL | Total number of training episodes | 15,000
MEMORY_CAPACITY | Experience replay pool capacity | 2000
TARGET_REPLACE_ITER | Target network parameter update interval | 200
BATCH_SIZE | Number of samples per training batch | 64
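For convenience, the same settings expressed as a plain Python configuration dictionary; the key names are illustrative assumptions, and only the values come from the table above.

```python
config = {
    "epsilon": 0.9,               # epsilon-greedy search strategy parameter
    "beta": 0.8,                  # weight of the forward-model loss
    "learning_rate": 1e-4,        # alpha
    "gamma": 0.9,                 # discount factor
    "i_scale": 0.6,               # intrinsic reward weighting coefficient
    "e_scale": 0.4,               # extrinsic reward weighting coefficient
    "total_episodes": 15_000,     # TOTAL
    "memory_capacity": 2_000,     # experience replay pool capacity
    "target_replace_iter": 200,   # target network parameter update interval
    "batch_size": 64,
}
```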
Path length comparison of the algorithms on different map sizes (four results per map size; cf. Section 4.3):

Map Size | SSA | Improved Algorithm | DBO | PSO
---|---|---|---|---
20 × 20 | 35.0564 | 26.3478 | 29.5514 | 31.4204
 | 33.2074 | 28.4268 | 29.0778 | 30.3883
 | 31.7121 | 28.0192 | 31.9641 | 30.6962
 | 37.0226 | 28.0932 | 36.0067 | 30.8259
25 × 25 | 43.8584 | 34.8282 | 37.9499 | 38.0633
 | 52.4361 | 35.1193 | 39.0432 | 39.0101
 | 46.0552 | 34.9859 | 40.6303 | 37.4334
 | 43.5563 | 35.2752 | 47.9148 | 41.4558
30 × 30 | 59.3848 | 43.6276 | 51.1516 | 46.6471
 | 61.1908 | 42.4656 | 51.8012 | 50.1706
 | 66.6493 | 43.7405 | 50.6116 | 48.1259
 | 62.3338 | 43.4581 | 51.3346 | 50.5918
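To relate this table to the percentage reductions quoted in the Conclusions, a small helper for computing the relative path-length reduction of the improved algorithm against a baseline. How the paper aggregates the four results per map size (best run, mean, or otherwise) is not specified here, so the example below compares two single results only.

```python
def relative_reduction(improved, baseline):
    """Percentage by which `improved` shortens the path relative to `baseline`."""
    return 100.0 * (baseline - improved) / baseline

# Example with two single results from the 20 x 20 rows above:
print(relative_reduction(26.3478, 29.5514))  # ~10.84% shorter than that DBO result
```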