Experimental Research on Avoidance Obstacle Control for Mobile Robots Using Q-Learning (QL) and Deep Q-Learning (DQL) Algorithms in Dynamic Environments
Abstract
1. Introduction
- A navigation controller for mobile robot path planning based on the DQL algorithm is developed on the Robot Operating System (ROS) and validated through simulation and experimental results.
- Simulation and experimental comparisons of the DQL algorithm against the QL algorithm demonstrate the efficiency and superiority of the proposed algorithm: (1) the proposed DQL algorithm quickly and safely generates optimal and near-optimal paths; (2) the mobile robot moves quickly to the required location while avoiding obstacles; (3) the DQL algorithm does not require a predefined environment, offers a good trade-off between convergence speed and path length, and needs only a few milliseconds to compute a solution that is good in terms of length and safety; (4) the performance of the proposed DQL is improved compared with that of the latest related work; and (5) the proposed DQL increases route quality with regard to length, computation time, and robot safety.
2. Mathematical Modeling of an Operating System for a Mobile Robot
2.1. Obstacle Modeling in the Mobile Robot Operating Environment
2.2. Mathematical Model
The following notation is used:
- Index of a path fragment (segment) generated by the path nodes.
- Index of an obstacle in the navigation environment.
- Indices of the points that define an obstacle.
- Index of segment r of the rectangle approximating the mobile robot.
- Index of segment r that defines the rectangle of an obstacle.
- Node of the path.
- Obstacle.
- Segment of the path defined by two nodes.
- Segment of an obstacle defined by two points.
- CurrentPos: current location of the robot.
- Segment of the rectangle approximating the robot.
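This segment-based model leads to a simple collision test: the robot's bounding rectangle collides with an obstacle whenever one of the rectangle's segments intersects one of the obstacle's segments. The sketch below is a minimal illustration of such a test, not code from the paper; the names (segments_intersect, robot_hits_obstacle) are hypothetical, and the check only detects proper (crossing) intersections.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]

def _orientation(p: Point, q: Point, r: Point) -> float:
    """Signed area of the triangle (p, q, r); its sign gives the turn direction."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def segments_intersect(a: Segment, b: Segment) -> bool:
    """True if segment a properly crosses segment b (collinear touching is ignored)."""
    a1, a2 = a
    b1, b2 = b
    d1 = _orientation(b1, b2, a1)
    d2 = _orientation(b1, b2, a2)
    d3 = _orientation(a1, a2, b1)
    d4 = _orientation(a1, a2, b2)
    return d1 * d2 < 0 and d3 * d4 < 0

def robot_hits_obstacle(robot_segments: List[Segment],
                        obstacle_segments: List[Segment]) -> bool:
    """Collision if any robot-rectangle segment crosses any obstacle segment."""
    return any(segments_intersect(r, o)
               for r in robot_segments
               for o in obstacle_segments)
```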
3. Deep Q-Learning and Q-Learning Algorithms in Path Planning for Mobile Robots
3.1. Q-Learning
Algorithm 1: Classical Q-learning algorithm
Initialization: Q(s, a) ← 0 for all n states and m actions
for (each episode):
    s ← a random state from the state set S;
    while (s is not the goal state):
        choose an action a by using an adequate policy (ε-greedy, etc.);
        execute a and observe the reward r and the next state s′;
        update Q(s, a) using Equation (9);
        s ← s′;
    end-while
end-for
end
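As a concrete illustration of Algorithm 1, the following is a minimal tabular Q-learning sketch in Python. The environment interface (reset()/step()) is assumed rather than taken from the paper, and the update rule written in the code is the standard one, Q(s, a) ← Q(s, a) + α[r + γ·max Q(s′, ·) − Q(s, a)], which Equation (9) is understood to denote.

```python
import random
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning (Algorithm 1).

    `env` is assumed to provide reset() -> state and
    step(action) -> (next_state, reward, done)."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()                      # start state of the episode
        done = False
        while not done:                      # loop until the goal (terminal) state
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)
            # Q-value update (cf. Equation (9))
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
            s = s_next
    return Q
```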
3.2. Deep Q-Learning
Algorithm 2: Deep Q-learning (DQL) algorithm
Input: learning factor α, discount factor γ, ε-greedy policy, robot pose, safety constraints, states s ∈ S, actions a ∈ A, weights θ
Begin
    Initialize the replay memory D to capacity N
    Initialize the action-value network Q with random weights θ
    Initialize the target network Q̂ with random weights θ⁻
    for episode = 1, M do
        Randomly set the robot's pose in the scenario
        Observe the initial state s of the robot
        for t = 1, T do
            Select an action a according to the ε-greedy policy
            Execute a; observe the reward r and the next state s′
            Store the transition (s, a, r, s′) in the replay memory D
            Sample a random minibatch of transitions from D
            Calculate the target value for each minibatch transition (Steps 1–3 below)
            Update the weights θ by a gradient descent step on the loss
        end for
    end for
End
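A minimal Python skeleton of this training loop is shown below. The q_net/target_net objects stand in for the neural networks, with assumed predict/fit helper methods, and the environment interface is assumed as well; only the structure of Algorithm 2 is meant to be illustrated here. Default values for the batch size, replay memory, and epsilon schedule follow the parameter table in Section 4.4; the other defaults are placeholders.

```python
import random
from collections import deque

def train_dql(env, q_net, target_net, n_actions,
              episodes=3000, max_steps=6000,
              memory_size=10**6, batch_size=64, train_start=64,
              epsilon=1.0, epsilon_decay=0.99, epsilon_min=0.05):
    """Skeleton of Algorithm 2; network and environment details are omitted."""
    memory = deque(maxlen=memory_size)            # replay memory of capacity N
    for episode in range(episodes):
        s = env.reset()                           # random robot pose in the scenario
        for t in range(max_steps):
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = int(q_net.predict(s).argmax())
            s_next, r, done = env.step(a)
            memory.append((s, a, r, s_next, done))      # store the transition
            if len(memory) >= train_start:
                minibatch = random.sample(memory, batch_size)
                # targets are built from target_net as described in Steps 1-3 below
                q_net.fit(minibatch, target_net)
            s = s_next
            if done:
                break
        # target_net weights are periodically copied from q_net (not shown)
        epsilon = max(epsilon * epsilon_decay, epsilon_min)   # decay once per episode
```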
- Step 1: A forward pass of the neural network is performed for the current state s_j to obtain the predicted value Q(s_j, a_j; θ).
- Step 2: If the sampled transition is a collision sample, the target for the pair (s_j, a_j) is set directly to the termination reward. Otherwise, a forward pass is performed for the next state s′, the maximum network output over all actions is taken, and the target for the executed action is computed with the Bellman equation (r + γ·max_a′ Q(s′, a′; θ⁻)). For all other actions, the target value is kept equal to the value initially returned in Step 1.
- Step 3: The Q-learning update then minimizes a loss function defined over the minibatch targets.
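For reference, a standard mean-squared DQN loss consistent with Steps 1 and 2, taken over a minibatch of B transitions, is the following (the paper's exact notation may differ):

$$
y_j =
\begin{cases}
r_j, & \text{if the transition ends in a collision (terminal state)},\\
r_j + \gamma \max_{a'} Q(s'_j, a'; \theta^-), & \text{otherwise},
\end{cases}
\qquad
L(\theta) = \frac{1}{B}\sum_{j=1}^{B}\bigl(y_j - Q(s_j, a_j; \theta)\bigr)^2 .
$$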
4. Simulation and Experimental Results
4.1. State Setup for the Mobile Robot
4.2. Action Setup for the Mobile Robot
4.3. Reward Setup for the Mobile Robot
4.4. Parameter Setting for the Controller
4.5. Simulation Results on ROS-GAZEBO
4.6. Experiment Results
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Action | Angular Velocity (rad/s)
---|---
0 | −1.5
1 | −0.75
2 | 0
3 | 0.75
4 | 1.5
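To make the action set concrete, a small sketch of mapping a discrete action index to a velocity command is given below. The angular velocities come from the table above; the constant forward speed of 0.15 m/s is an assumed placeholder, not a value stated in the table. In a ROS/Gazebo setup such a pair would typically be published as a geometry_msgs/Twist message on /cmd_vel.

```python
from typing import Tuple

# Angular velocities (rad/s) for the five discrete actions in the table above.
ANGULAR_VELOCITIES = [-1.5, -0.75, 0.0, 0.75, 1.5]
LINEAR_VELOCITY = 0.15   # m/s, assumed constant forward speed (placeholder)

def action_to_command(action: int) -> Tuple[float, float]:
    """Return the (linear, angular) velocity command for action index 0-4."""
    return LINEAR_VELOCITY, ANGULAR_VELOCITIES[action]
```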
Parameter | Value | Description
---|---|---
T | 6000 | Time steps of one cycle (episode)
γ | 0.99 | Discount factor
α | 25 × 10⁻⁵ | Learning rate
ξ | 1.0 | Initial probability of choosing a random action
ξreduce | 0.99 | Reduction rate of epsilon; epsilon decreases when a cycle ends
ξmin | 0.05 | Minimum value of epsilon
Batch size | 64 | Number of training samples per batch
Train start | 64 | Number of stored samples at which training starts
Memory | 10⁶ | Replay memory size
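These parameters map directly onto a training configuration. The sketch below (the dictionary keys are illustrative, and T is interpreted here as the maximum number of steps per episode) shows in particular how epsilon is reduced at the end of each cycle but never drops below its minimum value.

```python
# Controller hyperparameters from the table above (key names are illustrative).
CONFIG = {
    "episode_steps": 6000,   # T
    "gamma": 0.99,           # discount factor
    "alpha": 25e-5,          # learning rate
    "epsilon": 1.0,          # initial exploration probability
    "epsilon_decay": 0.99,   # applied at the end of each cycle
    "epsilon_min": 0.05,     # exploration floor
    "batch_size": 64,
    "train_start": 64,
    "memory_size": 10**6,
}

def decay_epsilon(epsilon: float) -> float:
    """Epsilon decreases when a cycle ends, but never below the floor."""
    return max(epsilon * CONFIG["epsilon_decay"], CONFIG["epsilon_min"])
```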
No. | Algorithm | Case 1 Distance (m) | Case 1 Run Time (s) | Case 2 Distance (m) | Case 2 Run Time (s)
---|---|---|---|---|---
1 | QL | 17.758 | 12.314 | 18.416 | 14.637
2 | DQL | 17.129 | 7.927 | 17.235 | 8.324
Case | No. | Algorithm | Distance (m) | Run Time (s)
---|---|---|---|---
1 | 1 | QL | 6.271 | 72.132
1 | 1 | DQL | 5.753 | 47.853
1 | 2 | QL | 6.314 | 74.097
1 | 2 | DQL | 5.958 | 51.845
1 | 3 | QL | 6.264 | 72.124
1 | 3 | DQL | 5.386 | 45.734
2 | 1 | QL | 20.123 | 57.372
2 | 1 | DQL | 21.235 | 32.735
2 | 2 | QL | 20.123 | 57.375
2 | 2 | DQL | 21.235 | 33.738
2 | 3 | QL | 20.123 | 57.379
2 | 3 | DQL | 19.235 | 30.735
3 | 1 | QL | 27.682 | 92.132
3 | 1 | DQL | 25.638 | 87.867
3 | 2 | QL | 27.682 | 92.132
3 | 2 | DQL | 25.638 | 87.656
3 | 3 | QL | 27.682 | 92.132
3 | 3 | DQL | 26.338 | 60.853
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).