A Reinforcement Learning-Based Double Layer Controller for Mobile Robot in Human-Shared Environments
Abstract
1. Introduction
1. We estimate environmental uncertainties through stochastic simulations and propose a hierarchical mobile robot controller that integrates scLTL and RL. Unlike conventional approaches that rely on real-time obstacle avoidance, our method handles environmental uncertainties explicitly at the planning level.
2. A novel RL-based double-layer algorithm is developed. Unlike traditional product- or encoder-type algorithms that combine scLTL and RL, the proposed algorithm separates the learning problem into a high-level layer and a low-level layer (a structural sketch follows this list). This decomposition enables the robot to compute an optimal path in human-shared environments at a significantly reduced learning cost.
3. At the higher level, an FSA-RL learner is proposed on the basis of FSA transitions to find an optimal global strategy for multiple tasks. Notably, we extend the deterministic state transitions of the FSA to a stochastic formulation, allowing the controller to adapt better to dynamic environments.
4. At the lower level, we construct an MDP learner that improves on the classic MDP by redefining the reward function to account explicitly for environmental uncertainties at the planning level. This enhancement enables the generation of an optimal local path under dynamic and uncertain conditions.
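To make the decomposition concrete, here is a minimal sketch of how the two layers might interact, assuming hypothetical names (`fsa_learner`, `mdp_learner`, `choose_subtask`, and so on are illustrative, not the authors' implementation):

```python
def double_layer_control(fsa_learner, mdp_learner, env, max_episodes=2000):
    """Alternate high-level subtask selection with low-level path learning."""
    for _ in range(max_episodes):
        q_state = fsa_learner.initial_state      # current FSA (task) state
        s = env.reset()                          # low-level grid state
        while not fsa_learner.is_accepting(q_state):
            # High level: pick the next subtask, e.g., "collect at workstation i".
            subtask = fsa_learner.choose_subtask(q_state)
            # Low level: learn a local path toward the subtask goal; the return
            # already reflects the estimated environmental risk.
            s, low_level_return = mdp_learner.run(s, goal=subtask.goal)
            # The low-level return becomes the high-level reward, and the FSA
            # transition is sampled (stochastic, per contribution 3).
            q_next = fsa_learner.sample_transition(q_state, subtask)
            fsa_learner.update(q_state, subtask, low_level_return, q_next)
            q_state = q_next
```

The point of the split is that the high-level learner only ever sees FSA states and subtasks, so its state space stays small regardless of the size of the grid the low-level learner operates on.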
2. Related Work
3. Preliminaries
- $\bot$ (False) is an scLTL formula,
- every atomic proposition $p \in \Pi$ is an scLTL formula, and
- given an scLTL formula $\varphi$, it is written in positive normal form, where $\neg$ appears only in front of an atomic proposition $p$ (the full grammar is given below).
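For reference, the standard scLTL grammar in positive normal form is shown below (the exact notation in the paper may differ slightly; $\Pi$ denotes the atomic proposition set):

```latex
\varphi ::= \top \;\mid\; \bot \;\mid\; p \;\mid\; \neg p
        \;\mid\; \varphi_1 \wedge \varphi_2 \;\mid\; \varphi_1 \vee \varphi_2
        \;\mid\; \mathsf{X}\,\varphi \;\mid\; \varphi_1 \,\mathsf{U}\, \varphi_2
        \;\mid\; \mathsf{F}\,\varphi, \qquad p \in \Pi
```

Here $\mathsf{X}$ (next), $\mathsf{U}$ (until), and $\mathsf{F}$ (eventually) are the temporal operators; because negation never applies to temporal subformulas, every satisfying trace of an scLTL formula has a finite good prefix, which is what enables translation into an FSA.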
4. Problem Formulation
- Visits workstations and collects workpieces, which is called the collection task;
- Delivers the collected workpieces to its destination, named the delivery task;
- Budgets are allotted to the collection and delivery tasks, respectively.
- A false-valued task proposition indicates a task that is not yet achieved;
- A true-valued task proposition indicates a task that is achieved.
Writing $c$ for completion of the collection task at a workstation and $d$ for completion of the corresponding delivery task, the propositions and formulas read informally as follows (an illustrative combined specification follows the list):
- $\neg c$: The agent does not accomplish the collection task at the workstation.
- $c$: The agent successfully completes the online collection task at the workstation.
- $d$: The agent successfully completes the online delivery task of workpieces from the workstation.
- $\mathsf{F}\,c$: The agent will achieve the online collection task in the future.
- $\mathsf{F}\,d$: The agent will achieve the online delivery task in the future.
- $\neg d \,\mathsf{U}\, c$: The delivery task will not be achieved before the collection task is completed.
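Putting these together for a single workstation (an assumed composition; the full specification indexes the propositions over multiple workstations), the two tasks and their ordering constraint combine into one scLTL formula:

```latex
\varphi \;=\; \mathsf{F}\,c \;\wedge\; \mathsf{F}\,d \;\wedge\; \bigl(\neg d \,\mathsf{U}\, c\bigr)
```

which demands that the collection and delivery tasks both eventually happen, and that delivery never precedes collection.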
5. Proposals
5.1. Model Environment Uncertainties
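Contribution 1 estimates uncertainties by stochastic simulation. The sketch below shows one way such an estimate could be computed (a hypothetical setup, not the authors' code): forklifts advance one or two cells per time-step, and repeated rollouts yield the per-time-step probability that each door cell is occupied, matching the shape of the door-probability tables in Section 6.

```python
import random

DOOR_CELLS = [40, 44, 48]   # illustrative door positions along a corridor

def rollout(horizon=50):
    """One forklift trajectory: the cell index occupied at each time-step."""
    pos, path = 0, []
    for _ in range(horizon):
        pos += random.choice([1, 2])   # randomly move one or two steps
        path.append(pos)
    return path

def door_occupancy(n_runs=2000, horizon=50):
    """Estimate P(a forklift is at door d at time t) over n_runs rollouts."""
    counts = [[0] * horizon for _ in DOOR_CELLS]
    for _ in range(n_runs):
        for t, pos in enumerate(rollout(horizon)):
            for d, cell in enumerate(DOOR_CELLS):
                if pos == cell:
                    counts[d][t] += 1
    return [[c / n_runs for c in row] for row in counts]
```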
5.2. High-Level: FSA-RL Learner
The high-level learner is defined over a tuple whose components are as follows (a Q-learning sketch over these components appears after the list):
- the state set;
- the action set;
- the state transition probability;
- the state transition function (defined with respect to the low-level state set);
- the label function;
- the atomic proposition set;
- the reward function, which is determined by the low-level MDP learner;
- the initial state and the accept state.
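A minimal sketch of the high-level update, with illustrative names (the `FSARLLearner` class and its methods are assumptions, not the paper's code): Q-values are tabulated over FSA states and subtask actions, the next FSA state is sampled rather than taken deterministically, and the reward plugged into the backup is the return reported by the low-level MDP learner.

```python
import random
from collections import defaultdict

class FSARLLearner:
    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.Q = defaultdict(float)              # Q[(fsa_state, subtask)]
        self.actions = actions                   # available subtasks
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose_subtask(self, q_state):
        """Epsilon-greedy subtask selection from an FSA state."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.Q[(q_state, a)])

    def update(self, q_state, subtask, reward, q_next):
        """Q-learning backup over FSA states; reward comes from the low level."""
        best_next = max(self.Q[(q_next, a)] for a in self.actions)
        td_error = reward + self.gamma * best_next - self.Q[(q_state, subtask)]
        self.Q[(q_state, subtask)] += self.alpha * td_error
```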
5.3. Low-Level: MDP Learner
The low-level learner is a Markov decision process defined by:
- the state set;
- the action set;
- the state transition probability;
- the reward function, which combines a step cost, a goal reward, and a risk-dependent penalty (see the sketch after this list);
- the initial state and the goal state, both of which are provided by the higher level;
- the risk set of the environment.
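A sketch of a risk-aware low-level reward under our reading of the reward-design table in Section 6.1 (the constants and the exact combination rule are assumptions):

```python
STEP_COST = -1.0      # paid on every move; encourages short paths
GOAL_REWARD = 100.0   # paid on reaching the goal cell set by the high level
PENALTY = -50.0       # scaled by the estimated risk of the entered cell

def low_level_reward(next_state, goal, risk_prob):
    """Reward for one low-level transition. risk_prob is the estimated
    probability (from the stochastic simulations of Section 5.1) that
    next_state is occupied by a forklift at the current time-step."""
    r = STEP_COST
    if next_state == goal:
        r += GOAL_REWARD
    r += PENALTY * risk_prob   # uncertainty handled at the planning level
    return r
```

Because the risk term enters the reward itself, the learned policy detours around high-risk cells in advance instead of reacting to obstacles at execution time.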
Algorithm 1: FSA-RL learner
Algorithm 2: Low-level MDP learner
6. Results and Discussion
6.1. Reward Design and Environment Setting
6.2. Results on Scenario 1: Workstation Number and Forklift Number
6.3. Results on Scenario 2: Increased Forklift Number
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
|---|---|
| RL | Reinforcement Learning |
| MDP | Markov Decision Process |
| scLTL | Syntactically Co-Safe Linear Temporal Logic |
| FSA | Finite State Automaton |
| FSA-RL | Finite State Automaton Reinforcement Learning |
References
- Belda, K.; Jirsa, J. Control Principles of Autonomous Mobile Robots Used in Cyber-Physical Factories. In Proceedings of the 2021 23rd International Conference on Process Control (PC), Strbske Pleso, Slovakia, 1–4 June 2021; pp. 96–101. [Google Scholar] [CrossRef]
- Ramdani, N.; Panayides, A.; Karamousadakis, M.; Mellado, M.; Lopez, R.; Christophorou, C.; Rebiai, M.; Blouin, M.; Vellidou, E.; Koutsouris, D. A Safe, Efficient and Integrated Indoor Robotic Fleet for Logistic Applications in Healthcare and Commercial Spaces: The ENDORSE Concept. In Proceedings of the 2019 20th IEEE International Conference on Mobile Data Management (MDM), Hong Kong, China, 10–13 June 2019; pp. 425–430. [Google Scholar] [CrossRef]
- Özbaran, C.; Dilibal, S.; Sungur, G. Mechatronic System Design of A Smart Mobile Warehouse Robot for Automated Storage/Retrieval Systems. In Proceedings of the 2020 Innovations in Intelligent Systems and Applications Conference (ASYU), Istanbul, Turkey, 15–17 October 2020; pp. 1–6. [Google Scholar] [CrossRef]
- Latsou, C.; Farsi, M.; Erkoyuncu, J.A.; Morris, G. Digital twin integration in multi-agent cyber physical manufacturing systems. IFAC-PapersOnLine 2021, 54, 811–816. [Google Scholar] [CrossRef]
- Pan, L.; Hu, B.; Sun, Z.; Xu, L.; Gong, G.; Xie, X.; Sun, Z. A Review of the Evolutionary Algorithm Based VRP Problem. In Proceedings of the 2023 42nd Chinese Control Conference (CCC), Tianjin, China, 24–26 July 2023; pp. 1939–1944. [Google Scholar] [CrossRef]
- Zhang, C.; Sun, P. Heuristic Methods for Solving the Traveling Salesman Problem (TSP): A Comparative Study. In Proceedings of the 2023 IEEE 34th Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), Toronto, ON, Canada, 5–8 September 2023; pp. 1–6. [Google Scholar] [CrossRef]
- Di, L.; Sun, D.; Qi, Y.; Xiao, Z. Research on Shortest Path Planning and Smoothing Without Obstacle Collision Based on Moving Carrier. Int. J. Aerosp. Eng. 2024, 2024, 5235125. [Google Scholar] [CrossRef]
- Voloch, N.; Zadok, Y.; Voloch-Bloch, N.; Hajaj, M.M. Using Combined Knapsack and Shortest Path Problems for Planning Optimal Navigation Paths for Robotic Deliveries. In Proceedings of the 2024 10th International Conference on Automation, Robotics and Applications (ICARA), Athens, Greece, 22–24 February 2024; pp. 139–143. [Google Scholar] [CrossRef]
- Ferguson, D.; Stentz, A. Using interpolation to improve path planning: The Field D* algorithm. J. Field Robot. 2006, 23, 79–101. [Google Scholar] [CrossRef]
- Warren, C. Fast path planning using modified A* method. In Proceedings of the IEEE International Conference on Robotics and Automation, Atlanta, GA, USA, 2–6 May 1993; Volume 2, pp. 662–667. [Google Scholar] [CrossRef]
- Belta, C.; Yordanov, B.; Gol, E.A. Formal Methods for Discrete-Time Dynamical Systems; Springer: Cham, Switzerland, 2017. [Google Scholar] [CrossRef]
- Chen, Y.; Ding, X.C.; Stefanescu, A.; Belta, C. Formal approach to the deployment of distributed robotic teams. IEEE Trans. Robot. 2012, 28, 158–171. [Google Scholar] [CrossRef]
- Mi, J.; Zhang, X.; Long, Z.; Wang, J.; Xu, W.; Xu, Y.; Deng, S. A mobile robot safe planner for multiple tasks in human-shared environments. PLoS ONE 2025, 20, e0324534. [Google Scholar] [CrossRef] [PubMed]
- Huang, Y.; Wang, Y.; Zatarain, O. Dynamic Path Optimization for Robot Route Planning. In Proceedings of the 2019 IEEE 18th International Conference on Cognitive Informatics & Cognitive Computing (ICCI-CC), Milan, Italy, 23–25 July 2019; pp. 47–53. [Google Scholar] [CrossRef]
- Wang, B.; Liu, Z.; Li, Q.; Prorok, A. Mobile Robot Path Planning in Dynamic Environments Through Globally Guided Reinforcement Learning. IEEE Robot. Autom. Lett. 2020, 5, 6932–6939. [Google Scholar] [CrossRef]
- Li, J.; Tinka, A.; Kiesel, S.; Durham, J.W.; Kumar, T.K.S.; Koenig, S. Lifelong Multi-Agent Path Finding in Large-Scale Warehouses. In Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21), Virtual Event, 2–9 February 2021; pp. 11272–11281. [Google Scholar]
- Ullah, Z.; Xu, Z.; Zhang, L.; Zhang, L.; Ullah, W. RL and ANN Based Modular Path Planning Controller for Resource-Constrained Robots in the Indoor Complex Dynamic Environment. IEEE Access 2018, 6, 74557–74568. [Google Scholar] [CrossRef]
- Hazem, B. Study of Q-learning and deep Q-network learning control for a rotary inverted pendulum system. Discov. Appl. Sci. 2024, 6, 49. [Google Scholar] [CrossRef]
- Jiang, J.; Yang, L.; Zhang, L. DQN-based on-line Path Planning Method for Automatic Navigation of Miniature Robots. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; pp. 5407–5413. [Google Scholar] [CrossRef]
- Carvalho, J.P.; Aguiar, A.P. A Reinforcement Learning Based Online Coverage Path Planning Algorithm. In Proceedings of the 2023 IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC), Tomar, Portugal, 26–27 April 2023; pp. 81–86. [Google Scholar] [CrossRef]
- Chai, R.; Niu, H.; Carrasco, J.; Arvin, F.; Yin, H.; Lennox, B. Design and Experimental Validation of Deep Reinforcement Learning-Based Fast Trajectory Planning and Control for Mobile Robot in Unknown Environment. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 5778–5792. [Google Scholar] [CrossRef]
- Mi, J.; Kuze, N.; Ushio, T. A mobile robot controller using reinforcement learning under scLTL specifications with uncertainties. Asian J. Control 2022, 24, 2916–2930. [Google Scholar] [CrossRef]
- Guruji, A.K.; Agarwal, H.; Parsediya, D. Time-efficient A* Algorithm for Robot Path Planning. Procedia Technol. 2016, 23, 144–149. [Google Scholar] [CrossRef]
- Dakulović, M.; Petrović, I. Two-way D* algorithm for path planning and replanning. Robot. Auton. Syst. 2011, 59, 329–342. [Google Scholar] [CrossRef]
- Blekas, K.; Vlachos, K. RL-based path planning for an over-actuated floating vehicle under disturbances. Robot. Auton. Syst. 2018, 101, 93–102. [Google Scholar] [CrossRef]
- Dam, T.; Chalvatzaki, G.; Peters, J.; Pajarinen, J. Monte-Carlo robot path planning. IEEE Robot. Autom. Lett. 2022, 7, 11213–11220. [Google Scholar] [CrossRef]
- Li, W.; Liu, Y.; Ma, Y.; Xu, K.; Qiu, J.; Gan, Z. A self-learning Monte Carlo tree search algorithm for robot path planning. Front. Neurorobotics 2023, 17, 1039644. [Google Scholar] [CrossRef] [PubMed]
- Cheng, R.; Orosz, G.; Murray, R.M.; Burdick, J.W. End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks. Proc. AAAI Conf. Artif. Intell. 2019, 33, 3387–3395. [Google Scholar] [CrossRef]
- El-Shamouty, M.; Wu, X.; Yang, S.; Albus, M.; Huber, M.F. Towards Safe Human-Robot Collaboration Using Deep Reinforcement Learning. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 4899–4905. [Google Scholar] [CrossRef]
- Liu, Q.; Liu, Z.; Xiong, B.; Xu, W.; Liu, Y. Deep reinforcement learning-based safe interaction for industrial human-robot collaboration using intrinsic reward function. Adv. Eng. Inform. 2021, 49, 101360. [Google Scholar] [CrossRef]
- Shao, Y.S.; Chen, C.; Kousik, S.; Vasudevan, R. Reachability-Based Trajectory Safeguard (RTS): A Safe and Fast Reinforcement Learning Safety Layer for Continuous Control. IEEE Robot. Autom. Lett. 2021, 6, 3663–3670. [Google Scholar] [CrossRef]
- Ghandour, M.; Liu, H.; Stoll, N.; Thurow, K. A hybrid collision avoidance system for indoor mobile robots based on human-robot interaction. In Proceedings of the 2016 17th International Conference on Mechatronics–Mechatronika (ME), Prague, Czech Republic, 7–9 December 2016; pp. 1–7. [Google Scholar]
- Zeng, L.; Bone, G.M. Mobile Robot Collision Avoidance in Human Environments. Int. J. Adv. Robot. Syst. 2013, 10, 41. [Google Scholar] [CrossRef]
- Yu, Q.; Zhou, J. A Review of Global and Local Path Planning Algorithms for Mobile Robots. In Proceedings of the 2024 8th International Conference on Robotics, Control and Automation (ICRCA), Shanghai, China, 12–14 January 2024; pp. 84–90. [Google Scholar] [CrossRef]
- De Araujo, P.R.M.; Mounier, E.; Dawson, E.; Noureldin, A. Smart Mobility: Leveraging Perception Sensors for Map-Based Navigation in Autonomous Vehicles. In Proceedings of the 2024 IEEE International Conference on Smart Mobility (SM), Niagara Falls, ON, Canada, 16–18 September 2024; pp. 281–286. [Google Scholar] [CrossRef]
- Chan, C.C.; Tsai, C.C. Collision-Free Speed Alteration Strategy for Human Safety in Human-Robot Coexistence Environments. IEEE Access 2020, 8, 80120–80133. [Google Scholar] [CrossRef]
- Ziebart, B.D.; Ratliff, N.; Gallagher, G.; Mertz, C.; Peterson, K.; Bagnell, J.A.; Hebert, M.; Dey, A.K.; Srinivasa, S. Planning-based prediction for pedestrians. In Proceedings of the 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, St. Louis, MO, USA, 10–15 October 2009; pp. 3931–3936. [Google Scholar] [CrossRef]
- Fisac, J.F.; Bajcsy, A.; Herbert, S.L.; Fridovich-Keil, D.; Wang, S.; Tomlin, C.J.; Dragan, A.D. Probabilistically safe robot planning with confidence-based human predictions. arXiv 2018, arXiv:1806.00109. [Google Scholar] [CrossRef]
- Chen, M.; Shih, J.C.; Tomlin, C.J. Multi-vehicle collision avoidance via hamilton-jacobi reachability and mixed integer programming. In Proceedings of the 2016 IEEE 55th Conference on Decision and Control (CDC), Las Vegas, NV, USA, 12–14 December 2016; pp. 1695–1700. [Google Scholar] [CrossRef]
- Bansal, S.; Chen, M.; Fisac, J.F.; Tomlin, C.J. Safe Sequential Path Planning of Multi-Vehicle Systems Under Disturbances and Imperfect Information. arXiv 2016, arXiv:1603.05208. [Google Scholar] [CrossRef]
- Bajcsy, A.; Herbert, S.L.; Fridovich-Keil, D.; Fisac, J.F.; Deglurkar, S.; Dragan, A.D.; Tomlin, C.J. A Scalable Framework for Real-Time Multi-Robot, Multi-Human Collision Avoidance. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 936–943. [Google Scholar] [CrossRef]
- Grenouilleau, F.; Van Hoeve, W.J.; Hooker, J.N. A multi-label A* algorithm for multi-agent pathfinding. In Proceedings of the International Conference on Automated Planning and Scheduling, Berkeley, CA, USA, 11–15 July 2019; Volume 29, pp. 181–185. [Google Scholar] [CrossRef]
- Lifelong Multi-Agent Path Finding in a Dynamic Environment. In Proceedings of the 2018 15th International Conference on Control, Automation, Robotics and Vision (ICARCV), Singapore, 18–21 November 2018; pp. 875–882. [CrossRef]
- Sadigh, D.; Kim, E.S.; Coogan, S.; Sastry, S.S.; Seshia, S.A. A learning based approach to control synthesis of Markov decision processes for linear temporal logic specifications. In Proceedings of the 53rd IEEE Conference on Decision and Control, Los Angeles, CA, USA, 15–17 December 2014; pp. 1091–1096. [Google Scholar] [CrossRef]
- Cho, K.; Suh, J.; Tomlin, C.J.; Oh, S. Cost-aware path planning under co-safe temporal logic specifications. IEEE Robot. Autom. Lett. 2017, 2, 2308–2315. [Google Scholar] [CrossRef]
- Hiromoto, M.; Ushio, T. Learning an Optimal Control Policy for a Markov Decision Process Under Linear Temporal Logic Specifications. In Proceedings of the 2015 IEEE Symposium Series on Computational Intelligence, Cape Town, South Africa, 7–10 December 2015; pp. 548–555. [Google Scholar] [CrossRef]
- Dong, X.; Wan, G.; Zeng, P.; Song, C.; Cui, S. Optimizing Robotic Task Sequencing and Trajectory Planning on the Basis of Deep Reinforcement Learning. Biomimetics 2024, 9, 10. [Google Scholar] [CrossRef] [PubMed]
- Dalal, M.; Chiruvolu, T.; Chaplot, D.; Salakhutdinov, R. Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks. arXiv 2024, arXiv:2405.01534. [Google Scholar]
- Singamaneni, P.T.; Favier, A.; Alami, R. Human-Aware Navigation Planner for Diverse Human-Robot Interaction Contexts. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 5817–5824. [Google Scholar] [CrossRef]
- Kupferman, O.; Vardi, M.Y. Model Checking of Safety Properties. Form. Methods Syst. Des. 2001, 19, 291–314. [Google Scholar] [CrossRef]
- Latvala, T. Efficient Model Checking of Safety Properties. In Proceedings of the 10th International SPIN Workshop on Model Checking of Software, Portland, OR, USA, 9–10 May 2003; pp. 74–88. [Google Scholar] [CrossRef]
| Method | Static Environment | Dynamic Environment | Risk Foresight |
|---|---|---|---|
| A* | ✓ | – | – |
| D* | ✓ | – | – |
| RRT* | ✓ | – | – |
| RL | ✓ | ✓ | ✓ |
| FSA-MDP | ✓ | ✓ | ✓ |
| MCTS | ✓ | ✓ | ✓ |
| Real-time solution | ✓ | ✓ | – |
| Method | Handling of Environmental Uncertainty | Human Interaction | Task Sequencing |
|---|---|---|---|
| Product MDP [46] | – | – | ✓ |
| Feb-MDP [22] | Real-time | – | ✓ |
| DRL [47] | – | – | ✓ |
| Plan-Seq-Learn [48] | – | – | ✓ |
| G2RL [15] | Real-time | – | – |
| Co-HAN [49] | Real-time | ✓ | – |
| Ours | At the planning level | ✓ | ✓ |
| Type of Reward | Value | Learner |
|---|---|---|
| E | – | high-level |
| – | – | high-level |
| step_cost | – | high-/low-level |
| goal_reward | – | low-level |
| Penalty | – | low-level |
| c_cost | – | low-level |
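Under our own assumption about how these terms combine (the exact values are given in the paper), a per-step low-level reward consistent with the table would take the form:

```latex
R(s, a, s') \;=\; \underbrace{r_{\mathrm{step}}}_{\text{step\_cost}}
\;+\; \underbrace{r_{\mathrm{goal}}\,\mathbf{1}\!\left[s' = s_{\mathrm{goal}}\right]}_{\text{goal\_reward}}
\;+\; \underbrace{r_{\mathrm{pen}}\; p_{\mathrm{risk}}(s')}_{\text{Penalty}}
\;+\; r_{c}
```

with $r_{\mathrm{step}}, r_{\mathrm{pen}}, r_{c} \le 0$, $r_{\mathrm{goal}} > 0$, and $p_{\mathrm{risk}}(s')$ the simulated occupancy probability of the entered cell.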
| Parameter | Value | Notes |
|---|---|---|
| – | 100 | Section 4 |
| – | randomly move one step or two steps | Section 5.1 |
| – | ≈ | – |
| – | 2000 | high-/low-level |
| – | 0.9 | – |
| computation_budget | 1000 | low-level |
| Time-Step t | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 |
|---|---|---|---|---|---|---|---|---|---|---|
| door 1 | 0.010 | 0.150 | 0.480 | 0.650 | 0.460 | 0.190 | 0.050 | 0.010 | 0.000 | 0.000 |
| door 2 | 0.500 | 0.150 | 0.110 | 0.050 | 0.010 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| door 3 | 0.500 | 0.150 | 0.060 | 0.020 | 0.010 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| Time-Step t | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| door 1 | 0.336 | 0.714 | 0.730 | 0.727 | 0.739 | 0.500 | 0.197 | 0.048 | 0.006 | 0.000 | 0.000 |
| door 2 | 0.807 | 0.500 | 0.147 | 0.111 | 0.020 | 0.010 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| door 3 | 1.178 | 0.728 | 0.208 | 0.024 | 0.020 | 0.010 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
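In the first table, for example, the crossing probability at door 1 peaks at 0.650 at time-step $t = 40$ and falls to zero by $t = 45$, so a robot that reaches that door even a few steps later incurs almost no expected risk. Under the assumed reward form of Section 6.1, the expected risk penalty for entering door $i$ at time $t$ is simply:

```latex
\mathbb{E}\left[\text{penalty}\right] \;=\; r_{\mathrm{pen}}\; p_{\mathrm{door}_i}(t),
\qquad \text{e.g.,} \quad r_{\mathrm{pen}} \times 0.650 \;\text{ at door 1, } t = 40.
```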