Dynamic Robot Navigation in Confined Indoor Environment: Unleashing the Perceptron-Q Learning Fusion
Abstract
1. Introduction
Motivation
- To propose a new perceptron-Q learning fusion (PQLF) model for effective robot navigation in dynamic environments. The combination of perceptron-based intelligence with Q-learning enables the robot to make accurate, obstacle-avoiding decisions.
- To introduce an efficient reward scheme in which a large negative reward is assigned when the robot encounters an obstacle, a small negative reward is assigned when it deviates from the goal, and a large positive reward is assigned when it reaches the goal (a minimal sketch of this scheme follows this list).
- To validate the strength of the proposed work through comprehensive experiments whose results are compared with existing methods.
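As referenced above, the following is a minimal Python sketch of the reward scheme; the numeric magnitudes are illustrative placeholders, not the authors' values:

```python
def reward(hit_obstacle: bool, reached_goal: bool, moved_away_from_goal: bool) -> float:
    """Illustrative PQLF-style reward: large penalty for collisions,
    small penalty for moving away from the goal, large bonus at the goal.
    The magnitudes below are placeholders, not the paper's values."""
    if hit_obstacle:
        return -10.0   # large negative reward for meeting an obstacle
    if reached_goal:
        return +10.0   # large positive reward for reaching the goal
    if moved_away_from_goal:
        return -1.0    # small negative reward for deviating from the goal
    return 0.0         # neutral step otherwise
```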
2. Related Works
Problem Statement
3. Proposed Methodology
3.1. Navigation Control
3.2. Proposed Robot Navigation Using Perceptron-Q Learning Fusion Model
- First, the agent interacts directly with the environment in each of its states to gather input.
- Second, the environment responds by giving positive or negative rewards for the activity, denoted by K+ or K−, respectively.
- Third, the agent maximizes the rewards gathered so far and recognizes changes in the surroundings.
- Fourth, starting from the current state, the RL update is applied to increase the expected reward, as illustrated by the standard Q-learning update below.
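For reference, the update in step four can be written in its standard tabular Q-learning form (the learning rate α and discount factor γ are generic symbols, not notation taken from the paper):

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \Big[ r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \Big]$$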
Algorithm 1: Process of the proposed PQLF model in robot navigation

Input: Actions (turn right, turn left, stop, move forward) and the current state
Output: Best state and action
Initialize the parameters: epochs, learning rate, reward, random action, random state, iterations, and mini-batch size
For each epoch do
  For each iteration do
    Begin in the current state
    While the state is not terminal do
      Evaluate the policy
      Define the action
      If the robot meets an obstacle then
        assign the corresponding reward (negative, per the reward scheme above)
      else
        update the reward
      End if
      Obtain the new state
      For each sample in the mini-batch do
        Determine the Q-value through the MLP and append it to the batch
      End for
    End while
  End for
End for
End
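To make the structure of Algorithm 1 concrete, here is a minimal, self-contained Python sketch of Q-learning with an MLP approximating the Q-values from mini-batches of experience. The environment interface, network size, and all hyperparameters are assumptions for illustration and do not reproduce the authors' implementation:

```python
import random
import numpy as np
from sklearn.neural_network import MLPRegressor

N_ACTIONS = 4                         # turn right, turn left, stop, move forward
EPOCHS, EPISODES, BATCH = 10, 100, 32
GAMMA, EPSILON = 0.9, 0.1             # assumed discount factor / exploration rate

class DummyEnv:
    """Placeholder environment: random 3-sensor readings, toy reward, random termination.
    Stands in for the robot/simulator interface, which the paper does not specify here."""
    def reset(self):
        return np.random.rand(3)
    def step(self, action):
        next_state = np.random.rand(3)
        reward = -1.0 if next_state.min() < 0.2 else 0.1   # toy reward, not the paper's
        done = np.random.rand() < 0.05
        return next_state, reward, done

env = DummyEnv()

# MLP mapping a 3-value sensor state to one Q-value per action.
q_net = MLPRegressor(hidden_layer_sizes=(32,))
q_net.partial_fit(np.zeros((1, 3)), np.zeros((1, N_ACTIONS)))   # initialise weights

replay = []                           # (state, action, reward, next_state, done)

def act(state):
    """Epsilon-greedy action over the MLP's Q-value estimates."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    return int(np.argmax(q_net.predict(state[None, :])[0]))

for epoch in range(EPOCHS):
    for episode in range(EPISODES):
        state, done = env.reset(), False
        while not done:
            action = act(state)
            next_state, reward, done = env.step(action)
            replay.append((state, action, reward, next_state, done))
            state = next_state

            if len(replay) >= BATCH:  # mini-batch Q-value update through the MLP
                batch = random.sample(replay, BATCH)
                states = np.array([b[0] for b in batch])
                targets = q_net.predict(states)
                next_q = q_net.predict(np.array([b[3] for b in batch]))
                for i, (_, a, r, _, d) in enumerate(batch):
                    targets[i, a] = r if d else r + GAMMA * next_q[i].max()
                q_net.partial_fit(states, targets)
```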
3.3. Creation of the Dataset
Algorithm 2: Generation of dataset

Threshold distance = 20 cm
Distance samples = 10,000
Initialize C[Distance samples, 4] = 0
While the sample index t has not reached the number of distance samples do
  C[t, 1] = rand[100]   % Left sensor data
  C[t, 2] = rand[100]   % Front sensor data
  C[t, 3] = rand[100]   % Right sensor data
  If <case 1 condition> then C[t, 4] = corresponding steering angle
  Else if <case 2 condition> then …
  (seven branches in total, one per obstacle-orientation case in the steering-angle table)
  End if
End while
End
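A runnable Python sketch of the dataset generation described in Algorithm 2: random left/front/right readings are drawn and a steering label is assigned by comparing them with the 20 cm threshold. The exact branch conditions are not reproduced in the extracted algorithm, so the sensor-combination-to-case mapping below is an assumption chosen to mirror the seven obstacle-orientation cases:

```python
import numpy as np

THRESHOLD_CM = 20          # threshold distance from Algorithm 2
N_SAMPLES = 10_000         # number of distance samples

rng = np.random.default_rng(0)
C = np.zeros((N_SAMPLES, 4))                 # columns: left, front, right, steering label

for t in range(N_SAMPLES):
    left, front, right = rng.uniform(0, 100, size=3)   # random sensor readings (cm)
    C[t, :3] = left, front, right

    # Hypothetical labelling rule: choose a steering angle from the seven
    # obstacle-orientation cases depending on which sensors see an obstacle
    # closer than the threshold. The exact branch conditions are assumptions.
    near = (left < THRESHOLD_CM, front < THRESHOLD_CM, right < THRESHOLD_CM)
    if near == (True, True, True):
        C[t, 3] = 180          # obstacles left, front and right: turn around
    elif near == (False, True, False):
        C[t, 3] = 90           # obstacle in front
    elif near == (True, True, False):
        C[t, 3] = 90           # left front corner
    elif near == (True, False, False):
        C[t, 3] = 45           # obstacle on the left
    elif near == (False, False, True):
        C[t, 3] = -45          # obstacle on the right
    elif near == (False, True, True):
        C[t, 3] = -90          # right front corner
    else:
        C[t, 3] = 0            # corridor / free path: go straight
```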
4. Results and Discussion
4.1. Performance Metrics
4.1.1. Detour Percentage [26]:
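The metric's equation is not reproduced in this extract; a common definition, given here as an assumption consistent with how the results in Section 4.2 are reported, is the extra path length relative to the shortest collision-free path:

$$\text{Detour}~(\%) = \frac{L_{\text{actual}} - L_{\text{shortest}}}{L_{\text{shortest}}} \times 100,$$

where $L_{\text{actual}}$ is the length of the path actually travelled and $L_{\text{shortest}}$ is the shortest-path length.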
4.1.2. Moving Cost:
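Again the paper's equation is not reproduced here; a common form, stated as an assumption, normalizes the number of steps taken by the optimal path length, so a value of 1 indicates an optimal route:

$$\text{Moving cost} = \frac{N_{\text{steps}}}{L^{*}},$$

where $N_{\text{steps}}$ is the number of steps the robot executes and $L^{*}$ is the length of the shortest path from start to goal.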
4.1.3. Computation Time:
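Computation time is reported in seconds per navigation decision (see the Computing Time table and the 0.003–0.013 s per step figure in the PQLF-versus-vision comparison); it is measured directly rather than computed from a formula.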
4.2. Performance Comparison Using Various Models
Statistical Analysis
4.3. Discussion
4.4. Limitations and Boundaries of the Proposed Approach
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
AV | Autonomous Vehicle |
DQN | Deep Q-network |
DDPG | Deep Deterministic Policy Gradient |
ESS | Energy Storage Systems |
GPS | Global Positioning System |
MDP | Markov Decision Process |
MLP | Multi-Layer Perceptron |
PQLF | Perceptron-Q Learning Fusion |
QL | Q-learning |
RL | Reinforcement Learning |
ROS | Robot Operating System |
References
- Zhao, J.; Liu, S.; Li, J. Research and implementation of autonomous navigation for mobile robots based on SLAM algorithm under ROS. Sensors 2022, 22, 4172. [Google Scholar] [CrossRef] [PubMed]
- Mishra, D.K.; Thomas, A.; Kuruvilla, J.; Kalyanasundaram, P.; Prasad, K.R.; Haldorai, A. Design of mobile robot navigation controller using neuro-fuzzy logic system. Comput. Electr. Eng. 2022, 101, 108044. [Google Scholar] [CrossRef]
- Li, J.; Ran, M.; Wang, H.; Xie, L. A behavior-based mobile robot navigation method with deep reinforcement learning. Unmanned Syst. 2021, 9, 201–209. [Google Scholar] [CrossRef]
- De Groot, O.; Ferranti, L.; Gavrila, D.M.; Alonso-Mora, J. Topology-driven parallel trajectory optimization in dynamic environments. IEEE Trans. Robot. 2024. [Google Scholar] [CrossRef]
- Chen, Z.; Chen, K.; Song, C.; Zhang, X.; Cheng, J.C.; Li, D. Global path planning based on BIM and physics engine for UGVs in indoor environments. Autom. Constr. 2022, 139, 104263. [Google Scholar] [CrossRef]
- Francis, A.; Pérez-d’Arpino, C.; Li, C.; Xia, F.; Alahi, A.; Alami, R.; Martín-Martín, R. Principles and guidelines for evaluating social robot navigation algorithms. ACM Trans. Hum. Robot Interact. 2025, 14, 1–65. [Google Scholar] [CrossRef]
- Xiao, X.; Liu, B.; Warnell, G.; Stone, P. Motion planning and control for mobile robot navigation using machine learning: A survey. Auton. Robot. 2022, 46, 569–597. [Google Scholar] [CrossRef]
- Petrlík, M.; Krajník, T.; Saska, M. LiDAR-based stabilization, navigation and localization for UAVs operating in dark indoor environments. In Proceedings of the 2021 International Conference on Unmanned Aircraft Systems (ICUAS), Athens, Greece, 15–18 June 2021; pp. 243–251. [Google Scholar]
- Singamaneni, P.T.; Favier, A.; Alami, R. Human-aware navigation planner for diverse human-robot interaction contexts. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 5817–5824. [Google Scholar]
- Kong, F.; Xu, W.; Cai, Y.; Zhang, F. Avoiding dynamic small obstacles with onboard sensing and computation on aerial robots. IEEE Robot. Autom. Lett. 2021, 6, 7869–7876. [Google Scholar] [CrossRef]
- Shah, D.; Sridhar, A.; Dashora, N.; Stachowicz, K.; Black, K.; Hirose, N.; Levine, S. ViNT: A foundation model for visual navigation. arXiv 2023, arXiv:2306.14846. [Google Scholar] [CrossRef]
- Jiménez, M.F.; Scheidegger, W.; Mello, R.C.; Bastos, T.; Frizera, A. Bringing proxemics to walker-assisted gait: Using admittance control with spatial modulation to navigate in confined spaces. Pers. Ubiquitous Comput. 2022, 26, 1491–1509. [Google Scholar] [CrossRef]
- Sandino, J.; Vanegas, F.; Maire, F.; Caccetta, P.; Sanderson, C.; Gonzalez, F. UAV framework for autonomous onboard navigation and people/object detection in cluttered indoor environments. Remote Sens. 2020, 12, 3386. [Google Scholar] [CrossRef]
- Adamkiewicz, M.; Chen, T.; Caccavale, A.; Gardner, R.; Culbertson, P.; Bohg, J.; Schwager, M. Vision-only robot navigation in a neural radiance world. IEEE Robot. Autom. Lett. 2022, 7, 4606–4613. [Google Scholar] [CrossRef]
- Lee, S.; Xie, L.; Choi, D.-H. Privacy-preserving energy management of a shared energy storage system for smart buildings: A federated deep reinforcement learning approach. Sensors 2021, 21, 4898. [Google Scholar] [CrossRef] [PubMed]
- Weerakoon, K.; Sathyamoorthy, A.J.; Elnoor, M.; Manocha, D. Vapor: Legged robot navigation in unstructured outdoor environments using offline reinforcement learning. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 13–17 May 2024; pp. 10344–10350. [Google Scholar]
- Rezende, A.M.; Júnior, G.P.; Fernandes, R.; Miranda, V.R.; Azpúrua, H.; Pessin, G.; Freitas, G.M. Indoor localization and navigation control strategies for a mobile robot designed to inspect confined environments. In Proceedings of the 2020 IEEE 16th International Conference on Automation Science and Engineering (CASE), Hong Kong, 20–21 August 2020; pp. 1427–1433. [Google Scholar]
- Pérez-D’Arpino, C.; Liu, C.; Goebel, P.; Martín-Martín, R.; Savarese, S. Robot navigation in constrained pedestrian environments using reinforcement learning. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA) Xi’an, China, 30 May–5 June 2021; pp. 1140–1146. [Google Scholar]
- Ren, J.; Wu, T.; Zhou, X.; Yang, C.; Sun, J.; Li, M.; Zhang, A. SLAM, path planning algorithm and application research of an indoor substation wheeled robot navigation system. Electronics 2022, 11, 1838. [Google Scholar] [CrossRef]
- Singh, K.J.; Kapoor, D.S.; Thakur, K.; Sharma, A.; Nayyar, A.; Mahajan, S.; Shah, M.A. Map making in social indoor environment through robot navigation using active SLAM. IEEE Access 2022, 10, 134455–134465. [Google Scholar] [CrossRef]
- De Oliveira Júnior, A.; Piardi, L.; Bertogna, E.G.; Leitão, P. Improving the mobile robots indoor localization system by combining slam with fiducial markers. In Proceedings of the 2021 Latin American Robotics Symposium (LARS), 2021 Brazilian Symposium on Robotics (SBR), and 2021 Workshop on Robotics in Education (WRE), Virtual, 11–15 October 2021; pp. 234–239. [Google Scholar]
- Ren, T.; Jebelli, H. Efficient 3D robotic mapping and navigation method in complex construction environments. Comput. Aided Civ. Infrastruct. Eng. 2025, 40, 1580–1605. [Google Scholar] [CrossRef]
- Cai, H.; Chengxin, T.; Zhenfeng, L.; Xun, G.; Yue, S.; Mingrui, J.; Bencheng, L. Efficient particulate matter source localization in dynamic indoor environments: An experimental study by a multi-robot system. J. Build. Eng. 2024, 92, 109712. [Google Scholar] [CrossRef]
- De Heuvel, J.; Xiangyu, Z.; Weixian, S.; Tharun, S.; Maren, B. Spatiotemporal attention enhances lidar-based robot navigation in dynamic environments. IEEE Robot. Autom. Lett. 2024, 9, 4202–4209. [Google Scholar] [CrossRef]
- Wang, B.; Liu, Z.; Li, Q.; Prorok, A. Mobile robot path planning in dynamic environments through globally guided reinforcement learning. IEEE Robot. Autom. Lett. 2020, 5, 6932–6939. [Google Scholar] [CrossRef]
- Li, J.; Zhiyuan, S.; Yiyu, Q. Dynamic motion planning model for multirobot using graph neural network and historical information. Adv. Intell. Syst. 2023, 5, 2300036. [Google Scholar] [CrossRef]
- Han, C.; Baoying, L. Mobile robot path planning based on improved A* algorithm. In Proceedings of the 2023 IEEE 11th Joint International Information Technology and Artificial Intelligence Conference (ITAIC), Chongqing, China, 8–10 December 2023; Volume 11, pp. 672–676. [Google Scholar]
- Lu, R.; Jiang, Z.; Yang, T.; Chen, Y.; Wang, D.; Peng, X. A novel hybrid-action-based deep reinforcement learning for industrial energy management. IEEE Trans. Ind. Inform. 2024. [Google Scholar]
Case | Steering Angle (°) | Orientation of Obstacles |
---|---|---|
1 | 90 | Front |
2 | 0 | Corridor |
3 | 90 | Left front corner |
4 | 45 | Left |
5 | −45 (anticlockwise) | Right |
6 | −90 (anticlockwise) | Right front corner |
7 | 180 | Left, Front, Right |
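The mapping in the table can be expressed directly as a lookup from detected obstacle orientation to steering command. The sketch below mirrors the seven cases; the dictionary keys are descriptive labels chosen here for illustration, not identifiers from the paper:

```python
# Steering angle (degrees) for each obstacle orientation, per the seven cases above.
# Negative angles denote anticlockwise rotation.
STEERING_BY_OBSTACLE = {
    "front": 90,                 # case 1
    "corridor": 0,               # case 2
    "left_front_corner": 90,     # case 3
    "left": 45,                  # case 4
    "right": -45,                # case 5
    "right_front_corner": -90,   # case 6
    "left_front_right": 180,     # case 7
}

def steering_angle(orientation: str) -> int:
    """Return the steering angle for a detected obstacle orientation."""
    return STEERING_BY_OBSTACLE[orientation]
```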
Moving Cost
Environment | Local | Global | Naïve | G2RL | Proposed |
---|---|---|---|---|---|
Regular 50 | 1.58 | 1.31 | 1.38 | 1.18 | 1.11 |
Regular 100 | 1.57 | 1.23 | 1.42 | 1.12 | 1.07 |
Regular 150 | 1.50 | 1.19 | 1.36 | 1.09 | 1.01 |
Random 50 | 1.35 | 1.28 | 1.36 | 1.21 | 1.16 |
Random 100 | 1.43 | 1.26 | 1.34 | 1.15 | 1.15 |
Random 150 | 1.37 | 1.17 | 1.40 | 1.11 | 1.05 |
Free 50 | 1.27 | 1.24 | 1.31 | 1.14 | 1.06 |
Free 100 | 1.31 | 1.21 | 1.34 | 1.11 | 1.08 |
Free 150 | 1.27 | 1.14 | 1.32 | 1.07 | 1.07 |
Detour Percentage (%)
Environment | Local | Global | Naïve | G2RL | Proposed |
---|---|---|---|---|---|
Regular 50 | 36.7 | 23.7 | 31.5 | 15.2 | 20.7 |
Regular 100 | 36.3 | 18.7 | 39.5 | 10.7 | 16.7 |
Regular 150 | 33.3 | 16.0 | 35.0 | 8.2 | 15.755 |
Random 50 | 25.1 | 21.1 | 30.1 | 16.7 | 20.3 |
Random 100 | 30.0 | 20.5 | 32.3 | 13.0 | 13.5 |
Random 150 | 27.0 | 14.5 | 39.4 | 9.1 | 15.5 |
Free 50 | 21.2 | 19.4 | 28.9 | 12.3 | 19.2 |
Free 100 | 23.6 | 17.3 | 33.6 | 9.1 | 15.4 |
Free 150 | 21.2 | 12.3 | 31.5 | 6.5 | 11.7 |
Computing Time (s)
Environment | Local [25] | Global [25] | G2RL [25] | Proposed |
---|---|---|---|---|
Regular 50 | 0.004 | 0.003 | 0.011 | 0.002 |
Regular 100 | 0.005 | 0.004 | 0.012 | 0.003 |
Regular 150 | 0.007 | 0.004 | 0.015 | 0.003 |
Random 50 | 0.005 | 0.004 | 0.013 | 0.002 |
Random 100 | 0.006 | 0.006 | 0.015 | 0.007 |
Random 150 | 0.10 | 0.006 | 0.018 | 0.005 |
Free 50 | 0.008 | 0.007 | 0.018 | 0.005 |
Free 100 | 0.15 | 0.011 | 0.022 | 0.01 |
Free 150 | 0.17 | 0.013 | 0.028 | 0.013 |
Success Rate (%)
Environment | Global Re-Planning [25] | Discrete-ORCA [25] | PRIMAL [25] | Proposed |
---|---|---|---|---|
Regular | 95.7% | 88.7% | 92.3% | 99.8% |
Random | 98.2% | 55.0% | 80.6% | 99.5% |
Free | 98.8% | 99.5% | 75.7% | 99.6% |
Environment | Model | Mean Success Rate (%) | SD (±) | 95% CI (Lower–Upper) |
---|---|---|---|---|
Regular100 | G2RL [25] | 98.2 | 1.8 | 96.3–99.5 |
Regular100 | Proposed | 99.8 | 0.7 | 99.1–100 |
Random150 | G2RL [25] | 90.9 | 2.3 | 88.4–93.2 |
Random150 | Proposed | 99.5 | 0.9 | 98.7–100 |
Free150 | G2RL [25] | 93.5 | 1.9 | 91.6–95.4 |
Free150 | Proposed | 99.6 | 1.0 | 98.8–100 |
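For reproducibility, the mean, standard deviation, and 95% confidence interval over repeated runs can be computed as in the sketch below; the number of runs, the example values, and the normal-approximation interval are assumptions, since the paper's exact statistical procedure is not reproduced in this extract:

```python
import numpy as np

def mean_sd_ci95(success_rates):
    """Mean, sample SD and normal-approximation 95% CI of per-run success rates (%)."""
    x = np.asarray(success_rates, dtype=float)
    mean, sd = x.mean(), x.std(ddof=1)
    half_width = 1.96 * sd / np.sqrt(len(x))
    return mean, sd, (mean - half_width, mean + half_width)

# Example with ten hypothetical per-run success rates
print(mean_sd_ci95([99.1, 100.0, 98.7, 99.6, 99.8, 100.0, 99.2, 99.5, 99.9, 99.2]))
```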
Feature/Criterion | Proposed PQLF | Vision-Based Navigation |
---|---|---|
Primary Input | Ultrasonic distance sensors + GPS (global reference) | Visual input (RGB images, depth maps, semantic features) |
Computation Requirement | Lightweight (0.003–0.013 s per step, suitable for embedded use) | High (requires GPUs/accelerators for real-time inference) |
Training Data | Generated internally (10,000 sensor samples, synthetic dataset) | Requires large-scale labeled/unlabeled visual datasets |
Generalization Ability | Strong in structured indoor environments with sensor cues | Stronger in unstructured, dynamic, and semantically rich scenes |
Real-time Deployment | Highly suitable for low-power robots with limited resources | More challenging on resource-constrained platforms |
Strength | Efficient obstacle avoidance, fast decision-making | Rich perception, semantic reasoning, adaptability |
Limitation | Relies on simplified sensors, less context awareness | Computationally expensive, sensitive to lighting/occlusions |
Training Episodes
Algorithm | 0 | 200 | 400 | 600 | 800 | 1000 |
---|---|---|---|---|---|---|
D-PDQN | 4.153 | 6.161 | 6.776 | 7.391 | 7.744 | 7.844 |
DDPG | 3.744 | 4.323 | 5.482 | 5.473 | 5.663 | 5.871 |
DQN | 1.916 | 5.554 | 6.495 | 6.495 | 6.921 | 6.794 |
PPO | 3.31 | 3.997 | 5.817 | 6.341 | 6.504 | 6.821 |
Proposed | 4.305 | 6.649 | 7.373 | 7.844 | 8.052 | 8.088 |
Energy Consumption (kWh)
Model | Time slot t = 5 | t = 10 | t = 15 | t = 20 | t = 25 |
---|---|---|---|---|---|
DQN | 29.804 | 29.715 | 0.692 | 0.54 | 29.426 |
DDPG | 29.804 | 29.715 | 29.715 | 24.257 | 11.351 |
PPO | 29.804 | 29.715 | 9.115 | 19.459 | 15.388 |
D-PDQN | 29.804 | 29.715 | 0.499 | 28.304 | 6.157 |
Proposed | 29.559 | 29.982 | 0.072 | 26.982 | 2.249 |
Energy Cost
Model | Training episode 0 | 200 | 400 | 600 | 800 | 1000 |
---|---|---|---|---|---|---|
Proposed | −20 | −72.55 | −83.11 | −84.7 | −85.51 | −84.84 |
D-PDQN | −10 | −63.79 | −74.0167 | −75.6226 | −75.7899 | −75.4658 |
MILP | −80 | −80 | −80 | −80 | −80 | −80 |
Random Policy | −20.3932 | −18.1775 | −25.9067 | −23.8075 | −20.5935 | −23.8533 |
Model | Moving Cost | Detour (%) | Success Rate (%) | Avg. Comp. Time (s) |
---|---|---|---|---|
Q-learning | 1.32 | 24.6 | 91.2 | 0.007 |
MLP only | 1.25 | 21.7 | 94.3 | 0.006 |
Proposed PQLF | 1.08 | 15.5 | 99.5 | 0.003 |
Parameter Varied | Value Tested | Success Rate (%) |
---|---|---|
Learning rate | 0.001/0.01/0.1 | 96.7/99.5/98.2 |
Discount factor | 0.7/0.9/0.99 | 95.6/99.5/99.0 |
Exploration rate | 0.1/0.01/0.001 | 94.8/99.5/97.4 |