Abstract
Deep reinforcement learning (DRL), a vital branch of artificial intelligence, has shown great promise in mobile robot navigation within dynamic environments. However, existing studies mainly focus on simplified dynamic scenarios or the modeling of static environments, which results in trained models that lack sufficient generalization and adaptability when faced with real-world dynamic environments, particularly in handling complex task variations, dynamic obstacle interference, and multimodal data fusion. Addressing these gaps is essential for enhancing the real-time performance and versatility of DRL-based navigation. Through a comparative analysis of classical DRL algorithms, this study highlights their advantages and limitations in handling real-time navigation tasks under dynamic environmental conditions. In particular, the paper systematically examines value-based, policy-based, and hybrid DRL methods, discussing their applicability to different navigation challenges. Additionally, by reviewing recent studies from 2021 to 2024, it identifies key trends in DRL-based navigation, revealing a strong focus on indoor environments while outdoor navigation and multi-robot collaboration remain underexplored. The analysis also highlights challenges in real-world deployment, particularly in sim-to-real transfer and sensor fusion. Based on these findings, this paper outlines future directions to enhance real-time adaptability, multimodal perception, and collaborative learning frameworks, providing theoretical and technical insights for advancing DRL in dynamic environments.
1. Introduction
Since the mid-20th century, robotics has undergone rapid development, evolving from simple industrial automation devices to complex intelligent autonomous robots. Initially, robots were primarily employed in industrial production to perform highly repetitive mechanized tasks, aiming to enhance production efficiency and reduce manual labor [1]. However, with significant advancements in computational power, particularly the development of integrated circuits, and the gradual refinement of sensor technologies, robots have moved beyond preprogrammed operations and basic mechanical actions to acquire capabilities such as environmental perception, autonomous decision-making, and adaptability [2]. Today, the application of robotics has extended from traditional industrial production to various fields, including healthcare, logistics, services, and agriculture, making it an indispensable technological support across multiple industries [3].
In autonomous robot navigation, path planning and simultaneous localization and mapping (SLAM) are two core challenges [4]. Path planning involves providing robots with the optimal route from a starting point to a target destination [5], considering factors such as obstacle avoidance, safety, path length, and time efficiency. Traditional path planning algorithms, such as the A* algorithm, Dijkstra’s algorithm, and rapidly exploring random tree (RRT), perform well in static and well-known environments, systematically searching for globally optimal solutions [6]. Despite this, in dynamic, complex, or unknown environments, the limitations of these methods become increasingly apparent. Obstacles and target positions in dynamic environments frequently change, requiring traditional algorithms to replan paths repeatedly, which not only increases computational overhead but also compromises real-time performance [7,8,9,10,11]. Furthermore, these methods lack sufficient adaptability to address environmental uncertainties and dynamic changes, making them inadequate for meeting the demands of complex scenarios [12].
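To make the re-planning cost concrete, the following minimal sketch (not drawn from any cited implementation) runs A* on a small 4-connected occupancy grid with a Manhattan-distance heuristic; the grid, start, and goal are illustrative placeholders. Because the search assumes a fixed map, every change to the grid, such as an obstacle that moves, invalidates the result and forces the planner to run again from scratch, which is the computational overhead discussed above.

```python
import heapq

def astar(grid, start, goal):
    """Minimal A* on a 4-connected occupancy grid (1 = obstacle, 0 = free)."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])   # Manhattan heuristic
    frontier = [(h(start), 0, start, [start])]                # (f, g, cell, path)
    visited = set()
    while frontier:
        _, g, cell, path = heapq.heappop(frontier)
        if cell == goal:
            return path
        if cell in visited:
            continue
        visited.add(cell)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            r, c = cell[0] + dr, cell[1] + dc
            if 0 <= r < rows and 0 <= c < cols and grid[r][c] == 0:
                heapq.heappush(
                    frontier, (g + 1 + h((r, c)), g + 1, (r, c), path + [(r, c)]))
    return None  # no collision-free path exists on the current map

# Illustrative map; any update to `grid` requires calling astar() again.
grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(astar(grid, (0, 0), (2, 0)))
```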
DRL, as an emerging technology combining deep learning and reinforcement learning, offers a novel approach to robot navigation. DRL leverages deep neural networks to extract meaningful features from raw sensor data and generates optimized decision strategies by autonomously learning the dynamic characteristics of complex environments [13]. Compared to traditional methods, DRL not only adapts quickly to environmental changes but also enables real-time planning in large-scale dynamic environments, significantly improving the efficiency and accuracy of path planning. In recent years, DRL has been widely applied in areas such as unmanned vehicles, industrial automation, and robot navigation in dynamic scenarios, becoming a hotspot in current research [14].
Despite the immense potential of DRL applications in dynamic environments, current research still faces significant challenges. Many models are trained in relatively simple environments, lacking comprehensive simulation of dynamic and complex scenarios, which limits the generalization capability of DRL models. This limitation becomes particularly pronounced when addressing complex scenarios involving dynamic obstacles, random target changes, and environmental disturbances [15]. When robots navigate in new environments, strategies trained only in static environments struggle to effectively handle complex dynamic changes, potentially leading to decreased success rates in path planning or even erroneous decisions. To address this issue, researchers have recently proposed training DRL models in dynamic environments, incorporating more complex and diverse dynamic scenarios to enable robots to learn how to adapt to environmental changes and adjust strategies rapidly [16]. This training approach significantly enhances the robustness and adaptability of DRL, laying a solid foundation for efficient robot navigation in practical applications. This relationship is illustrated in Figure 1.
Figure 1.
Proposed navigation system overview [17].
To comprehensively understand the development status of DRL technology in dynamic environments, this study conducted a systematic review of recent research based on the Web of Science, Google Scholar, IEEE Xplore, and arXiv databases. The review covers relevant literature published between 2021 and 2024, with keywords including “dynamic environment”, “deep reinforcement learning”, and “mobile robot navigation”. Through extensive literature screening and analysis, this study summarizes the latest advancements in DRL technology for path planning in dynamic environments, as shown in Figure 2.
Figure 2.
Selection of study, search query, and inclusion criteria.
These studies span various aspects, from algorithm design to practical applications, revealing the core trends in current technological development and their potential applications in dynamically complex scenarios [18]. Additionally, this study systematically outlines the main challenges faced by DRL technology in addressing complex dynamic scenarios, such as real-time performance [19], generalization capability, and robustness to environmental changes. Based on this analysis, the study not only discusses the current state of DRL applications in dynamic environments but also delves into future research directions, aiming to provide valuable theoretical support and technical guidance for researchers in this field [20].
In recent years, DRL has attracted significant attention in the field of mobile robot navigation, leading to numerous systematic review studies. To provide a comprehensive understanding of the current research landscape, recent review articles on the applications, challenges, and development trends of DRL in mobile robot navigation have been examined. The key findings are summarized in Table 1.
Table 1.
Recent review of deep reinforcement learning in mobile robot navigation.
This paper systematically reviews the applications of DRL in mobile robot navigation within dynamic environments, with a particular focus on key technological developments in environmental adaptability, multimodal perception fusion, and task scene diversity. Unlike previous studies that primarily focus on algorithmic improvements, this paper emphasizes the integration of DRL with multimodal perception and real-time planning, aiming to bridge the gap between theoretical advancements and practical implementations. Through a comparative analysis of value-based, policy-based, and hybrid DRL methods, this study highlights their advantages and limitations in handling real-time navigation tasks under dynamic environmental conditions. Additionally, the paper identifies critical challenges, including the generalization capability of DRL models in unseen environments, policy robustness in the face of dynamic obstacle interference, and the efficiency of sim-to-real transfer for practical deployment. Based on these findings, future research directions are proposed, emphasizing improvements in real-time adaptability, scalable multimodal data fusion techniques, and the development of collaborative DRL frameworks for multi-robot navigation. These insights provide a theoretical foundation and technical reference for enhancing DRL’s effectiveness in complex and dynamic real-world scenarios. For example, in dynamic environments, robots often encounter complex scenario changes, including moving obstacles, uncertainties in target positions, and the need for efficient fusion of multimodal perception data [27]. To address these issues, this study summarizes recent optimization solutions and proposes future improvement directions. In particular, this study analyzes the innovations in cutting-edge research and discusses their impact on practical applications, especially in enhancing adaptability in dynamic environments, improving multimodal perception fusion capabilities, and achieving the transfer from simulation to real-world applications. Furthermore, the study highlights the potential of DRL technology in improving the robustness and safety of algorithms, pointing out that safe and efficient navigation in complex dynamic environments will become a crucial objective of future research [28]. By thoroughly examining these issues, this paper aims to provide theoretical foundations and technical references for the field of robot navigation and to open new avenues for future research.
2. Theories and Comparisons of Deep Reinforcement Learning Methods
DRL, as a significant branch of artificial intelligence, has emerged as a key technology for solving tasks such as navigation and control in dynamic environments due to its ability to learn in high-dimensional state spaces and its strong adaptability to complex decision-making problems [13]. By combining the decision-making optimization mechanism of reinforcement learning with the nonlinear feature representation capability of deep neural networks, DRL has successfully overcome the scalability bottlenecks of traditional reinforcement learning methods in complex environments [18]. In recent years, researchers have developed various DRL approaches tailored to the requirements of different task scenarios, including value-based methods, policy-based methods, and hybrid methods. These approaches have made significant progress in addressing issues such as overestimation bias, computational complexity, and policy robustness [16,29], laying a theoretical and practical foundation for further expanding the application scope of DRL technologies.
This paper systematically summarizes the development and applications of DRL methods from two dimensions: theoretical framework and technical comparisons. First, it introduces the core theories of DRL, including its modeling approach based on Markov decision processes (MDPs) [30] and policy optimization mechanisms, and outlines the main ideas and characteristics of value-based, policy-based, and hybrid methods. Subsequently, through a comparative analysis of classical DRL algorithms, this paper summarizes the differences in their performance in complex environments and their applicable scenarios, further exploring their advantages and limitations in practical applications.
2.1. Theoretical Framework of Deep Reinforcement Learning
Reinforcement learning (RL) is a machine learning paradigm that aims to enable an agent to learn an optimal policy for maximizing long-term cumulative rewards through interactions with its environment [18]. In the traditional RL framework, problems are modeled using MDPs, which consist of states (S), actions (A), rewards (R), and state transition probabilities (P) [30]. The agent selects actions based on a policy (π) and optimizes decision-making behavior according to the reward function. Nonetheless, traditional RL methods often struggle to scale when confronted with high-dimensional state spaces and complex dynamic environments [31].
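To make these components concrete, the sketch below runs tabular Q-learning on a toy five-state chain MDP; the chain layout, learning rate, discount factor, and exploration rate are assumptions chosen only for illustration. The Q-table holds one entry per state-action pair, which is precisely the representation that stops scaling once states become high-dimensional or continuous.

```python
import random

# Toy chain MDP: states S = {0,...,4}, actions A = {0: left, 1: right},
# reward R = +1 only when the right end (state 4) is reached.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4

def step(state, action):
    """Deterministic transition P(s'|s,a) and reward R(s,a) for the toy chain."""
    nxt = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    return nxt, float(nxt == GOAL)

# Tabular Q-learning: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.1, 0.9, 0.3

for _ in range(2000):                       # episodes
    s = 0
    for _ in range(100):                    # step limit per episode
        greedy = max(range(N_ACTIONS), key=lambda a: Q[s][a])
        a = random.randrange(N_ACTIONS) if random.random() < epsilon else greedy
        s_next, r = step(s, a)
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next
        if s == GOAL:
            break

# Greedy policy after training: expected to choose "right" (1) in every non-goal state.
print([max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES - 1)])
```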
DRL combines deep learning with reinforcement learning by leveraging deep neural networks (DNNs) to approximate value functions and policies over high-dimensional state-action spaces, effectively addressing the computational bottlenecks of traditional tabular methods [32]. DRL methods have demonstrated robust performance across various domains, such as robotic control, game agents, and autonomous driving, becoming an essential tool for solving complex decision-making problems.
The development of DRL can be broadly categorized into three directions: value-based methods, policy-based methods, and hybrid methods. Value-based methods indirectly derive the optimal policy by learning the state-action value function [33]; policy-based methods directly optimize the policy function and are particularly suited for continuous action spaces [29]; and hybrid methods combine the strengths of both approaches to more efficiently handle complex tasks [34]. These methods not only reflect the technical advancements of DRL but also provide flexible solutions for addressing various types of tasks. Value-based methods, such as the deep Q-network (DQN) [13], have demonstrated outstanding performance in gaming tasks, particularly in Atari games. Hybrid methods, such as the deep deterministic policy gradient (DDPG) [16] and twin delayed deep deterministic policy gradient (TD3) [35], have shown exceptional effectiveness in robotic continuous control tasks, as they are capable of learning precise control strategies for high-dimensional continuous action spaces in a stable manner.
In recent years, research in DRL has gradually shifted from theoretical modeling to practical applications. Classical algorithms such as DQN, proximal policy optimization (PPO), and soft actor-critic (SAC) have not only undergone continuous optimization but also addressed critical challenges such as overestimation bias, computational complexity, and exploration efficiency [29,36,37]. The innovations in these methods have established a foundation for the development of DRL in multiple fields and have led the way for future research.
2.2. Comparison of DRL with Classical and Other Learning-Based Navigation Methods
Before conducting an in-depth analysis of different DRL methods, it is essential to first compare DRL with other commonly used navigation approaches to provide a more comprehensive perspective. Classical control methods, including Dijkstra’s algorithm, A*, RRT, and the dynamic window approach (DWA), have long been employed for robot navigation due to their deterministic properties and well-defined mathematical foundations. These methods excel in structured and static environments, offering precise path planning with minimal computational requirements. However, their performance deteriorates in highly dynamic scenarios, as they require frequent re-planning and struggle with real-time adaptability [38]. In contrast, DRL enables robots to learn adaptive policies through continuous interaction with the environment, allowing real-time decision-making and improved generalization in non-stationary settings [39]. Nevertheless, DRL’s training complexity and reliance on extensive computational resources, as well as the necessity of well-designed reward functions, pose significant challenges [40]. To provide a more comprehensive comparison of different navigation approaches, Table 2 summarizes the key characteristics, advantages, disadvantages, and applicable scenarios of classical control methods, DRL, hybrid learning approaches, and multi-agent reinforcement learning (MARL).
Table 2.
Comparison of robot navigation strategies.
Beyond classical control, hybrid learning-based approaches such as Neuro-SLAM and imitation learning have been explored to enhance navigation efficiency. Neuro-SLAM integrates deep learning with SLAM, leveraging neural networks to refine perception and improve localization accuracy [48]. While it enhances mapping capabilities, its reliance on pre-trained models may limit adaptability to previously unseen environments [49]. Imitation learning, on the other hand, enables robots to mimic expert demonstrations, facilitating faster learning and reducing the need for explicit reward engineering in reinforcement learning [50]. However, its dependency on high-quality expert data makes it difficult to generalize beyond pre-collected scenarios, contrasting with DRL’s ability to autonomously optimize policies through trial-and-error learning [51].
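As a minimal illustration of this idea, the sketch below performs behaviour cloning, the simplest form of imitation learning, in PyTorch; the 24-dimensional observations, 2-dimensional velocity commands, random placeholder “expert” data, and network sizes are all assumptions made for the example. The policy is fit by supervised regression to expert state-action pairs, so no reward function is required, but the learned behaviour cannot extend far beyond what the demonstrations cover.

```python
import torch
import torch.nn as nn

# Hypothetical demonstration dataset: expert (state, action) pairs,
# e.g., processed laser scans mapped to linear and angular velocity commands.
expert_states = torch.randn(1024, 24)    # 24-dim observation (assumed)
expert_actions = torch.randn(1024, 2)    # 2-dim velocity command (assumed)

policy = nn.Sequential(nn.Linear(24, 128), nn.ReLU(), nn.Linear(128, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

for epoch in range(50):
    pred = policy(expert_states)
    loss = nn.functional.mse_loss(pred, expert_actions)   # imitate the expert, no reward needed
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```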
MARL extends DRL’s capabilities to collaborative multi-robot navigation, where multiple agents interact within the same environment to achieve coordinated objectives. Unlike single-agent DRL, MARL allows robots to exchange information, adapt to dynamic teammates or adversaries, and develop cooperative strategies [52]. While MARL significantly enhances multi-robot efficiency in applications such as swarm navigation and task allocation, it also introduces additional challenges, including communication constraints, increased computational complexity, and the difficulty of achieving policy convergence in large-scale systems.
Comparing these approaches highlights the trade-offs between different navigation strategies. Classical control methods provide strong theoretical guarantees but suffer from poor adaptability in dynamic settings, whereas DRL enhances decision-making flexibility at the cost of increased training complexity. Hybrid learning methods bridge perception and control but often require domain-specific optimization, while MARL expands reinforcement learning to collaborative scenarios but faces scalability issues [53]. These comparisons illustrate the need for integrating multiple approaches to fully exploit their advantages and enhance robot navigation in complex and dynamic environments.
While DRL provides flexible and adaptive strategies for robot navigation, it is important to recognize that not all navigation problems require or benefit from DRL-based solutions. In practice, the suitability of DRL depends heavily on task complexity, environmental dynamics, computational constraints, and safety requirements. To provide additional context for the following algorithm-specific discussions, Table 3 summarizes typical navigation scenarios and assesses whether DRL is appropriate, based on representative task characteristics.
Table 3.
DRL suitability for common navigation problems.
| Ref. | Problem Type | Typical Scenario | DRL Suitability | Rationale |
|---|---|---|---|---|
| [54] | Static path planning | Indoor warehouse with fixed layout | Not preferred | Traditional planners (A*, Dijkstra) are fast and optimal |
| [55] | Dynamic obstacle avoidance | Urban sidewalk with moving pedestrians | Suitable | DRL adapts to real-time interactions and uncertainties |
| [56] | Multi-agent coordination | Multi-robot logistics in shared space | Suitable | DRL enables decentralized adaptive policies |
| [57] | Deterministic rules and known map | Factory floor with preprogrammed signals | Not preferred | Rule-based FSMs are reliable, interpretable, and efficient |
| [58] | Sensor-constrained, low-latency tasks | Drone hovering in tight indoor space | Not preferred | DRL inference may exceed timing constraints; PID or MPC preferable |
| [59] | Unstructured outdoor exploration | Off-road navigation with unknown terrain | Suitable | DRL handles partial observability and long-term reasoning |
As summarized above, DRL demonstrates particular strengths in managing complex, uncertain, and dynamic scenarios. The next section presents a comparative analysis of representative DRL algorithms, focusing on their theoretical characteristics and performance trade-offs across such scenarios.
2.3. Comparison of Classical DRL Methods
DRL, as an intelligent technology combining the strengths of deep learning and reinforcement learning, has achieved remarkable success in addressing complex decision-making problems. However, different DRL algorithms emphasize distinct aspects in their design, each possessing unique advantages and corresponding limitations. A comparative analysis of classical DRL methods provides a comprehensive understanding of their applicable scenarios and constraints, offering guidance for algorithm selection in practical applications. Table 4 summarizes the main characteristics of several classical DRL algorithms, including value-based, policy-based, and hybrid methods.
Value-based methods learn optimal policies by estimating state-action value functions, characterized by high learning efficiency and stability. Despite this, they may encounter issues such as overestimation bias and limited scalability in complex environments. DQN, proposed by Mnih et al., is a pioneering method in this category. It combines deep neural networks with experience replay to address the scalability challenges of traditional Q-learning in high-dimensional state spaces. DQN also introduces target networks to enhance learning stability. Nevertheless, it is prone to overestimation bias in Q-value estimation and exhibits high sensitivity to hyperparameters [13,31].
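A minimal sketch of the DQN update loop described above is given below in PyTorch; the state and action dimensions, network sizes, and hyperparameters are illustrative assumptions, and the interaction loop that pushes (state, action, reward, next state, done) tensors into the buffer is omitted. It shows the two stabilizing ingredients named in the text: minibatches sampled from an experience replay buffer and bootstrapped targets computed with a periodically synchronized target network.

```python
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, GAMMA = 8, 4, 0.99      # illustrative sizes, not from any cited work

def make_q_net():
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))

q_net, target_net = make_q_net(), make_q_net()
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)   # experience replay buffer of (s, a, r, s2, done) tensors

def dqn_update(batch_size=64):
    """One gradient step on a minibatch sampled from the replay buffer."""
    if len(replay) < batch_size:
        return
    s, a, r, s2, done = map(torch.stack, zip(*random.sample(replay, batch_size)))
    q_sa = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)    # Q(s, a) of taken actions
    with torch.no_grad():                                          # bootstrap from target network
        target = r + GAMMA * (1 - done) * target_net(s2).max(dim=1).values
    loss = nn.functional.smooth_l1_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def sync_target():
    target_net.load_state_dict(q_net.state_dict())   # refresh the frozen copy every few thousand steps
```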
To mitigate the overestimation bias of DQN, van Hasselt et al. proposed the double deep Q-network (double DQN), which separates action selection from value estimation, significantly reducing bias and improving policy stability [60]. Despite its improved accuracy, double DQN increases computational complexity because it maintains two Q-networks. Subsequently, Wang et al. introduced the dueling deep Q-network (dueling DQN), which decomposes the Q-value into a state value and an action advantage, enabling more effective evaluation of critical states [67]. However, this improvement introduces additional network architecture complexity and sensitivity to hyperparameter settings.
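Both refinements can be expressed compactly. The sketch below, with illustrative dimensions, contrasts the double DQN target, in which the online network selects the action and the target network evaluates it, with a dueling head that decomposes Q(s, a) into a state value and a mean-centred action advantage.

```python
import torch
import torch.nn as nn

def double_dqn_target(q_net, target_net, r, s2, done, gamma=0.99):
    """Double DQN: the online network picks the action, the target network scores it."""
    with torch.no_grad():
        a_star = q_net(s2).argmax(dim=1, keepdim=True)        # action selection (online net)
        q_eval = target_net(s2).gather(1, a_star).squeeze(1)  # action evaluation (target net)
    return r + gamma * (1 - done) * q_eval

class DuelingQNet(nn.Module):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    def __init__(self, state_dim=8, n_actions=4):             # illustrative dimensions
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU())
        self.value = nn.Linear(64, 1)                          # state value V(s)
        self.advantage = nn.Linear(64, n_actions)              # action advantage A(s, a)

    def forward(self, s):
        h = self.feature(s)
        adv = self.advantage(h)
        return self.value(h) + adv - adv.mean(dim=1, keepdim=True)
```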
Policy-based methods directly optimize the policy function, making them particularly suitable for tasks involving high-dimensional and continuous action spaces. Trust region policy optimization (TRPO), proposed by Schulman et al., is a significant advancement in this domain. TRPO ensures policy improvement stability and monotonicity by introducing a Kullback–Leibler (KL) divergence constraint, performing well in high-dimensional action spaces [37,61]. Nonetheless, its high computational complexity limits its applicability in real-time tasks. To simplify the policy optimization process, Schulman et al. proposed PPO, which employs a clipping mechanism to reduce computational costs while achieving a good balance between policy stability and learning efficiency [29]. This makes PPO highly suitable for real-time tasks, although its performance may degrade in highly complex environments and is sensitive to hyperparameters such as the clipping threshold.
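The clipping mechanism that distinguishes PPO from TRPO fits in a few lines; the sketch below uses placeholder tensors and an assumed clipping threshold of 0.2 purely for illustration. The ratio between the new and old action probabilities is clipped so that a single update cannot move the policy far from the policy that collected the data, which replaces TRPO's explicit KL-divergence constraint.

```python
import torch

def ppo_clip_loss(log_prob_new, log_prob_old, advantage, clip_eps=0.2):
    """Clipped surrogate objective (negated, so it can be minimised)."""
    ratio = torch.exp(log_prob_new - log_prob_old)               # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    return -torch.min(unclipped, clipped).mean()

# Placeholder tensors only; real usage would pass log-probabilities and
# advantage estimates computed from collected rollouts.
loss = ppo_clip_loss(torch.randn(32), torch.randn(32), torch.randn(32))
```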
Hybrid methods combine the advantages of value-based and policy-based approaches to better address challenges in complex tasks. Asynchronous advantage actor-critic (A3C), proposed by Mnih et al., improves training efficiency through asynchronous updates, eliminating the need for an experience replay memory [33]. However, A3C exhibits relatively low sample efficiency and demands significant computational resources. DDPG, proposed by Lillicrap et al., represents another milestone for hybrid methods, excelling in continuous action spaces by leveraging the actor-critic architecture to integrate policy optimization and Q-value estimation [16]. Despite this, DDPG is highly sensitive to hyperparameters and prone to policy overfitting issues.
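A condensed sketch of the DDPG actor-critic update is shown below; the network sizes, learning rates, and Polyak coefficient are illustrative assumptions, and replay sampling and exploration noise are omitted. The critic is trained by temporal-difference learning against target networks, and the actor follows the gradient of the critic with respect to its deterministic action.

```python
import copy

import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, GAMMA, TAU = 8, 2, 0.99, 0.005           # illustrative values

actor = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                      nn.Linear(64, ACTION_DIM), nn.Tanh())
critic = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
                       nn.Linear(64, 1))
actor_t, critic_t = copy.deepcopy(actor), copy.deepcopy(critic)  # target networks
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_update(s, a, r, s2, done):
    """One DDPG step on a minibatch (replay sampling and exploration noise omitted)."""
    with torch.no_grad():
        q_next = critic_t(torch.cat([s2, actor_t(s2)], dim=1)).squeeze(1)
        target = r + GAMMA * (1 - done) * q_next
    q = critic(torch.cat([s, a], dim=1)).squeeze(1)
    critic_loss = nn.functional.mse_loss(q, target)              # temporal-difference critic loss
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean() # deterministic policy gradient
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    for net, tgt in ((actor, actor_t), (critic, critic_t)):      # Polyak-averaged target update
        for p, p_t in zip(net.parameters(), tgt.parameters()):
            p_t.data.mul_(1.0 - TAU).add_(TAU * p.data)
```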
To further address overestimation bias, Fujimoto et al. proposed TD3 [35]. TD3 employs techniques such as dual Q-networks, delayed policy updates, and target policy smoothing, significantly enhancing training stability and accuracy. Similarly, Haarnoja et al. introduced SAC, which enhances policy exploration through a maximum entropy framework and reduces bias using dual Q-networks. However, both SAC and TD3 increase computational complexity due to the need to train multiple networks and are highly sensitive to hyperparameter adjustments [62].
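The stabilizing ingredients of TD3 are easiest to see in its target computation, sketched below with assumed hyperparameters and function signatures: clipped noise is added to the target action (target policy smoothing), the minimum of the twin target critics is taken to curb overestimation, and in the full algorithm the actor would be updated less frequently than the critics (delayed policy updates). A comment indicates how SAC modifies the same target with an entropy term.

```python
import torch

def td3_target(actor_t, critic1_t, critic2_t, r, s2, done,
               gamma=0.99, noise_std=0.2, noise_clip=0.5):
    """Illustrative TD3 target: target policy smoothing + clipped double-Q estimate."""
    with torch.no_grad():
        a2 = actor_t(s2)
        noise = (torch.randn_like(a2) * noise_std).clamp(-noise_clip, noise_clip)
        a2 = (a2 + noise).clamp(-1.0, 1.0)                            # smoothed target action
        sa2 = torch.cat([s2, a2], dim=1)
        q_min = torch.min(critic1_t(sa2), critic2_t(sa2)).squeeze(1)  # twin-critic minimum
        # SAC would instead use q_min - alpha * log_pi(a2 | s2), adding an entropy bonus.
    return r + gamma * (1.0 - done) * q_min
```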
Table 4.
Advantages and disadvantages of classic DRL technology.
| Category | Ref. | Method | Year | Computation Cost | Sample Efficiency | Robustness to Environmental Changes | Advantages | Disadvantages |
|---|---|---|---|---|---|---|---|---|
| Value-Based DRL Methods | [33,58,63] | Deep Q-Network (DQN) | 2013 | Medium | Low | Low | Addresses the scalability issue of Q-learning by using deep neural networks for function approximation. Stabilizes learning through techniques like experience replay and a target network. | Prone to overestimation bias in Q-value estimation. High computational complexity due to the use of neural networks. Sensitive to hyperparameter settings, such as learning rate and replay buffer size. |
| | [57,60,64] | Double Deep Q-Network (Double DQN) | 2015 | Medium | Medium | Medium | Reduces overestimation bias by decoupling action selection and value estimation. Improves stability and accuracy in policy learning compared to DQN. | Increases computational complexity due to maintaining two Q-networks. Sensitive to hyperparameter tuning, especially learning rates and update frequencies. |
| | [61,65] | Dueling Deep Q-Network (Dueling DQN) | 2016 | Medium | Medium | Medium | Separates state value and action advantage, enabling more precise Q-value estimation. Enhances learning efficiency by focusing on state evaluation in scenarios with minimal action-value differences. | Increased computational complexity due to the additional network architecture. Sensitive to hyperparameter tuning and network design choices. |
| Policy-Based DRL Methods | [8,37] | Trust Region Policy Optimization (TRPO) | 2015 | High | Medium | High | Ensures stable and monotonic policy improvement through trust region constraints. Effective for high-dimensional or continuous action spaces. | Computationally expensive due to solving constrained optimization problems. Complex implementation compared to simpler policy gradient methods. |
| | [37,66] | Proximal Policy Optimization (PPO) | 2017 | Medium | High | Medium | Simplifies trust region optimization with a clipping mechanism, improving computational efficiency. Balances policy stability and learning efficiency, making it suitable for real-time tasks. | Sensitive to hyperparameter tuning, particularly the clipping threshold. May still face performance degradation in highly complex environments. |
| Hybrid-Based DRL Methods | [67,68] | Asynchronous Advantage Actor-Critic (A3C) | 2016 | Low | Medium | Medium | Improves training efficiency through asynchronous updates from multiple parallel agents. Eliminates the need for experience replay, reducing memory requirements. | Lower sample efficiency compared to methods using experience replay. Requires significant computational resources for parallel processing. |
| | [16,69] | Deep Deterministic Policy Gradient (DDPG) | 2015 | Medium | Medium | Low | Handles high-dimensional continuous action spaces effectively. Combines the strengths of policy gradient and value-based methods using an actor-critic framework. | Prone to overfitting and instability due to deterministic policies. Requires extensive hyperparameter tuning and is sensitive to noise settings. |
| | [36,70] | Soft Actor-Critic (SAC) | 2018 | High | Medium | High | Encourages exploration with the maximum entropy framework, improving robustness. Reduces overestimation bias using dual Q-networks, enhancing stability. | Computationally intensive due to simultaneous training of multiple networks. Performance is highly sensitive to the entropy weighting coefficient. |
| | [35,71] | Twin Delayed Deep Deterministic Policy Gradient (TD3) | 2018 | Medium | Medium | High | Reduces overestimation bias with twin Q-networks for more accurate value estimation. Improves stability with delayed policy updates and target policy smoothing. | Computationally expensive due to training multiple networks. Sensitive to hyperparameter tuning, such as update delays and noise settings. |
Note: For visual representations of the algorithms summarized in this table, please refer to Appendix A.
As shown in Table 4, different DRL methods exhibit unique characteristics in terms of computational complexity, learning stability, and policy optimization efficiency. To further enhance the understanding of representative reinforcement learning algorithms, this study presents a systematic performance analysis of value-based, policy-based, and hybrid methods in robot path navigation tasks under dynamic environments. By incorporating empirical results from recent studies, key distinctions in path efficiency, convergence rate, and robustness are examined in greater detail.
Value-based methods, which rely on value function estimation for policy evaluation, have demonstrated varying degrees of effectiveness in navigation tasks. For instance, Arce et al. reported that DQN achieved an average success rate of approximately 86% in indoor mobile robot experiments, with an average path length of 21.3 m [72]. However, policy stability was limited and the model required around 20,000 training steps to converge. Despite their structural simplicity and ease of implementation, value-based methods are constrained by their reliance on discrete action space modeling, making them less suitable for continuous control tasks. To address overestimation bias, double DQN introduces decoupled action selection and evaluation mechanisms, thereby improving convergence stability. Meanwhile, dueling DQN incorporates advantage functions to better capture the importance of actions in complex states. Nevertheless, both methods remain confined to discrete control domains and are best suited for virtual simulations or environments with clearly defined states and transitions [60].
In contrast, policy-based methods, such as PPO, exhibit stronger generalization and adaptability to continuous control scenarios. In a recent study by Reda, PPO achieved a 90% average success rate in dynamic path planning tasks, with the average path length reduced to 18.5 m and convergence achieved within 15,000 steps [73]. These methods offer more flexible learning in high-dimensional action spaces. TRPO, although it enhances policy stability via KL divergence constraints, incurs high computational overhead and lacks real-time applicability on physical robotic platforms. A3C accelerates convergence through parallel multi-threaded training, but its dependence on synchronous environment updates may introduce instability in real-world deployments, making it more suitable for simulation settings [67]. PPO, in particular, strikes a practical balance between training efficiency and policy robustness and has been widely applied in tasks such as path planning and multi-goal obstacle avoidance.
Hybrid methods, which integrate the strengths of both value-based and policy-based approaches, have become a key research direction in DRL for robotics. DDPG, an actor-critic framework, demonstrated a 95% success rate and the lowest path planning error in Cai et al.’s AGV navigation experiments, converging in under 10,000 training steps [74]. SAC, by introducing a maximum entropy objective, enhances policy diversity and exploration, making it particularly effective in highly uncertain environments such as human-crowd navigation. TD3 further improves control stability and convergence reliability through delayed policy updates and twin Q-network structures, and has shown strong performance in multi-goal navigation and robust trajectory tracking tasks [35,36]. Despite these advancements, hyperparameter sensitivity and high training complexity remain key challenges. Future studies should prioritize improvements in algorithmic robustness, computational efficiency, and transferability to better meet the complex requirements of real-world robotic applications.
5. Conclusions
This paper systematically explores the theoretical foundations, classical algorithms, and key technologies of DRL in dynamic environment navigation. Through a comparative analysis of classical algorithms such as DQN, PPO, and SAC, the study reveals the strengths and limitations of different methods in terms of adaptability, robustness, and computational efficiency. Furthermore, the paper provides an in-depth analysis of the technical advancements of DRL in addressing core challenges of dynamic environments, including adaptability, multimodal perception data fusion, and multi-robot collaboration.
The findings indicate that DRL, by combining the feature extraction capabilities of deep learning with the decision-making optimization of reinforcement learning, offers efficient and flexible solutions for navigation systems in dynamic environments. Nonetheless, existing technologies still face significant challenges in handling high-dimensional dynamic environments, achieving sim-to-real transfer, and ensuring system safety and interpretability.
Looking ahead, with the advancement of technology and increasing demands for autonomous systems, the potential of DRL in dynamic navigation will continue to expand. To support reliable deployment, future research should focus on breakthroughs in multimodal data fusion, collaborative learning frameworks, sim-to-real transfer, and algorithm safety and interpretability. These efforts will not only drive theoretical progress, but also lay the foundation for DRL’s practical application in intelligent robotics, industrial automation, and smart transportation.
Author Contributions
All authors contributed to the study conception and design. Y.Z. wrote the paper. W.Z.W.H., H.R.H.R., N.M.H.N. and M.S.M.K. were the supervisors of the paper, and Y.Y. oversaw the translation. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
No new data were generated or analyzed in this study. Data sharing is not applicable to this article.
Acknowledgments
The authors wish to express sincere gratitude to Universiti Putra Malaysia for its generous support and provision of resources throughout this research. Deep appreciation is extended to the professors and academic mentors whose valuable guidance, constructive feedback, and encouragement have been instrumental in the successful completion of this work. The authors also acknowledge the assistance and collaboration of fellow students and staff during various stages of the study.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript.
| DRL | Deep reinforcement learning |
| SLAM | Simultaneous localization and mapping |
| RRT | Rapidly exploring random tree |
| MDPs | Markov decision processes |
| RL | Reinforcement learning |
| DNNs | Deep neural networks |
| DQN | Deep Q-network |
| DDPG | Deep deterministic policy gradient |
| TD3 | Twin delayed deep deterministic policy gradient |
| PPO | Proximal policy optimization |
| SAC | Soft actor-critic |
| DWA | Dynamic window approach |
| MARL | Multi-agent reinforcement learning |
| Double DQN | Double deep Q-network |
| Dueling DQN | Dueling deep Q-network |
| TRPO | Trust region policy optimization |
| A3C | Asynchronous advantage actor-critic |
| KL divergence | Kullback–Leibler divergence |
| A* | A-Star |
Appendix A
This appendix presents structural frameworks and flowcharts of several representative DRL-based navigation methods discussed in the review. These visualizations are provided to help readers better understand the architectural design and information flow of the respective approaches.
Table A1.
Summary of structural frameworks and flowcharts of selected DRL-based navigation methods.
| Ref. | Method | Framework/Flowchart |
|---|---|---|
| [33,58] | Deep Q-Network (DQN) | ![]() |
| [57,60] | Double Deep Q-Network (Double DQN) | ![]() |
| [61,65] | Dueling Deep Q-Network (Dueling DQN) | ![]() |
| [8,37] | Trust Region Policy Optimization (TRPO) | ![]() |
| [37,66] | Proximal Policy Optimization (PPO) | ![]() |
| [67,68] | Asynchronous Advantage Actor-Critic (A3C) | ![]() |
| [16,69] | Deep Deterministic Policy Gradient (DDPG) | ![]() |
| [36,70] | Soft Actor-Critic (SAC) | ![]() |
| [35,71] | Twin Delayed Deep Deterministic Policy Gradient (TD3) | ![]() |
| [79] | DWA-RL | ![]() |
| [89] | LND3QN | ![]() |
| [91] | A2C | ![]() |
| [82] | A New Reward Function | ![]() |
| [83] | GA3C | ![]() |
| [92] | DQN and DDQN | ![]() |
| [10] | DDPG and DQN | ![]() |
| [93] | DS-DSAC | ![]() |
| [94] | GD-RL | ![]() |
| [95] | DQN + PTZ | ![]() |
| [87] | Context-Aware DRL Policy | ![]() |
| [96] | β-Decay TL | ![]() |
| [86] | DDPG + PER | ![]() |
| [97] | Improved TD3 | ![]() |
| [80] | HGAT-DRL | ![]() |
| [98] | TERP | ![]() |
| [99] | re-DQN | ![]() |
| [89] | TCAMD | ![]() |
| [100] | RL-Based MIPP | ![]() |
| [79] | CAM-RL | ![]() |
| [92] | DODPG | ![]() |
References
- Prasuna, R.G.; Potturu, S.R. Deep Reinforcement Learning in Mobile Robotics—A Concise Review. Multimed. Tools Appl. 2024, 83, 70815–70836. [Google Scholar] [CrossRef]
- Siciliano, B.; Khatib, O. Robotics and the Handbook. In Springer Handbook of Robotics; Siciliano, B., Khatib, O., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 1–6. ISBN 978-3-319-32552-1. [Google Scholar]
- Chen, L.; Jiang, Z.; Cheng, L.; Knoll, A.C.; Zhou, M. Deep Reinforcement Learning Based Trajectory Planning Under Uncertain Constraints. Front. Neurorobot. 2022, 16, 883562. [Google Scholar] [CrossRef]
- Nasti, S.M.; Chishti, M.A. A Review of AI-Enhanced Navigation Strategies for Mobile Robots in Dynamic Environments. In Proceedings of the 2024 ASU International Conference in Emerging Technologies for Sustainability and Intelligent Systems (ICETSIS), Manama, Bahrain, 28–29 January 2024; pp. 1239–1244. [Google Scholar]
- Duan, C.; Junginger, S.; Huang, J.; Jin, K.; Thurow, K. Deep Learning for Visual SLAM in Transportation Robotics: A Review. Transp. Saf. Environ. 2019, 1, 177–184. [Google Scholar] [CrossRef]
- LaValle, S.M. Planning Algorithms; Cambridge University Press: Cambridge, UK, 2006; ISBN 978-1-139-45517-6. [Google Scholar]
- Bhagat, S.; Banerjee, H.; Tse, Z.T.H.; Ren, H. Deep Reinforcement Learning for Soft, Flexible Robots: Brief Review with Impending Challenges. Robotics 2019, 8, 4. [Google Scholar] [CrossRef]
- Sun, H.; Zhang, W.; Yu, R.; Zhang, Y. Motion Planning for Mobile Robots—Focusing on Deep Reinforcement Learning: A Systematic Review. IEEE Access 2021, 9, 69061–69081. [Google Scholar] [CrossRef]
- Devo, A.; Mezzetti, G.; Costante, G.; Fravolini, M.L.; Valigi, P. Towards Generalization in Target-Driven Visual Navigation by Using Deep Reinforcement Learning. IEEE Trans. Robot. 2020, 36, 1546–1561. [Google Scholar] [CrossRef]
- Quiroga, F.; Hermosilla, G.; Farias, G.; Fabregas, E.; Montenegro, G. Position Control of a Mobile Robot through Deep Reinforcement Learning. Appl. Sci. 2022, 12, 7194. [Google Scholar] [CrossRef]
- Sangiovanni, B.; Incremona, G.P.; Piastra, M.; Ferrara, A. Self-Configuring Robot Path Planning With Obstacle Avoidance via Deep Reinforcement Learning. IEEE Control Syst. Lett. 2021, 5, 397–402. [Google Scholar] [CrossRef]
- Candra, A.; Budiman, M.A.; Hartanto, K. Dijkstra’s and A-Star in Finding the Shortest Path: A Tutorial. In Proceedings of the 2020 International Conference on Data Science, Artificial Intelligence, and Business Analytics (DATABIA), Medan, Indonesia, 16–17 July 2020; pp. 28–32. [Google Scholar]
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-Level Control through Deep Reinforcement Learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef]
- Kober, J.; Bagnell, J.A.; Peters, J. Reinforcement Learning in Robotics: A Survey. Int. J. Robot. Res. 2013, 32, 1238–1274. [Google Scholar] [CrossRef]
- Zhang, H.; Liu, L.Z.; Xie, H.; Jiang, Y.; Zhou, J.; Wang, Y. Deep Learning-Based Robot Vision: High-End Tools for Smart Manufacturing. IEEE Instrum. Meas. Mag. 2022, 25, 27–35. [Google Scholar] [CrossRef]
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous Control with Deep Reinforcement Learning. arXiv 2019, arXiv:1509.02971. [Google Scholar]
- Miranda, V.R.F.; Neto, A.A.; Freitas, G.M.; Mozelli, L.A. Generalization in Deep Reinforcement Learning for Robotic Navigation by Reward Shaping. IEEE Trans. Ind. Electron. 2024, 71, 6013–6020. [Google Scholar] [CrossRef]
- Sutton, R.S.; Barto, A.G. Reinforcement Learning, 2nd ed.; An Introduction; MIT Press: Cambridge, MA, USA, 2018; ISBN 978-0-262-35270-3. [Google Scholar]
- Nguyen, H.; La, H. Review of Deep Reinforcement Learning for Robot Manipulation. In Proceedings of the 2019 Third IEEE International Conference on Robotic Computing (IRC), Naples, Italy, 25–27 February 2019; pp. 590–595. [Google Scholar]
- Garaffa, L.C.; Basso, M.; Konzen, A.A.; De Freitas, E.P. Reinforcement Learning for Mobile Robotics Exploration: A Survey. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 3796–3810. [Google Scholar] [CrossRef]
- Sharma, G.; Jain, S.; Sharma, R.S. Path Planning for Fully Autonomous UAVs-a Taxonomic Review and Future Perspectives. IEEE Access 2025, 13, 13356–13379. [Google Scholar] [CrossRef]
- Le, H.; Saeedvand, S.; Hsu, C.-C. A Comprehensive Review of Mobile Robot Navigation Using Deep Reinforcement Learning Algorithms in Crowded Environments. J. Intell. Robot. Syst. 2024, 110, 158. [Google Scholar] [CrossRef]
- Pierson, H.A.; Gashler, M.S. Deep Learning in Robotics: A Review of Recent Research. Adv. Robot. 2017, 31, 821–835. [Google Scholar] [CrossRef]
- Li, C.; Wu, F.; Zhao, J. A Review of Deep Reinforcement Learning Exploration Methods: Prospects and Challenges for Application to Robot Attitude Control Tasks. In Cognitive Systems and Information Processing, Proceedings of the 7th International Conference, ICCSIP 2022, Fuzhou, China, 17–18 December 2022; Sun, F., Cangelosi, A., Zhang, J., Yu, Y., Liu, H., Fang, B., Eds.; Springer Nature: Singapore, 2023; pp. 247–273. [Google Scholar]
- Zhao, Y.; Zhang, Y.; Wang, S. A Review of Mobile Robot Path Planning Based on Deep Reinforcement Learning Algorithm. J. Phys. Conf. Ser. 2021, 2138, 012011. [Google Scholar] [CrossRef]
- Zhu, K.; Zhang, T. Deep Reinforcement Learning Based Mobile Robot Navigation: A Review. Tsinghua Sci. Technol. 2021, 26, 674–691. [Google Scholar] [CrossRef]
- Hu, Y.; Ye, D.; Kang, J.; Wu, M.; Yu, R. A Cloud-Edge Collaborative Architecture for Multimodal LLMs-Based Advanced Driver Assistance Systems in IoT Networks. IEEE Internet Things J. 2024, 12, 13208–13221. [Google Scholar] [CrossRef]
- Jiang, H.; Wang, H.; Yau, W.-Y.; Wan, K.-W. A Brief Survey: Deep Reinforcement Learning in Mobile Robot Navigation. In Proceedings of the 2020 15th IEEE Conference on Industrial Electronics and Applications (ICIEA), Kristiansand, Norway, 9–13 November 2020; pp. 592–597. [Google Scholar]
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal Policy Optimization Algorithms. arXiv 2017, arXiv:1707.06347. [Google Scholar]
- Bellman, R. A Markovian Decision Process. J. Math. Mech. 1957, 6, 679–684. [Google Scholar] [CrossRef]
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; Riedmiller, M. Playing Atari with Deep Reinforcement Learning. arXiv 2013, arXiv:1312.5602. [Google Scholar]
- Silver, D.; Huang, A.; Maddison, C.J.; Guez, A.; Sifre, L.; Van Den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; et al. Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature 2016, 529, 484–489. [Google Scholar] [CrossRef]
- Mnih, V.; Badia, A.P.; Mirza, M.; Graves, A.; Lillicrap, T.P.; Harley, T.; Silver, D.; Kavukcuoglu, K. Asynchronous Methods for Deep Reinforcement Learning. arXiv 2016, arXiv:1602.01783. [Google Scholar]
- Qian, T.; Wang, M. Optimal Ancillary Service Disaggregation for EV Charging Station Aggregators: A Hybrid On–Off Policy Reinforcement Learning Framework, 2024. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5100232 (accessed on 16 August 2024).
- Fujimoto, S.; van Hoof, H.; Meger, D. Addressing Function Approximation Error in Actor-Critic Methods. arXiv 2018, arXiv:1802.09477. [Google Scholar]
- Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. arXiv 2018, arXiv:1801.01290. [Google Scholar]
- Schulman, J.; Levine, S.; Moritz, P.; Jordan, M.I.; Abbeel, P. Trust Region Policy Optimization. arXiv 2017, arXiv:1502.05477. [Google Scholar]
- Nam, S.; Nguyen, T.A.; Choi, E.; Min, D. Multi-Head Fusion-Based Actor-Critic Deep Reinforcement Learning with Memory Contextualisation for End-to-End Autonomous Navigation. TechRxiv 2025. [Google Scholar] [CrossRef]
- Yang, G.; Guo, Y. Deep Reinforcement Learning Based Mobile Robot Navigation in Crowd Environments. In Proceedings of the 2024 21st International Conference on Ubiquitous Robots (UR), New York, NY, USA, 24–27 June 2024; pp. 513–519. [Google Scholar]
- Parooei, M.; Tale Masouleh, M.; Kalhor, A. MAP3F: A Decentralized Approach to Multi-Agent Pathfinding and Collision Avoidance with Scalable 1D, 2D, and 3D Feature Fusion. Intell. Serv. Robot. 2024, 17, 401–418. [Google Scholar] [CrossRef]
- Dijkstra, E.W. A Note on Two Problems in Connexion with Graphs. In Edsger Wybe Dijkstra: His Life, Work, and Legacy; Association for Computing Machinery: New York, NY, USA, 2022; Volume 45, pp. 287–290. ISBN 978-1-4503-9773-5. [Google Scholar]
- Hart, P.E.; Nilsson, N.J.; Raphael, B. A Formal Basis for the Heuristic Determination of Minimum Cost Paths. IEEE Trans. Syst. Sci. Cybern. 1968, 4, 100–107. [Google Scholar] [CrossRef]
- Fox, D.; Burgard, W.; Thrun, S. The Dynamic Window Approach to Collision Avoidance. IEEE Robot. Autom. Mag. 1997, 4, 23–33. [Google Scholar] [CrossRef]
- Gupta, S.; Tolani, V.; Davidson, J.; Levine, S.; Sukthankar, R.; Malik, J. Cognitive Mapping and Planning for Visual Navigation. arXiv 2019, arXiv:1702.03920v3. [Google Scholar]
- Cadena, C.; Carlone, L.; Carrillo, H.; Latif, Y.; Scaramuzza, D.; Neira, J.; Reid, I.; Leonard, J.J. Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age. IEEE Trans. Robot. 2016, 32, 1309–1332. [Google Scholar] [CrossRef]
- Chen, J.; Ma, R.; Oyekan, J. A Deep Multi-Agent Reinforcement Learning Framework for Autonomous Aerial Navigation to Grasping Points on Loads. Robot. Auton. Syst. 2023, 167, 104489. [Google Scholar] [CrossRef]
- Yu, C.; Yang, X.; Gao, J.; Yang, H.; Wang, Y.; Wu, Y. Learning Efficient Multi-Agent Cooperative Visual Exploration. In Computer Vision—ECCV 2022; Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T., Eds.; Springer Nature: Cham, Switzerland, 2022; pp. 497–515. [Google Scholar]
- Sarfi, M.H. Autonomous Exploration and Mapping of Unknown Environments. Master’s Thesis, University of Calgary, Calgary, AB, Canada, 2024. [Google Scholar]
- Ahmed, S.; Azar, A.T.; Sardar, M.Z.; Haider, Z.; Kamal, N.A. Exploring Reinforcement Learning Techniques in the Realm of Mobile Robotics. IJAAC 2024, 18, 10062261. [Google Scholar] [CrossRef]
- Sharma, R. Optimizing Deep Reinforcement Learning for Real-World Robotics: Challenges and Solutions. Int. J. Artif. Intell. Comput. Sci. Manag. Technol. 2024, 1, 35–42. [Google Scholar]
- Tsiotras, P.; Gombolay, M.; Foerster, J. Editorial: Decision-Making and Planning for Multi-Agent Systems. Front. Robot. AI 2024, 11, 1422344. [Google Scholar] [CrossRef]
- Karwowski, J.; Szynkiewicz, W. Human-Aware Robot Trajectory Planning with Hybrid Candidate Generation: Leveraging a Pedestrian Motion Model for Diverse Trajectories. In Proceedings of the 2024 13th International Workshop on Robot Motion and Control (RoMoCo), Poznań, Poland, 2–4 July 2024; pp. 82–89. [Google Scholar]
- Vaidya, H.; Dhabliya, D.; Jweeg, M.; Almusawi, M.; Naser, Z.L.; Hashem, A.; Jawad, A.Q. An Empirical Analysis of Various Techniques of Solving Obstacles through Artificial Intelligence. In Proceedings of the 2024 4th International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE), Greater Noida, India, 14–15 May 2024; pp. 1169–1174. [Google Scholar]
- Rong, S.; Meng, R.; Guo, J.; Cui, P.; Qiao, Z. Multi-Vehicle Collaborative Planning Technology under Automatic Driving. Sustainability 2024, 16, 4578. [Google Scholar] [CrossRef]
- Sorokin, M.; Tan, J.; Liu, C.K.; Ha, S. Learning to Navigate Sidewalks in Outdoor Environments. arXiv 2021, arXiv:2109.05603. [Google Scholar] [CrossRef]
- Li, P.; An, Z.; Abrar, S.; Zhou, L. Large Language Models for Multi-Robot Systems: A Survey. arXiv 2025, arXiv:2502.03814. [Google Scholar]
- Alsadie, D. A Comprehensive Review of AI Techniques for Resource Management in Fog Computing: Trends, Challenges, and Future Directions. IEEE Access 2024, 12, 118007–118059. [Google Scholar] [CrossRef]
- Fan, T.; Long, P.; Liu, W.; Pan, J. Distributed Multi-Robot Collision Avoidance via Deep Reinforcement Learning for Navigation in Complex Scenarios. Int. J. Robot. Res. 2020, 39, 856–892. [Google Scholar] [CrossRef]
- You, K.; Zhou, C.; Ding, L. Deep Learning Technology for Construction Machinery and Robotics. Autom. Constr. 2023, 150, 104852. [Google Scholar] [CrossRef]
- Hasselt, H.v.; Guez, A.; Silver, D. Deep Reinforcement Learning with Double Q-Learning. Proc. AAAI Conf. Artif. Intell. 2016, 30, 2094–2100. [Google Scholar] [CrossRef]
- Wang, Y.; Fang, Y.; Lou, P.; Yan, J.; Liu, N. Deep Reinforcement Learning Based Path Planning for Mobile Robot in Unknown Environment. J. Phys. Conf. Ser. 2020, 1576, 012009. [Google Scholar] [CrossRef]
- Haarnoja, T.; Ha, S.; Zhou, A.; Tan, J.; Tucker, G.; Levine, S. Learning to Walk via Deep Reinforcement Learning. arXiv 2019, arXiv:1812.11103v3. [Google Scholar]
- Deep Reinforcement Learning for Scheduling in an Edge Computing-Based Industrial Internet of Things. Available online: https://www.researchgate.net/publication/355377739_Deep_Reinforcement_Learning_for_Scheduling_in_an_Edge_Computing-Based_Industrial_Internet_of_Things (accessed on 26 November 2024).
- Hu, M.; Zhang, J.; Matkovic, L.; Liu, T.; Yang, X. Reinforcement Learning in Medical Image Analysis: Concepts, Applications, Challenges, and Future Directions. J. Appl. Clin. Med. Phys. 2023, 24, e13898. [Google Scholar] [CrossRef]
- Wang, Y.; Li, X.; Wan, P.; Chang, L.; Deng, X. Dueling Deep Q-Networks for Social Awareness-Aided Spectrum Sharing. Complex Intell. Syst. 2022, 8, 1975–1986. [Google Scholar] [CrossRef]
- Wang, L.; Feng, X.; Zhang, R.; Hou, Z.; Wang, G.; Zhang, H. Energy Management of Integrated Energy System in the Park under Multiple Time Scales. AIMS Energy 2024, 12, 639–663. [Google Scholar] [CrossRef]
- Wang, Z.; Schaul, T.; Hessel, M.; Hasselt, H.V.; Lanctot, M.; de Freitas, N. Dueling Network Architectures for Deep Reinforcement Learning. In Proceedings of the 33rd International Conference on Machine Learning (ICML), New York, NY, USA, 19–24 June 2016; Volume 48, pp. 1995–2003. Available online: http://proceedings.mlr.press/v48/wangf16.html (accessed on 10 August 2024).
- Nakabi, T.A.; Toivanen, P. Deep Reinforcement Learning for Energy Management in a Microgrid with Flexible Demand. Sustain. Energy Grids Netw. 2021, 25, 100413. [Google Scholar] [CrossRef]
- Dong, R.; Du, J.; Liu, Y.; Heidari, A.A.; Chen, H. An Enhanced Deep Deterministic Policy Gradient Algorithm for Intelligent Control of Robotic Arms. Front. Neuroinform. 2023, 17, 1096053. [Google Scholar] [CrossRef] [PubMed]
- Shi, J.; Du, J.; Wang, J.; Wang, J.; Yuan, J. Priority-Aware Task Offloading in Vehicular Fog Computing Based on Deep Reinforcement Learning. IEEE Trans. Veh. Technol. 2020, 69, 16067–16081. [Google Scholar] [CrossRef]
- Liu, S.; Yang, Z.; Zhang, Z.; Jiang, R.; Ren, T.; Jiang, Y.; Chen, S.; Zhang, X. Application of Deep Reinforcement Learning in Reconfiguration Control of Aircraft Anti-Skid Braking System. Aerospace 2022, 9, 555. [Google Scholar] [CrossRef]
- Arce, D.; Solano, J.; Beltrán, C. A Comparison Study between Traditional and Deep-Reinforcement-Learning-Based Algorithms for Indoor Autonomous Navigation in Dynamic Scenarios. Sensors 2023, 23, 9672. [Google Scholar] [CrossRef]