4.3.1. Performance Metrics
To evaluate the proposed SAC-P algorithm, we quantify its effectiveness using five metrics: success rate, collision rate, out-of-bounds rate, timeout rate, and average time steps.
(1) Success Rate (SR): The percentage of episodes in which the UAV successfully reaches the target position.
(2) Collision Rate (CR): The percentage of episodes where the UAV collides with an obstacle.
(3) Out-of-Bounds Rate (OR): The percentage of episodes in which the UAV flies out of the mission area.
(4) Timeout Rate (TR): The percentage of episodes where the UAV fails to reach the target within the maximum time steps.
(5) Average Time Steps (ATS): The average number of steps required for the UAV to reach the target position from the starting position when the mission is successfully completed. In addition, since the energy consumed by the UAV during a mission is generally positively correlated with the number of time steps required, this study also considers the ATS as an indirect metric for evaluating the UAV’s energy consumption.
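The five definitions above can be sketched as a small aggregation routine. This is a minimal illustration, not the authors' evaluation code: the `Episode` record, the outcome labels, and the sample episodes are hypothetical, and it assumes every episode ends in exactly one of the four outcomes.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Episode:
    outcome: str  # hypothetical labels: "success", "collision", "out_of_bounds", "timeout"
    steps: int    # time steps elapsed when the episode ended


def summarize(episodes):
    """Return SR, CR, OR, TR (in percent) and ATS over successful episodes."""
    n = len(episodes)
    counts = Counter(ep.outcome for ep in episodes)
    metrics = {
        "SR": 100.0 * counts["success"] / n,
        "CR": 100.0 * counts["collision"] / n,
        "OR": 100.0 * counts["out_of_bounds"] / n,
        "TR": 100.0 * counts["timeout"] / n,
    }
    # ATS is averaged over successful episodes only, following definition (5).
    success_steps = [ep.steps for ep in episodes if ep.outcome == "success"]
    metrics["ATS"] = (
        sum(success_steps) / len(success_steps) if success_steps else float("nan")
    )
    return metrics


# Illustrative episodes only; these are not results from the paper.
episodes = [
    Episode("success", 60),
    Episode("success", 64),
    Episode("collision", 25),
    Episode("timeout", 200),
]
metrics = summarize(episodes)
```

Because ATS is computed over successful episodes only, a method with a low SR can still report a short ATS, so the two metrics are best read together.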
4.3.2. Comparative Testing
To enable an objective evaluation of the framework’s performance across different algorithms, we designed the experiments using the controlled variable approach, thereby ensuring both fairness and consistency in testing conditions. In this setup, all algorithms are implemented within the same proposed framework, with the only variation being the decision-making algorithm; consequently, the data preprocessing procedure remains identical for all cases. Moreover, performance assessments are conducted under an identical obstacle distribution, where the obstacle coordinates are fixed at [[4.0, 4.0], [7.5, 2.0], [−4.0, −5.0], [2.0, 0.5], [−7.0, 3.0], [3.0, −4.0], [−3.0, 0.5], [−2.0, 4.5], [−7.0, −2.0], [8.0, −3.0]]. The UAV’s starting positions are set to three representative points: [−11.0, 0.0], [−11.0, 4.0], [−11.0, −4.0], with corresponding target positions at [11.0, 0.0], [11.0, −4.0], [11.0, 4.0]. Each start-target pair undergoes 100 independent trials, resulting in a total of 300 episodes. In addition, each experiment was independently repeated three times with different random seeds.
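The test protocol described above can be written down as a fixed configuration plus an episode schedule. The sketch below is only an illustration under stated assumptions: the seed values are hypothetical (the text does not list them), the mapping of the three start-target pairs to Missions 1 to 3 is assumed to follow the order in which they are given, and `build_schedule` is an illustrative helper rather than part of the framework.

```python
# Fixed obstacle centres (x, y), as listed in the text.
OBSTACLES = [
    (4.0, 4.0), (7.5, 2.0), (-4.0, -5.0), (2.0, 0.5), (-7.0, 3.0),
    (3.0, -4.0), (-3.0, 0.5), (-2.0, 4.5), (-7.0, -2.0), (8.0, -3.0),
]

# Start/target pairs, assumed to correspond to Missions 1-3 in the order given.
MISSIONS = [
    ((-11.0, 0.0), (11.0, 0.0)),
    ((-11.0, 4.0), (11.0, -4.0)),
    ((-11.0, -4.0), (11.0, 4.0)),
]

TRIALS_PER_MISSION = 100
SEEDS = [0, 1, 2]  # hypothetical values; the text only says three different seeds


def build_schedule(seed):
    """One experiment: every mission repeated TRIALS_PER_MISSION times."""
    return [
        {"seed": seed, "mission": m + 1, "trial": t, "start": start, "target": target}
        for m, (start, target) in enumerate(MISSIONS)
        for t in range(TRIALS_PER_MISSION)
    ]


# The same schedule and obstacle layout are reused for every algorithm,
# so only the decision-making algorithm differs between runs.
schedules = {seed: build_schedule(seed) for seed in SEEDS}
episodes_per_experiment = len(schedules[SEEDS[0]])
total_episodes = sum(len(s) for s in schedules.values())
```

With these settings each experiment contains 300 episodes, and the three seeded repetitions give 900 episodes per algorithm.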
Using the trained model, we first benchmarked our approach against representative DRL baselines in three distinct mission scenarios. Performance was evaluated using five metrics: SR, CR, OR, TR, and ATS. Overall statistics are summarized in Table 5 (results are presented as mean ± standard deviation (SD)), with comparative SR performance illustrated in Figure 10. The low SD values (generally below ±2% for rate-based metrics and ±0.5 time steps for ATS) indicate that the performance differences are stable across the three runs and unlikely to be caused by the stochastic variations inherent to DRL training or UAV simulation environments.
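As a minimal sketch of how a "mean ± SD" entry can be obtained from three seeded runs, the snippet below aggregates per-seed success rates. The per-seed values are hypothetical and chosen for easy arithmetic; the text does not state whether the sample SD (n − 1 denominator) or the population SD was used, so the sample SD here is an assumption.

```python
from statistics import mean, stdev

# Hypothetical per-seed success rates (percent); not values from the paper.
sr_per_seed = [80.0, 81.0, 82.0]

sr_mean = mean(sr_per_seed)
sr_sd = stdev(sr_per_seed)  # sample SD; statistics.pstdev gives the population SD

report = f"SR = {sr_mean:.2f} ± {sr_sd:.2f}%"
print(report)
```

With only three runs the SD is itself a rough estimate, which is worth keeping in mind when interpreting small differences between methods.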
In Mission 1, the SAC-P-based framework achieved the highest performance across all metrics (SR = 81.23 ± 1.20%, CR = 15.02 ± 1.01%, OR = 1.02 ± 0.40%, TR = 3.01 ± 0.49%, ATS = 61.26 ± 0.35), outperforming the strongest baseline TD3 (SR = 78.07 ± 1.78%) by approximately 3.16 percentage points. Notably, SAC-P required the fewest time steps, completing missions ~1.59 steps faster than DDPG, while maintaining low collision and failure rates. The small SD values (<±1.3%) indicate that these improvements were consistently observed across all trial runs.
Mission 2 results further validated the robustness of SAC-P. It maintained its superiority with an SR of 77.15 ± 1.28%, exceeding TD3 by ~10 percentage points. Its CR was 18.04 ± 1.02%, which is 16.08 and 19.14 percentage points lower than that of DDPG (34.12 ± 1.48%) and PPO (37.18 ± 1.59%), respectively. Although OR (2.01 ± 0.48%) and TR (3.05 ± 0.41%) were slightly higher than PPO’s values, they remained negligible in operational terms. SAC-P achieved the shortest ATS (66.33 ± 0.42), highlighting its navigation efficiency and path-planning robustness. Low SD values again support the reproducibility of these gains.
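The CR comparison for Mission 2 is stated in percentage points, which is an absolute gap between two rates and differs from a relative reduction. The short sketch below recomputes both quantities from the mean values quoted in the text; the helper name `relative_change` is illustrative, and the relative figures are derived here rather than reported by the authors.

```python
def relative_change(new, ref):
    """Relative change of `new` with respect to `ref`, in percent."""
    return 100.0 * (new - ref) / ref


# Mission 2 mean collision rates (percent), taken from the text.
cr_sacp, cr_ddpg, cr_ppo = 18.04, 34.12, 37.18

# Absolute gaps in percentage points.
gap_ddpg = round(cr_ddpg - cr_sacp, 2)
gap_ppo = round(cr_ppo - cr_sacp, 2)

# Relative changes in percent (negative means SAC-P collides less often).
rel_ddpg = round(relative_change(cr_sacp, cr_ddpg), 1)
rel_ppo = round(relative_change(cr_sacp, cr_ppo), 1)

print(gap_ddpg, gap_ppo, rel_ddpg, rel_ppo)
```

The absolute gaps reproduce the 16.08 and 19.14 percentage points quoted above, while the corresponding relative reductions are roughly 47% and 51%.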
In Mission 3, SAC-P achieved SR = 76.13 ± 1.26%, maintaining a 6.02 percentage point margin over TD3. It recorded the lowest CR (17.07 ± 1.03%), dramatically outperforming PPO (38.07 ± 1.54%) and DDPG (32.10 ± 1.37%). Although OR (2.02 ± 0.47%) and TR (5.04 ± 0.48%) were marginally higher than PPO in certain scenarios, they remained within acceptable limits for safe UAV navigation. SAC-P again required the fewest average steps (66.22 ± 0.40). The consistently small SD values confirm that performance differences were not due to random instability.
Taken together, results from all three missions demonstrate that the SAC-P framework consistently attains the dual advantage of the highest SR and shortest ATS. The statistical stability, as evidenced by small standard deviations across three independent runs, underscores the robustness and reproducibility of SAC-P in UAV autonomous navigation and obstacle avoidance tasks. Algorithmically, SAC-P preserves exploratory behavior while achieving more stable learning of environmental dynamics, enabling a favorable balance between success rate and collision avoidance. The comparison of the UAV trajectories of the proposed method with the baselines of different DRL algorithms in each task is shown in Figure 11a, Figure 11b, and Figure 11c, respectively.
(2) Comparative Test with Non-DRL Baseline Algorithms
To further assess how the proposed SAC-P-based framework and decision-making strategy perform relative to conventional navigation and obstacle avoidance approaches in mission execution, we compared it against counterparts implemented with the APF and DWA algorithms under the same experimental settings described earlier. The overall results are reported in Table 6, with success rate comparisons illustrated in Figure 12. A detailed analysis of the outcomes for the three mission scenarios is presented as follows.
In Mission 1, SAC-P achieved the highest SR (81.23 ± 1.20%), an improvement of 9.08 and 11.09 percentage points over APF (72.15 ± 1.65%) and DWA (70.14 ± 1.58%), respectively. The CR was limited to 15.02 ± 1.01%, markedly lower than that of APF (22.08 ± 1.28%) and DWA (23.06 ± 1.25%). Both OR (1.02 ± 0.40%) and TR (3.01 ± 0.49%) were lower than those of the baselines. SAC-P also achieved the shortest ATS (61.26 ± 0.35), outperforming APF by 4.09 steps and DWA by 5.50 steps. The consistently low SD values indicate that these performance gains were stable across repeated runs, underscoring SAC-P’s robustness and superior navigation efficiency in this mission.
In the more challenging Mission 2, SAC-P maintained its dominance with an SR of 77.15 ± 1.28%, outperforming APF (66.10 ± 1.50%) by 11.05 percentage points and DWA (67.12 ± 1.46%) by 10.03 percentage points. It also recorded the lowest CR (18.04 ± 1.02%), OR (2.01 ± 0.48%), and TR (3.05 ± 0.41%) among all methods, indicating reduced mission failure risks. Furthermore, SAC-P achieved the shortest ATS (66.33 ± 0.42), which translates into 1.72 and 1.61 fewer steps than APF and DWA, respectively. The narrow SD margins again confirm that its performance advantages were consistent and not the result of experimental variance.
For Mission 3, SAC-P sustained its lead with an SR of 76.13 ± 1.26%, compared to APF (68.08 ± 1.49%) and DWA (66.05 ± 1.44%). Its CR (17.07 ± 1.03%) was substantially lower than that of APF (24.04 ± 1.26%) and DWA (24.06 ± 1.27%). Although APF achieved a marginally lower OR (2.01 ± 0.46% vs. SAC-P’s 2.02 ± 0.47%), SAC-P retained the advantage in TR (5.04 ± 0.48%) and ATS (66.22 ± 0.40), outperforming APF and DWA by 2.69 and 3.34 steps, respectively. This reflects SAC-P’s ability to retain high success rates while minimizing travel time even in stochastic scenarios.
Across all missions, the SAC-P-based UAV navigation and obstacle avoidance framework consistently achieved the highest SR together with the lowest CR, TR, and ATS; its OR was the lowest in Missions 1 and 2 and only 0.01 percentage points above APF’s in Mission 3. Compared with APF- and DWA-based counterparts, it demonstrated clear advantages in mission efficiency, safety, and robustness. These outcomes affirm the method’s superior performance in completing tasks efficiently, reducing failure likelihood, and maintaining robust operation under identical environmental conditions. The comparison of the UAV trajectories of the proposed method with the baselines of different non-DRL algorithms in each task is shown in Figure 13a, Figure 13b, and Figure 13c, respectively.
4.3.3. Generalization Performance Testing
To examine the generalization capability of the proposed SAC-P-based UAV navigation and obstacle avoidance framework, we performed a dedicated generalization test. A single experiment comprised 100 episodes, and each experiment was independently repeated three times with different random seeds. At the start of each episode, the obstacle configuration was randomly regenerated. To analyze performance across varying conditions, we designed two distinct task scenarios. Scenario 1 featured sparsely distributed homogeneous obstacles. Scenario 2, inspired by the configurations in [39,40], contained a dense arrangement of heterogeneous obstacles with circular, square, and rectangular cross-sections. Because the initial and target positions, as well as obstacle arrangements, varied between episodes, the required number of time steps also varied. As such, the evaluation focused exclusively on four metrics: SR, CR, OR, and TR.
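The per-episode regeneration of obstacles described above could be sketched with simple rejection sampling. Everything numeric here is a hypothetical placeholder: the text does not give the extent of the obstacle region, the obstacle counts, the minimum spacing, or the sampling procedure, and `sample_obstacles` is an illustrative helper rather than the authors' generator. Only the three cross-section types for Scenario 2 come from the text.

```python
import math
import random

AREA_HALF_WIDTH = 10.0  # hypothetical half-extent of the obstacle region
CLEARANCE = 1.5         # hypothetical minimum centre-to-centre spacing
SHAPES = ("circle", "square", "rectangle")  # cross-sections named for Scenario 2


def sample_obstacles(rng, count, shapes=("circle",)):
    """Rejection-sample `count` obstacle centres that respect CLEARANCE."""
    placed = []
    attempts = 0
    while len(placed) < count:
        attempts += 1
        if attempts > 10_000:
            raise RuntimeError("could not place all obstacles; relax CLEARANCE")
        x = rng.uniform(-AREA_HALF_WIDTH, AREA_HALF_WIDTH)
        y = rng.uniform(-AREA_HALF_WIDTH, AREA_HALF_WIDTH)
        # Accept the candidate only if it is far enough from every placed obstacle.
        if all(math.hypot(x - ox, y - oy) >= CLEARANCE for ox, oy, _ in placed):
            placed.append((x, y, rng.choice(shapes)))
    return placed


rng = random.Random(0)  # one seeded generator per experiment (seed is hypothetical)
sparse = sample_obstacles(rng, count=8)                  # Scenario 1 style: few, one shape
dense = sample_obstacles(rng, count=20, shapes=SHAPES)   # Scenario 2 style: many, mixed shapes
```

Calling `sample_obstacles` at the start of every episode with the same seeded generator keeps each experiment reproducible while still presenting a new layout per episode.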
(3) Simple Obstacle Scenario Test
Table 7 summarizes the results for Scenario 1, where the proposed SAC-P-based autonomous navigation and obstacle avoidance framework was evaluated in an environment featuring randomly distributed, homogeneous obstacles. The framework achieved an SR of 79.12 ± 1.85%, with CR (17.06 ± 1.42%), OR (2.01 ± 0.45%), and TR (2.04 ± 0.47%) all maintained at low levels. The relatively small SD values (≤±1.85%) indicate that performance remained stable and reproducible across the randomly generated map configurations. Importantly, the task settings, obstacle positions, and spatial configurations in this scenario were entirely unseen during training, making the results a direct measure of the framework’s adaptability and transferability to unfamiliar environments.
The combination of low CR and OR indicates that the UAV could reliably detect and avoid obstacles under a completely new map topology, while the low TR reflects efficient path planning and decision-making even in untrained scenarios. Together, these findings underscore the framework’s strong generalization capability, demonstrating that navigation policies learned in training can be effectively and robustly applied to new, more uncertain task conditions. Representative UAV trajectories and corresponding obstacle configurations for selected episodes in Scenario 1 are depicted in Figure 14.
(4) Complex Obstacle Scenario Test
As environmental complexity increases—particularly when obstacles are heterogeneous, densely packed, and randomly distributed—UAV navigation and obstacle avoidance place heightened demands on environmental perception, path planning, and decision stability.
Table 8 reports the results for Scenario 2. The SAC-P-based framework achieved an SR of 72.08 ± 2.14%, which is 7.04 percentage points lower than the 79.12% attained in Scenario 1, reflecting the increased difficulty of trajectory planning in the presence of dense and irregular obstacle fields. The CR rose to 22.04 ± 1.68% (+4.98 percentage points compared with Scenario 1), indicating that denser obstacle configurations increased the frequency of local avoidance maneuvers and collision risk. Nevertheless, both OR (2.02 ± 0.48%) and TR (4.01 ± 0.52%) remained low, demonstrating that the framework could maintain operational safety and execution stability even in high-uncertainty environments.
The relatively small SD values across all metrics (≤±2.14%) confirm that these results are statistically stable, and that SAC-P’s performance degradation from Scenario 1 to Scenario 2 is moderate and consistent, rather than the result of isolated failures. This indicates that the proposed framework preserves robust generalization capability and reliable decision-making in complex, previously unseen operational contexts.
Trajectory analysis in Figure 15 further reveals that, even in the complex scenario, the UAV maintained smooth path generation without excessive turning or futile backtracking. This indicates that the framework’s high-level decision module can preserve stable, efficient global planning despite pronounced local environmental changes, avoiding both decision oscillation and overly conservative maneuvers. Collectively, these results reinforce the framework’s robustness in maintaining path continuity and decision consistency in challenging environments, confirming its strong global planning capabilities.
4.3.4. Performance Evaluation of Single-Sensor and Sensor Fusion Frameworks
To assess the relative merits of the proposed UAV DRL framework—which fuses depth camera and LiDAR perception—against single-sensor counterparts, we performed a sensor configuration comparison while keeping the decision-making algorithm (SAC-P) fixed across all tests. Three input configurations for environmental observation were evaluated: depth camera only, LiDAR only, and the fusion of depth camera with LiDAR. The same preprocessing pipeline was applied to all sensory data to ensure fairness and comparability, and the experimental scenario was configured identically to Mission 1 in the aforementioned comparative tests, with each sensor configuration tested over 100 episodes. Each experiment was independently repeated three times with different random seeds. Performance was evaluated across six metrics: SR, CR, OR, TR, ATS, and an additional measure—Average single-step decision time (ASDT)—to gauge real-time decision-making efficiency.
The results, summarized in Table 9, show that sensor selection substantially affects mission performance even under identical decision-making policies. The consistently small SD values (generally ≤±1.8% for rate-based metrics, ≤±0.5 steps for ATS, and ≤±0.004 s for ASDT) indicate stable and reproducible performance across runs. The proposed Depth camera + LiDAR fusion configuration achieved the highest success rate (SR) of 81.23 ± 1.20%, outperforming the Depth camera–only setup (68.15 ± 1.65%) by 13.08 percentage points and the LiDAR–only setup (72.14 ± 1.58%) by 9.09 percentage points. It also yielded the lowest collision rate (CR) (15.02 ± 1.01%), minimal out-of-bounds rate (OR) (1.02 ± 0.40%), and the lowest timeout rate (TR) (3.01 ± 0.49%) among all configurations.
In terms of navigation efficiency, the fusion system achieved the shortest average time steps (ATS) (61.26 ± 0.35), reducing task completion time by 2.89 steps compared with the Depth camera–only configuration and by 1.89 steps compared with the LiDAR–only configuration. This improvement reflects the enhanced environmental awareness and better route selection achievable through the complementary sensing modalities. Regarding decision-making latency, the fusion setup exhibited a slightly higher ASDT (0.043 ± 0.002 s) than the single-sensor configurations (Depth camera: 0.031 ± 0.002 s, LiDAR: 0.029 ± 0.002 s), an increase of 0.012 to 0.014 s per step that is attributable to the additional computational overhead from sensor fusion and feature integration. However, this latency increase is negligible relative to the UAV control cycle and does not adversely affect real-time execution.
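One plausible way to measure an average single-step decision time is to time each policy call with a monotonic high-resolution clock and average over all steps. The sketch below is an assumption about the measurement procedure, which the text does not describe; `dummy_policy` is a hypothetical stand-in for the trained SAC-P actor, and the observation size is arbitrary.

```python
import time


def dummy_policy(observation):
    """Hypothetical stand-in for the trained actor; returns a fixed 2D action."""
    return (0.0, 0.0)


def average_decision_time(policy, observations):
    """Mean wall-clock time (seconds) spent inside `policy` per observation."""
    total = 0.0
    for obs in observations:
        t0 = time.perf_counter()
        policy(obs)
        total += time.perf_counter() - t0
    return total / len(observations)


# Arbitrary placeholder observations; a real test would replay logged sensor frames.
observations = [[0.0] * 8 for _ in range(1000)]
asdt = average_decision_time(dummy_policy, observations)
print(f"ASDT = {asdt:.6f} s")
```

If sensor preprocessing and fusion are meant to count toward ASDT, they would need to run inside the timed region as well; whether the reported values include them is not stated in the text.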
Overall, these findings confirm that combining depth camera and LiDAR perception within the proposed DRL framework delivers superior performance and robustness over single-sensor approaches, enabling more reliable, higher-success navigation in complex and uncertain environments.