3.1. Establishment of Experimental Platform
This study addresses the core problem of insufficient obstacle avoidance efficiency of robotic arms in dynamic environments in the context of Industry 5.0. It proposes a collaborative optimization framework that combines an improved RRT* algorithm with deep reinforcement learning, as shown in Figure 4.
The experimental platform is built on ROS and Gazebo and is designed to replicate the physical properties and sensor noise characteristics of robotic arms operating in dynamic environments. It provides a verification environment for the collaborative optimization framework that combines improved RRT* with deep reinforcement learning [35,36,37]. The core mathematical models are as follows:
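1. Dynamic model of the robotic arm:
The dynamic equation of the UR5 collaborative robotic arm is derived with the Lagrangian method:
$$M(q)\ddot{q} + C(q,\dot{q})\dot{q} + G(q) = \tau$$
Here $q \in \mathbb{R}^{6}$ is the joint position vector and $\tau$ is the joint torque vector. $M(q) \in \mathbb{R}^{6\times 6}$ is the symmetric positive definite mass matrix. Its elements are computed from the mass $m_i$ of each link, the link inertia tensor $I_i$, and geometric parameters:
$$M(q) = \sum_{i=1}^{6}\left(m_i J_{v_i}^{\top} J_{v_i} + J_{\omega_i}^{\top} R_i I_i R_i^{\top} J_{\omega_i}\right)$$
$J_{v_i}$ and $J_{\omega_i}$ are the Jacobian matrices of the linear velocity (of the center of mass) and the angular velocity of the $i$th link, respectively, and $R_i$ is the rotation matrix of link $i$.
$C(q,\dot{q})$ is the matrix of Coriolis and centrifugal terms. It satisfies the skew-symmetry property
$$x^{\top}\left(\dot{M}(q) - 2C(q,\dot{q})\right)x = 0 \quad \forall x \in \mathbb{R}^{6}.$$
$G(q)$ is the gravity term. It is computed from the pose of the robotic arm and the gravitational acceleration vector $g$, expressed in the base frame and pointing downward:
$$G(q) = \frac{\partial P(q)}{\partial q} = -\sum_{i=1}^{6} m_i J_{v_i}^{\top} g, \qquad P(q) = -\sum_{i=1}^{6} m_i\, g^{\top} r_{c_i}(q)$$
where $r_{c_i}(q)$ is the position of the center of mass of link $i$.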
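The mass-matrix relation $M(q) = \sum_i (m_i J_{v_i}^{\top} J_{v_i} + J_{\omega_i}^{\top} R_i I_i R_i^{\top} J_{\omega_i})$ can be illustrated numerically. The sketch below assumes a planar two-link arm as a stand-in for the six-joint UR5, and the masses, lengths, and inertias are hypothetical. It assembles $M(q)$ term by term from the centre-of-mass Jacobians, checks that the result is symmetric positive definite, and compares it with the familiar closed form $M_{11} = m_1 l_{c1}^2 + m_2(l_1^2 + l_{c2}^2 + 2 l_1 l_{c2}\cos q_2) + I_1 + I_2$.

```python
import numpy as np

def planar_2r_mass_matrix(q, m=(1.0, 1.0), l1=1.0, lc=(0.5, 0.5), izz=(0.1, 0.1)):
    """M(q) = sum_i (m_i Jv_i^T Jv_i + Jw_i^T R_i I_i R_i^T Jw_i) for a planar
    two-link arm (hypothetical stand-in for the UR5; all parameters assumed)."""
    q1, q2 = q
    s1, c1 = np.sin(q1), np.cos(q1)
    s12, c12 = np.sin(q1 + q2), np.cos(q1 + q2)

    # Linear-velocity Jacobians of each link's centre of mass (3 x 2)
    jv1 = np.array([[-lc[0] * s1, 0.0],
                    [lc[0] * c1, 0.0],
                    [0.0, 0.0]])
    jv2 = np.array([[-l1 * s1 - lc[1] * s12, -lc[1] * s12],
                    [l1 * c1 + lc[1] * c12, lc[1] * c12],
                    [0.0, 0.0]])
    # Angular-velocity Jacobians: both joints rotate about the world z axis
    jw1 = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 0.0]])
    jw2 = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]])

    M = np.zeros((2, 2))
    links = ((m[0], jv1, jw1, q1, izz[0]), (m[1], jv2, jw2, q1 + q2, izz[1]))
    for mi, jv, jw, q_abs, iz in links:
        c, s = np.cos(q_abs), np.sin(q_abs)
        R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])  # link orientation
        I_body = np.diag([0.0, iz, iz])  # slender rod along its local x axis (assumed)
        M += mi * jv.T @ jv + jw.T @ R @ I_body @ R.T @ jw
    return M

M = planar_2r_mass_matrix((0.3, 0.0))
print(M)
```

The same assembly loop extends to six joints once the per-link Jacobians of the real arm are available.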
The parameters of the robotic arm used in this study reflect the key characteristics of its structural design and motion capabilities. As shown in Table 1, the Denavit–Hartenberg parameters of the six joints show that the twist angles α of joints 1 and 4 are both π/2 rad. These two joints therefore use an orthogonal axis layout, which lets the arm switch between motion planes in three-dimensional space. The link lengths of joints 2 and 3 are 0.4318 m and 0.0203 m, respectively, and they define the horizontal reach and longitudinal precision of the arm. The rotation range of joint 6 spans −266° to 266°, far exceeding the ±180° limit of conventional six-axis robotic arms and giving the end effector a larger pose adjustment space. The differences in joint angle limits suggest that the configuration space has been tailored to complex obstacle avoidance tasks and helps the arm avoid motion singularities. Together, these parameters make the arm highly flexible in dynamic environments. For example, the 0.15 m link offset of joint 3 improves obstacle avoidance in the vertical direction, and the 0.4318 m link length of joint 4 extends the horizontal working range.
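To make the role of the Denavit–Hartenberg parameters concrete, the sketch below chains one homogeneous transform per DH row to obtain the end-effector pose. It assumes the standard (distal) DH convention, because the text does not say which convention Table 1 uses. The function names are illustrative. A full model of the arm would need every row of Table 1, and the values quoted in the text (for example a = 0.4318 m for joint 2) cover only part of that table.

```python
import numpy as np

def dh_transform(theta, d, a, alpha):
    """Homogeneous transform for one row of a standard (distal) DH table."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca, st * sa, a * ct],
        [st, ct * ca, -ct * sa, a * st],
        [0.0, sa, ca, d],
        [0.0, 0.0, 0.0, 1.0],
    ])

def forward_kinematics(q, dh_rows):
    """Chain per-joint transforms; dh_rows holds (d, a, alpha) for each revolute joint."""
    T = np.eye(4)
    for theta, (d, a, alpha) in zip(q, dh_rows):
        T = T @ dh_transform(theta, d, a, alpha)
    return T

# Hypothetical two-joint planar chain with 1 m links
T = forward_kinematics([np.pi / 2, 0.0], [(0.0, 1.0, 0.0), (0.0, 1.0, 0.0)])
print(T[:3, 3])  # end-effector position
```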
2. Dynamic obstacle motion model:
The motion trajectory of dynamic obstacles is described by stochastic differential equations. The random perturbation term in these equations is Gaussian white noise with a specified covariance matrix, and the model parameters are chosen to simulate the uncertainty of obstacle motion in real-world environments.
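The specific stochastic differential equation was lost from the text, so the following is only a sketch under an assumed model: constant-velocity drift plus Brownian perturbation, $dp = v\,dt + \sigma\,dW$, with a diagonal noise covariance. It integrates this model with the Euler-Maruyama scheme, which is one standard way to sample trajectories from such an equation. The function name and all parameter values are hypothetical.

```python
import numpy as np

def simulate_obstacle(p0, v, sigma, dt, steps, rng=None):
    """Euler-Maruyama integration of dp = v dt + sigma dW (assumed model).
    sigma holds per-axis noise standard deviations (diagonal covariance assumed)."""
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(p0, dtype=float).copy()
    v = np.asarray(v, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    traj = [p.copy()]
    for _ in range(steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=p.shape)  # Wiener increment ~ N(0, dt)
        p = p + v * dt + sigma * dW
        traj.append(p.copy())
    return np.array(traj)

# Hypothetical obstacle moving at 1.5 m/s along x with 0.05 m/sqrt(s) noise per axis
traj = simulate_obstacle([0.0, 0.0, 0.5], [1.5, 0.0, 0.0], [0.05, 0.05, 0.05],
                         dt=0.01, steps=100, rng=np.random.default_rng(42))
print(traj[-1])
```

Setting sigma to zero recovers straight-line motion, which makes it easy to check the drift term on its own.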
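3. LiDAR point cloud generation model:
The point cloud $\mathcal{P}$ of the 3D LiDAR is generated by a ray-casting algorithm:
$$\mathcal{P} = \{\, p_k = o + d_k r_k \mid k = 1, \dots, N \,\}$$
where $o$ is the sensor origin, $r_k$ is the direction vector of the $k$th ray, $d_k$ is the distance to the obstacle hit by that ray, and $N$ is the total number of rays.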
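The ray-casting relation $p_k = o + d_k r_k$ can be sketched by intersecting each unit ray with the scene and keeping the nearest hit. The example assumes spherical obstacles so that the intersection has a closed form: with unit $r$, solve $t^2 + 2t\,r\cdot(o-c) + \lVert o-c\rVert^2 - R^2 = 0$. Gazebo itself ray-casts against the actual collision geometry. The function names and the range cutoff are hypothetical.

```python
import numpy as np

def ray_sphere_distance(origin, direction, center, radius):
    """Distance along a unit ray to the first hit on a sphere, or np.inf if it misses."""
    oc = origin - center
    b = np.dot(direction, oc)
    c = np.dot(oc, oc) - radius ** 2
    disc = b * b - c
    if disc < 0:
        return np.inf
    sq = np.sqrt(disc)
    for t in (-b - sq, -b + sq):  # try the nearer root first
        if t >= 0:
            return t
    return np.inf

def cast_point_cloud(origin, directions, spheres, max_range=30.0):
    """Return p_k = o + d_k r_k for every ray that hits an obstacle within max_range.
    spheres is a list of (center, radius) pairs (spherical obstacles assumed)."""
    origin = np.asarray(origin, dtype=float)
    points = []
    for r in directions:
        r = np.asarray(r, dtype=float)
        r = r / np.linalg.norm(r)  # direction vectors must be unit length
        d = min(ray_sphere_distance(origin, r, np.asarray(c, dtype=float), rad)
                for c, rad in spheres)
        if d <= max_range:
            points.append(origin + d * r)
    return np.array(points)

cloud = cast_point_cloud([0, 0, 0], [[1, 0, 0], [0, 1, 0]], [((5, 0, 0), 1.0)])
print(cloud)  # only the +x ray hits the sphere
```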
4. Depth camera noise model:
The depth observations are corrupted by several noise sources, and the model includes an explicit multipath noise component.
3.2. Evaluation Metrics and Comparative Schemes
To evaluate the performance of dynamic obstacle avoidance algorithms comprehensively, this study defines core evaluation metrics in four dimensions: task success rate, path quality, computational efficiency, and dynamic adaptability [38,39,40]. The task success rate is the proportion of trials in which the robotic arm reaches the target position without collision within a preset time limit. Path quality is measured by average path length, path smoothness, and joint torque fluctuation. Computational efficiency covers the time per planning cycle and the utilization of computational resources. Dynamic adaptability is reflected by the rate at which the success rate decays as the obstacle movement speed changes.
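The success-rate and path-quality metrics can be computed from logged trials roughly as follows. The text reports path smoothness in rad/m but does not give its formula, so the definition used here is an assumption: the total turning angle between consecutive path segments divided by path length. The function names are illustrative.

```python
import numpy as np

def success_rate(outcomes):
    """Fraction of trials that reached the goal collision-free within the time limit."""
    outcomes = list(outcomes)
    return sum(1 for ok in outcomes if ok) / len(outcomes)

def path_length(points):
    """Sum of Euclidean segment lengths of a polyline path (metres)."""
    pts = np.asarray(points, dtype=float)
    return float(np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1)))

def path_smoothness(points):
    """Total turning angle divided by path length (rad/m); 0 for a straight line.
    Assumed definition; consecutive duplicate points must be removed beforehand."""
    pts = np.asarray(points, dtype=float)
    seg = np.diff(pts, axis=0)
    seg = seg / np.linalg.norm(seg, axis=1, keepdims=True)
    cosines = np.clip(np.sum(seg[:-1] * seg[1:], axis=1), -1.0, 1.0)
    return float(np.sum(np.arccos(cosines)) / path_length(pts))

l_path = [[0, 0], [1, 0], [1, 1]]
print(success_rate([True, False, True, True]), path_length(l_path), path_smoothness(l_path))
```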
To verify the advantages of the proposed collaborative optimization framework, four representative algorithms and one industrial benchmark were selected for comparison. The traditional methods are the sampling-based RRT algorithm and the artificial potential field method. The improved methods are the dynamic-reconnection RRTSmart-CD algorithm and the deep Q-learning algorithm. The industrial benchmark is the default obstacle avoidance strategy of the ABB YuMi collaborative robotic arm. All algorithms were tested in the same experimental scenario, with obstacle densities ranging from 0.5 to 2.0 obstacles per cubic meter.
The proposed collaborative framework shows clear performance advantages in dynamic obstacle avoidance tasks. As shown in Table 2, its task success rate is 93.8%, with a standard deviation of 1.8 percentage points. Figure 5 shows that this success rate is significantly higher than that of the traditional RRT algorithm (72.3%) and the deep Q-learning algorithm (78.5%). Statistical tests indicate that both differences are highly significant (t = 18.2 and t = 12.4, respectively; p < 0.001).
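The text reports t statistics but does not say which t-test was used or how many trials went into each group. The sketch below assumes Welch's unequal-variance two-sample test and computes the statistic and the Welch-Satterthwaite degrees of freedom directly from per-method summary statistics (mean, standard deviation, and trial count). Those summary values would be taken from Table 2. The function name is illustrative.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2  # squared standard errors of each mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

print(welch_t(1.0, 1.0, 2, 0.0, 1.0, 2))
```

The p-value then follows from the Student t distribution with `df` degrees of freedom, for example via a statistics library.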
Figure 6 shows that, in terms of path quality, the average path length of the collaborative framework is 1.97 m (standard deviation 0.22 m), which is 29.1% shorter than the 2.78 m of the artificial potential field method.
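The 29.1% figure is a reduction relative to the artificial potential field baseline. As a worked check using the two reported path lengths:

```latex
\Delta L_{\mathrm{rel}}
  = \frac{L_{\mathrm{APF}} - L_{\mathrm{ours}}}{L_{\mathrm{APF}}}
  = \frac{2.78 - 1.97}{2.78}
  = \frac{0.81}{2.78}
  \approx 0.291 = 29.1\%
```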
Figure 7 shows that the planning time is only 28.3 ms (standard deviation 3.9 ms), 20.5% faster than the RRTSmart-CD dynamic reconnection algorithm. The 95% confidence interval for the task success rate is 93.2% to 94.4%, which indicates that the results are stable. Compared with the industrial ABB YuMi solution, the collaborative framework improves the success rate by 12.6 percentage points, reduces the path smoothness metric to 0.08 rad/m (standard deviation 0.02 rad/m), and lowers system energy consumption to 12.6 kWh (standard deviation 0.9 kWh). This gives it the best overall performance among the compared methods.
Obstacle density has a markedly nonlinear effect on algorithm performance. As shown in Table 3, when the obstacle density increases from 0.5 to 2.0 obstacles/m³, the task success rate of the collaborative framework declines nonlinearly from 98.2% to 77.1%, and the standard deviation widens from 0.9 to 3.6 percentage points. A one-way analysis of variance (ANOVA) shows significant differences between groups (F = 286.4, p < 0.001). Tukey's post hoc test shows that, even at the highest density, the success rate of the collaborative framework is still 21.1 percentage points higher than that of the traditional methods, which is a large effect. According to Figure 8a, the planning time increases sublinearly with density, from 23.1 ms to 42.7 ms. Over the same range, its standard deviation grows from 2.7 ms to 6.8 ms and CPU utilization rises from 62.4% to 89.5%, which is still below the 95% real-time threshold. Figure 8b shows that the dynamic reconnection mechanism and the spatiotemporal attention module cope effectively with high-density obstacle scenes. At a density of 2.0 obstacles/m³, the path length increases by only 30.3%, which supports the robustness of the algorithm in complex industrial environments (Figure 8c).
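The one-way ANOVA used above compares the variation between group means with the variation within groups. The minimal pure-Python sketch below computes the F statistic from raw per-trial values, for example one group of success indicators per density level. It covers only the F statistic. The p-value and Tukey's post hoc comparisons would come from a statistics package, and the function name is illustrative.

```python
def one_way_anova_f(*groups):
    """F statistic of a one-way ANOVA: between-group mean square over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    return (ss_between / df_between) / (ss_within / df_within)

print(one_way_anova_f([1, 2, 3], [4, 5, 6]))
```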
To make the experimental results reproducible, the key hyperparameter configurations are listed in Table 4. All parameters were tuned with Bayesian optimization, which searched for their optimal values within a predefined search space.
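The text does not specify the surrogate model or acquisition function used for Bayesian optimization. The sketch below therefore assumes one common choice: a zero-mean Gaussian process with a squared-exponential kernel and an expected-improvement acquisition, searched over a fixed grid for a single scalar hyperparameter. The objective, kernel length scale, and iteration counts are all hypothetical.

```python
import math
import numpy as np

def rbf(a, b, length_scale=0.2):
    """Squared-exponential kernel with unit signal variance."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_query, jitter=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP."""
    K = rbf(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    K_s = rbf(x_obs, x_query)
    mu = K_s.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mu, sigma, best):
    """Expected-improvement acquisition for maximisation."""
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + np.array([math.erf(zi / math.sqrt(2.0)) for zi in z]))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mu - best) * cdf + sigma * pdf

def bayes_opt_1d(objective, lo, hi, n_init=3, n_iter=15, seed=0):
    """Maximise a scalar objective over [lo, hi] with GP + EI on a fixed grid."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)  # search in normalised units
    x = rng.uniform(0.0, 1.0, n_init)
    y = np.array([objective(lo + xi * (hi - lo)) for xi in x])
    for _ in range(n_iter):
        y_c = y - y.mean()  # centre observations for the zero-mean prior
        mu, sigma = gp_posterior(x, y_c, grid)
        x_next = grid[np.argmax(expected_improvement(mu, sigma, y_c.max()))]
        x = np.append(x, x_next)
        y = np.append(y, objective(lo + x_next * (hi - lo)))
    best = np.argmax(y)
    return lo + x[best] * (hi - lo), y[best]

print(bayes_opt_1d(lambda v: -(v - 0.3) ** 2, 0.0, 1.0))
```

A scale parameter such as the learning rate would typically be searched in log space, for instance by optimising over the base-10 exponent between -4 and -3.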
3.3. Analysis of Experimental Results
This section analyzes the experimental results along three dimensions: overall performance, dynamic adaptability, and load robustness. The baselines are traditional RRT*, the artificial potential field method, improved RRT* with dynamic reconnection, deep Q-learning, and the ABB industrial benchmark strategy. Against these baselines, the proposed collaborative framework shows clear advantages in task success rate, path quality, and resource efficiency.
The experimental data indicate that obstacle speed has a significant nonlinear effect on algorithm performance. As shown in Table 5, when the obstacle speed increases from 0.5 m/s to 2.5 m/s, the task success rate decreases from 98.2% to 68.9%, a drop of 29.3 percentage points. Even so, the collaborative framework still outperforms the traditional methods at the same speed. Planning time rises steeply with speed, from 23.1 ms to 51.3 ms, but remains below the 100 ms real-time threshold for industrial scenarios. CPU utilization reaches 94.7% at 2.5 m/s, which shows that the system must devote nearly all of its computational resources to maintaining obstacle avoidance accuracy in highly dynamic environments.
The experimental results indicate that increasing the payload systematically reduces the dynamic obstacle avoidance capability of the robotic arm. The data in Table 6 (Figure 9a) reveal a strong correlation between load and energy consumption: at a load of 10 kg, energy consumption reaches 24.7 kWh, an increase of 99.2% over the no-load state. Figure 9b shows that the joint tracking error increases from 0.8 mm to 3.1 mm but remains below the 5 mm industrial threshold. The torque fluctuation index increases linearly with the load, from 8.3 N·m to 16.1 N·m, which confirms the need for the dynamic torque compensation algorithm. Figure 9c shows that the task success rate is 93.8% with no load. When the load increases to 10 kg, the success rate drops to 72.4% and the path length increases by 24.4%, from 1.97 m to 2.45 m. Taken together, these dynamic load test results indicate that the collaborative framework remains robust under physical uncertainty.
To quantify the independent contribution of each technical component of the collaborative framework, a systematic ablation experiment was designed in a dynamic environment with an obstacle density of 1.0 obstacles/m³ and an obstacle speed of 1.5 m/s. The results are given in Table 7. Six configurations were evaluated in sequence, with all remaining parameters fixed:
1. Baseline: the original, unimproved RRT* algorithm combined with the basic deep reinforcement learning framework.
2. Baseline plus the dynamic reconnection mechanism, which includes spatiotemporal corridor constraints and isolation of failed path segments.
3. Configuration 2 plus the lightweight collision detection module, a hybrid pipeline of Mahalanobis distance pre-screening followed by exact GJK detection.
4. Configuration 3 plus the spatiotemporal attention unit in the deep reinforcement learning policy network.
5. Configuration 4 plus the multi-objective composite reward function, which balances obstacle avoidance success rate, path quality, and energy consumption.
6. The complete framework: configuration 5 plus the dynamic weight scheduling strategy that coordinates the hybrid architecture.
Each configuration was run 500 times under random obstacle motion trajectories and evaluated with the metrics defined in Section 3.2, including task success rate, average path length, planning time, system energy consumption, and dynamic obstacle prediction error. The sensor noise model and obstacle motion parameters were kept identical across configurations so that the results are comparable.
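The hybrid collision check used in configuration 3 (Mahalanobis distance pre-screening followed by exact GJK detection) follows a broad-phase/narrow-phase pattern that can be sketched as below. Each obstacle's predicted position is modelled as a Gaussian (mean, covariance). Pairs whose squared Mahalanobis distance exceeds a gate are pruned, and only the survivors go to the exact test. To keep the example short, the exact test is a sphere-sphere overlap check that stands in for GJK on convex meshes. The gate value of 16 (four standard deviations) and all names are assumptions, and the gate must be wide enough relative to the object radii that no true contact is pruned.

```python
import numpy as np

def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of point x from a Gaussian (mean, cov)."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    return float(diff @ np.linalg.solve(cov, diff))

def spheres_collide(c1, r1, c2, r2):
    """Exact narrow-phase test for two spheres (stand-in for GJK on convex meshes)."""
    return np.linalg.norm(np.asarray(c1, dtype=float) - np.asarray(c2, dtype=float)) <= r1 + r2

def check_collisions(link_spheres, obstacles, gate=16.0):
    """Broad phase prunes statistically distant obstacles; narrow phase tests the rest.
    link_spheres: list of (center, radius); obstacles: list of (mean, cov, radius)."""
    hits, exact_checks = [], 0
    for i, (c_link, r_link) in enumerate(link_spheres):
        for j, (mean, cov, r_obs) in enumerate(obstacles):
            if mahalanobis_sq(c_link, mean, cov) > gate:
                continue  # pruned without running the exact test
            exact_checks += 1
            if spheres_collide(c_link, r_link, mean, r_obs):
                hits.append((i, j))
    return hits, exact_checks

links = [((0.0, 0.0, 0.0), 0.1)]
obstacles = [((0.15, 0.0, 0.0), 0.01 * np.eye(3), 0.1),  # close: reaches the exact test
             ((5.0, 0.0, 0.0), 0.01 * np.eye(3), 0.1)]   # far: pruned by the gate
print(check_collisions(links, obstacles))
```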
The ablation results show how each technical component improves system performance step by step. Without any improvements, the baseline model achieves a task success rate of only 72.3%, a path length of 2.45 m, and a planning time of up to 48.7 ms. Adding the dynamic reconnection mechanism yields the largest performance gain. The success rate rises by 18.3% to 85.6%, path length falls by 9.8%, and planning time falls by 26.9%, which supports the central role of spatiotemporal corridor constraints in adapting to dynamic environments. The lightweight collision detection module further improves real-time performance. It cuts planning time by 12.4% to 31.2 ms and energy consumption by 4.2%, reflecting the complementary efficiency of Mahalanobis distance pre-screening and exact GJK detection. The spatiotemporal attention module markedly improves dynamic obstacle prediction: the prediction error drops sharply by 24.5% to 0.37 m, and path smoothness improves by 4.2%, confirming the effectiveness of spatiotemporal feature fusion. Through balanced multi-objective optimization, the composite reward function reduces energy consumption to 12.6 kWh while keeping the success rate above 93%, which highlights the engineering value of multi-objective balancing. Finally, the complete framework coordinates the components through dynamic weight scheduling and raises the success rate to 93.8%, a marginal gain of only 0.7 percentage points. In this configuration, the obstacle prediction error stays at 0.25 m and the planning time is held at 28.3 ms. The components also combine with clear nonlinear gains. For example, dynamic reconnection together with spatiotemporal attention reduces the cumulative prediction error by 59.7%.
These results support the systematic design of the fused algorithm and provide a quantifiable path toward meeting the complex dynamic obstacle avoidance requirements of Industry 5.0 scenarios.
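The planning-time reductions reported for the first two components compound multiplicatively. In the worked check below, the intermediate value of about 35.6 ms is inferred from the reported percentages rather than stated in the text:

```latex
t_{1} = 48.7 \times (1 - 0.269) \approx 35.6\ \mathrm{ms}, \qquad
t_{2} = t_{1} \times (1 - 0.124) \approx 31.2\ \mathrm{ms}
```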
3.4. Parameter Sensitivity Analysis
To verify the hyperparameter choices made in Section 2.3, sensitivity tests were conducted on key parameters in a typical industrial scenario with an obstacle density of 1.5 obstacles/m³ and an obstacle speed of 1.2 m/s. As shown in Table 8, performance varies substantially as each parameter moves across its preset search space.
The sensitivity analysis shows how strongly the key hyperparameters affect system performance and supports the engineering rationale for the selected configuration. When the initial learning rate is increased from 3 × 10⁻⁴ to 1 × 10⁻³, the task success rate falls by 8.2 percentage points to 85.6% and the planning time increases by 23.7% to 35.1 ms. This is consistent with an excessively high learning rate making the policy network oscillate and become unstable. A discount factor of γ = 0.99 gives the best balance between long-term returns and immediate rewards. Compared with γ = 0.90, it produces a 9.6% shorter path (1.97 m) and 5.6% lower energy consumption, which reflects effective modeling of dynamic obstacle motion trends. A safety reward coefficient of α₂ = 2.5 yields the lowest safety violation rate (0.12 violations per task). Overemphasizing safety (α₂ = 4.0) instead increases the path detour rate by 14.3% and raises energy consumption by 1.6% to 12.8 kWh, which demonstrates the need for multi-objective trade-offs. The dynamic weight decay rate λ adjusts the collaborative weights of RRT* and DRL in real time. Setting λ = 0.8 reduces the system's collaborative error by 37.6% relative to λ = 0.5 while keeping the planning time at 28.3 ms, well within the 100 ms industrial real-time threshold. Performance is unimodal in every parameter across the preset search space, which indicates that each selected value lies at the optimum of its one-dimensional sweep. The tuned configuration also maintains a success rate of 93.2% in extreme tests with ±40% fluctuations in obstacle velocity, providing reliable tuning boundaries and stability guarantees for industrial deployment.
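The text names the safety reward coefficient α₂ and the dynamic weight decay rate λ but does not give the formulas they enter. The sketch below is therefore hypothetical throughout. It uses a composite reward that combines goal progress, a clearance-based safety penalty weighted by α₂, and an energy penalty. It also uses a scheduler that updates the weight on the RRT* action as an exponential moving average with decay λ and then blends the RRT* and DRL actions. The confidence signal driving the scheduler, the safety margin, and all names are assumptions.

```python
import numpy as np

def composite_reward(progress, min_clearance, energy, alpha=(1.0, 2.5, 0.1), d_safe=0.1):
    """Hypothetical multi-objective reward. alpha[1] plays the role of the safety
    coefficient alpha_2 (2.5 in the text); d_safe is an assumed clearance margin."""
    a1, a2, a3 = alpha
    safety_penalty = max(0.0, d_safe - min_clearance) / d_safe  # 0 when clearance >= d_safe
    return a1 * progress - a2 * safety_penalty - a3 * energy

class DynamicWeightScheduler:
    """Hypothetical RRT*/DRL blending with an exponentially smoothed weight.
    lam is the decay rate (0.8 in the text); rrt_confidence is an assumed input in [0, 1]."""

    def __init__(self, lam=0.8, w0=0.5):
        self.lam = lam
        self.w = w0  # current weight on the RRT* action

    def update(self, rrt_confidence):
        # The previous weight decays by lam; the new confidence fills the remainder
        self.w = self.lam * self.w + (1.0 - self.lam) * rrt_confidence
        return self.w

    def blend(self, a_rrt, a_drl):
        return self.w * np.asarray(a_rrt, dtype=float) + (1.0 - self.w) * np.asarray(a_drl, dtype=float)

sched = DynamicWeightScheduler(lam=0.8, w0=0.5)
sched.update(1.0)
print(composite_reward(0.0, 0.05, 0.0), sched.w, sched.blend([1.0, 0.0], [0.0, 1.0]))
```

A larger λ makes the weight react more slowly to changes in the confidence signal. Under this assumed form, that is one way a decay rate trades responsiveness for stability.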