1. Introduction
Robotic systems have evolved from simple, pre-programmed machines into intelligent, adaptive entities that play a critical role across numerous domains, including industry, healthcare, defense, and domestic environments. Initially restricted to executing repetitive tasks in structured settings, modern robots are now capable of operating autonomously in complex and dynamic environments. This transformation has been driven by significant advancements in sensing, actuation, computation, and artificial intelligence, particularly in areas such as robotic arms and autonomous manipulation. As robotic technology continues to advance, its integration into daily life and specialized sectors has become increasingly widespread. In industrial settings such as automotive manufacturing, aerospace, and logistics, robots perform intricate operations with precision and consistency. Simultaneously, in healthcare and service industries, robotic systems assist in surgeries, patient care, and hospitality. In domestic environments, the long-term vision is to develop robotic assistants capable of executing routine tasks with minimal human intervention, thereby enhancing convenience and efficiency [1,2].
Achieving accurate and robust trajectory tracking in robotic arms remains a fundamental challenge in advanced robotic systems. These manipulators typically exhibit nonlinear dynamics, strong joint coupling, multiple degrees of freedom, and complex multi-input multi-output behavior. In real-world applications, additional factors such as sensor noise, external disturbances, actuator saturation, and time-varying uncertainties further complicate precise motion control. Traditional control strategies, including proportional-integral-derivative (PID) controllers, have been widely adopted due to their simplicity and real-time responsiveness. However, they often struggle to maintain optimal performance in dynamically changing or uncertain environments. To address these limitations, researchers have explored various control methodologies, including adaptive control, fuzzy logic, neural networks, sliding mode control, and model-based optimization, yet consistently achieving high precision under uncertainty remains an open issue. In recent years, deep reinforcement learning (DRL) has emerged as a promising paradigm for robotic control, offering the ability to learn optimal policies through interaction with the environment. DRL algorithms such as Deep Deterministic Policy Gradient (DDPG) and its advanced variant, Twin Delayed DDPG (TD3), have demonstrated strong performance in continuous control tasks. TD3, in particular, addresses several stability and overestimation issues inherent in earlier RL methods, making it suitable for high-precision control applications. Nevertheless, pure RL approaches often face challenges in early-stage learning stability and safe exploration [3].
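For illustration, the stabilizing mechanisms that distinguish TD3 from DDPG, namely clipped double Q-learning, target policy smoothing, and delayed actor updates, can be sketched as follows. This is a minimal NumPy sketch of the critic target computation only; the function and parameter names are illustrative and are not taken from this study.

```python
import numpy as np

def td3_target(q1_target, q2_target, actor_target, next_state,
               reward, done, gamma=0.99, noise_std=0.2, noise_clip=0.5,
               max_action=1.0):
    """Compute the TD3 critic target with clipped double Q-learning
    and target policy smoothing (illustrative sketch)."""
    # Target policy smoothing: perturb the target action with clipped noise
    noise = np.clip(np.random.normal(0.0, noise_std), -noise_clip, noise_clip)
    next_action = np.clip(actor_target(next_state) + noise,
                          -max_action, max_action)
    # Clipped double Q-learning: take the minimum of the two target critics,
    # which counteracts the overestimation bias observed in DDPG
    q_min = min(q1_target(next_state, next_action),
                q2_target(next_state, next_action))
    # Bootstrapped target; (1 - done) truncates the return at episode end
    return reward + gamma * (1.0 - done) * q_min
```

The delayed-update mechanism is not shown: in TD3 the actor and target networks are updated only once for every few critic updates, which further reduces error accumulation.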
Recent advancements in DRL have significantly improved control strategies for robotic systems, especially through the development of the TD3 algorithm, which addresses key limitations of the original DDPG, such as overestimation bias and training instability. In one of the earliest contributions, Dankwa and Zheng (2019) [4] introduced TD3 for modeling continuous robotic motion, demonstrating its advantages over conventional RL approaches. Subsequently, Kim et al. (2020) [5] applied TD3 with hindsight experience replay to enhance motion planning in robotic manipulators, achieving smoother and more efficient trajectories. Hou et al. (2021) [6] introduced a TD3-based control framework with a rebirth mechanism to enhance learning efficiency and adaptability in multi-DOF robotic manipulators. Similarly, Khoi et al. (2021) [7] developed a control and simulation method for a 6-DOF biped robot using TD3, confirming its effectiveness in improving gait stability. Joshi et al. (2021) [8] extended TD3 to industrial process control by proposing the Twin Actor Twin Delayed DDPG (TATD3) algorithm, which demonstrated improved control precision in chemical batch processes. The expansion of TD3 applications continued with Mosali et al. (2022) [9], who employed a multistage TD3-based learning approach with achievement-based rewards for target tracking in UAVs, achieving robust performance under dynamic and uncertain conditions. In the field of space robotics, Song et al. (2024) [10] proposed a trajectory planning strategy based on DRL, confirming the feasibility of DRL in highly complex and nonlinear aerospace systems. Most recently, Hazem et al. (2025) [11,12] conducted a comprehensive comparative study of reinforcement learning algorithms (DDPG, LC-DDPG, and TD3-ADX) for trajectory tracking of a 5-DOF Mitsubishi robotic arm. Their results highlighted the benefits of advanced RL-based control in terms of accuracy, adaptability, and robustness.
Despite the potential of TD3 in continuous control, standalone implementations often suffer from high early-training variance and poor disturbance rejection in unmodeled situations. Zhang et al. (2020) [13] proposed the PID-Guided TD3 algorithm for spacecraft attitude control; by using a PID controller to steer learning, the system achieves faster convergence and higher precision amid torque saturation and external disturbances. In the context of vehicle platooning, Chen et al. (2023) [14] introduced a two-layer hybrid scheme in which a TD3-tuned PID controller simplifies tuning and delivers smooth acceleration while maintaining tight inter-vehicle distance control. For autonomous navigation, Joglekar et al. (2022) [15] fused deep RL-based longitudinal control with PID-based lateral control, balancing adaptability with stability. For humanoid cable-driven robots, Liu et al. (2024) [16] combined a decoupling PID with deep RL, enhancing tracking performance in dynamic tasks. In hybrid electric vehicle (HEV) energy management, Zhou et al. (2021) [17] demonstrated that an improved TD3 framework outperforms traditional RL strategies in terms of fuel efficiency, convergence speed, and robustness across diverse conditions. Lastly, Liu et al. (2024) [16] designed a PID-augmented deep RL controller for target tracking with obstacle avoidance, achieving enhanced safety and precision in complex environments.
Table 1 presents the capabilities and limitations of DDPG, LC-DDPG, TD3-ADX, and hybrid PID + TD3 in robotic applications.
In this paper, a DRL-based hybrid control strategy is developed to address the trajectory tracking problem of a 5-DOF Mitsubishi RV-2AJ robotic arm (Mitsubishi Electric Corporation, Tokyo, Japan), leveraging advancements in model-free learning methods to overcome the inherent limitations of conventional model-based control. Traditional approaches such as PID controllers and adaptive neuro-fuzzy inference systems (ANFIS) have been widely used due to their simplicity and interpretability; however, they often struggle to maintain robustness and accuracy in the presence of nonlinear dynamics, modeling uncertainties, and varying payloads. Instead of relying on explicit system dynamics, which are often difficult to derive accurately for highly nonlinear and coupled robotic systems, the proposed approach employs DDPG and its enhanced variants (LC-DDPG and TD3-ADX) as the foundation for learning precise and adaptive control policies. While standard DDPG provides a baseline solution for continuous control tasks such as robotic arm manipulation, its performance can be limited by sample inefficiency, overestimation bias, and early-stage instability. LC-DDPG addresses these issues by enforcing stability through Lyapunov-based constraints, and TD3-ADX mitigates overestimation bias while promoting dynamic exploration for faster and more reliable learning. Building on these model-free strategies, the framework is further enhanced with a hybrid PID + TD3 architecture, which integrates the short-term stability and deterministic disturbance rejection of PID control with the adaptive capabilities of TD3. This hybrid approach ensures safe exploration, faster convergence, and improved tracking accuracy under dynamic perturbations, unmodeled nonlinearities, and payload variations. In this study, the reward function is carefully designed using principles inspired by artificial potential field theory, balancing attraction toward the desired trajectory with repulsion from tracking deviations.
This design promotes efficient convergence and robust policy learning, echoing modern reinforcement learning strategies for handling complex robotic tasks.
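The attraction/repulsion balance described above can be illustrated with a simple shaped reward. The sketch below is only a conceptual example: the gains k_att and k_rep and the deviation threshold are illustrative placeholders, not the values used in this study.

```python
import numpy as np

def potential_field_reward(ee_pos, target_pos,
                           k_att=1.0, k_rep=5.0, dev_threshold=0.05):
    """Shaped reward inspired by artificial potential fields:
    an attractive term pulls the end-effector toward the desired
    trajectory point, and a repulsive penalty activates once the
    tracking deviation exceeds a threshold (illustrative sketch)."""
    error = np.linalg.norm(np.asarray(ee_pos) - np.asarray(target_pos))
    reward = -k_att * error               # attraction: smaller error, higher reward
    if error > dev_threshold:             # repulsion: penalize large deviations
        reward -= k_rep * (error - dev_threshold) ** 2
    return reward
```

Because the penalty grows quadratically beyond the threshold, the agent is steered back toward the reference trajectory quickly while small errors near the trajectory are penalized only mildly, which supports stable policy learning.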
To enable safe and realistic agent training, a high-fidelity 3D simulation environment is developed using MATLAB/Simulink’s Simscape Multibody toolbox (MathWorks, Natick, MA, USA; R2024b). The simulation accurately replicates real-world physics, providing real-time feedback on joint positions, velocities, and end-effector coordinates essential for effective policy optimization. Simulation results validate the effectiveness of the proposed control strategies, demonstrating that hybrid PID + TD3 outperforms both LC-DDPG and TD3-ADX in terms of convergence speed, tracking precision, and disturbance rejection. These findings highlight the critical role of hybrid DRL-based controllers in equipping robotic systems with the flexibility, robustness, and autonomy required for operation in uncertain real-world environments. The key contributions of this paper are summarized as follows:
Proposes a hybrid PID + TD3 control framework that combines the deterministic stability of PID control with the adaptive capabilities of TD3, enabling safe exploration, faster convergence, and improved trajectory tracking accuracy for a 5-DOF Mitsubishi RV-2AJ robotic arm under dynamic and uncertain conditions.
Introduces an artificial potential field-inspired reward mechanism within the DRL framework to balance attraction toward the target trajectory and avoidance of tracking deviations, supporting robust and efficient learning.
Develops a high-fidelity 3D simulation platform using MATLAB/Simulink’s Simscape Multibody toolbox, ensuring accurate, real-time physics-based interactions essential for training and validating reinforcement learning agents in robotic control tasks.
Demonstrates through simulations that the hybrid PID + TD3 approach outperforms conventional model-based controllers and standard RL variants, including previously studied DDPG-based methods, in convergence speed, tracking precision, disturbance rejection, and overall robustness.
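One common way to realize such a PID + RL hybrid is to superimpose the learned corrective torque on a baseline PID torque and saturate the sum at the actuator limit. The following sketch illustrates that general pattern for a single joint; the gains, the additive blending, and the saturation limit are illustrative assumptions, not the exact structure of the proposed controller.

```python
class PID:
    """Discrete PID controller for a single joint."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error):
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def hybrid_action(pid, td3_policy, state, error, tau_max=5.0):
    """PID supplies a stabilizing baseline torque; the TD3 policy adds a
    learned corrective term; the sum is saturated to the actuator limit."""
    tau = pid.step(error) + td3_policy(state)
    return max(-tau_max, min(tau_max, tau))
```

In this arrangement the PID term keeps early exploration safe and bounded, while the learned term gradually compensates for the nonlinear coupled dynamics that the fixed-gain PID cannot capture.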
The remainder of this paper is organized as follows:
Section 2 presents the dynamic modeling of the Mitsubishi RV-2AJ robotic arm, deriving its nonlinear equations of motion.
Section 3 outlines the fundamentals of reinforcement learning and details the implementation of the hybrid PID + TD3 framework, while briefly referencing the previously studied DDPG variants (LC-DDPG and TD3-ADX) for context.
Section 4 presents the simulation results, evaluating the performance of the proposed hybrid controller under dynamic and uncertain conditions, with a focus on convergence speed, tracking precision, disturbance rejection, and overall robustness. Finally,
Section 5 concludes the paper by summarizing the findings and discussing future research directions in hybrid reinforcement learning-based control strategies for robotic systems.
2. Kinematic Analysis and Dynamic Modeling of the 5-DOF Mitsubishi RV-2AJ Robotic Arm
The 5-DOF Mitsubishi RV-2AJ robotic arm used in this study, shown in Figure 1, consists of rigid links connected by revolute joints, providing precise industrial manipulation. Its five joints (base, shoulder, elbow, and two wrist axes) enable full positioning and orientation, with servo motors accurately controlling the joint angles (θ₁–θ₅). The arm is modeled as a serial-chain manipulator, with forward kinematics derived via the Denavit-Hartenberg (D-H) convention (Table 2 presents the physical and kinematic parameters of the robotic arm). Its dynamic model, based on the Euler-Lagrange formulation, captures nonlinearities, joint coupling, and external disturbances, creating a realistic simulation environment. These accurate kinematic and dynamic models are crucial for implementing and training the hybrid PID + TD3 controller, ensuring safe exploration, faster convergence, and high-precision trajectory tracking in a simulated environment that closely reflects real-world robotic interaction. Detailed derivations follow the work of Z. B. Hazem et al. [11,12,18].
The forward kinematic model of the robot can be calculated using Equation (1), where Aᵢ is the homogeneous transformation matrix of link i, derived from the D-H parameters as given in Equation (2):

T₀⁵ = A₁A₂A₃A₄A₅ (1)

Aᵢ = Rot(z, θᵢ)·Trans(z, dᵢ)·Trans(x, aᵢ)·Rot(x, αᵢ) (2)
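The D-H composition of per-link homogeneous transforms can be computed generically as below. This is a sketch using the classic D-H convention; the parameter tuples in the usage example are placeholders, not the RV-2AJ entries from Table 2.

```python
import numpy as np

def dh_transform(theta, d, a, alpha):
    """Homogeneous transformation matrix for one link (classic D-H convention):
    rotation theta about z, offset d along z, length a along x, twist alpha about x."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

def forward_kinematics(dh_rows):
    """Compose per-link transforms into the base-to-end-effector transform."""
    T = np.eye(4)
    for theta, d, a, alpha in dh_rows:
        T = T @ dh_transform(theta, d, a, alpha)
    return T  # 4x4 homogeneous transform; T[:3, 3] is the end-effector position
```

For example, two unit-length planar links with joint angles (0, π/2) place the end-effector at (1, 1, 0) in the base frame.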
The dynamic model of the 5-DOF Mitsubishi RV-2AJ robot, derived from its kinematic representation, relates joint motions to the forces and torques acting on the system and can be expressed in matrix form as

D(θ)θ̈ + C(θ, θ̇)θ̇ + G(θ) + F(θ̇) = τ,

where θ, θ̇, and θ̈ denote joint positions, velocities, and accelerations, respectively. D(θ) is the inertia matrix capturing the robot's mass properties, C(θ, θ̇) represents Coriolis and centrifugal forces, and G(θ) accounts for gravitational effects. The applied joint torques are collected in τ, and F(θ̇) represents friction torques, including viscous and potentially nonlinear components such as Coulomb or Stribeck effects. This compact representation provides a complete framework for simulation and control, serving as the basis for implementing the hybrid PID + TD3 controller. The control inputs for each joint are the torques (τ₁, τ₂, τ₃, τ₄, τ₅) generated by the servo motors. These torques are implemented in the MATLAB/Simscape model, enabling accurate simulation and analysis of the robot's dynamic behavior under various control scenarios. A corresponding CAD-based dynamic model was also developed in MATLAB/Simscape to validate the mathematical formulation.
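Given the matrix form of the dynamics, the joint accelerations can be solved for and integrated numerically. The sketch below shows a single semi-implicit Euler step; the matrix-valued functions D, C, G, and F are generic placeholders for the robot-specific terms, not the RV-2AJ model itself.

```python
import numpy as np

def dynamics_step(theta, theta_dot, tau, D, C, G, F, dt=1e-3):
    """One semi-implicit Euler integration step of the manipulator dynamics
    D(θ)θ̈ + C(θ, θ̇)θ̇ + G(θ) + F(θ̇) = τ,
    solved as θ̈ = D(θ)⁻¹ (τ − C(θ, θ̇)θ̇ − G(θ) − F(θ̇))."""
    # Solve the linear system rather than forming the inverse explicitly
    theta_ddot = np.linalg.solve(
        D(theta),
        tau - C(theta, theta_dot) @ theta_dot - G(theta) - F(theta_dot))
    theta_dot_new = theta_dot + theta_ddot * dt      # update velocities first
    theta_new = theta + theta_dot_new * dt           # then positions (semi-implicit)
    return theta_new, theta_dot_new
```

A loop over such steps, driven by the controller's torque output, reproduces the role played by the Simscape solver in the simulation environment.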
Figure 2a,b illustrate the Simscape model and the CAD model, respectively. Both models were simulated under identical initial joint positions, showing excellent agreement and confirming the accuracy and reliability of the dynamic model (see Figure 3).
5. Conclusions
In this study, a hybrid PID + TD3 trajectory tracking control framework was proposed and rigorously evaluated for the 5-DOF Mitsubishi RV-2AJ robotic arm. The controller combines the fast error correction and stabilizing capability of PID with the adaptive policy learning of the TD3 algorithm. Implemented in MATLAB/Simulink using a Simscape-based robotic model, the hybrid PID + TD3 controller was benchmarked against state-of-the-art DRL methods (TD3-ADX, LC-DDPG, and DDPG) across complex 3D trajectories, including N-shaped, helical, and spiral paths. The results demonstrate that hybrid PID + TD3 consistently achieved the lowest tracking errors across all evaluation metrics (IAE, ISE, ITAE, ITSE, and RMSE), with improvements of up to 50% compared to DDPG, 44% compared to LC-DDPG, and 30% compared to TD3-ADX. This superior performance stems from the synergistic integration of PID and TD3, where the PID component provides robust stability and rapid error correction, while the TD3 agent learns and adapts to the nonlinear coupled dynamics of the robotic arm. Furthermore, robustness analysis under internal (model uncertainties) and external (random torque disturbances) perturbations confirms the ability of the hybrid PID + TD3 to maintain smooth and stable control signals, outperforming all benchmark controllers in terms of disturbance rejection and control efficiency. While TD3-ADX exhibits notable improvements over LC-DDPG and DDPG due to adaptive exploration and bias reduction, its purely learning-based structure remains prone to transient oscillations and accumulated error in highly dynamic environments. By contrast, the hybrid PID + TD3 framework effectively balances fast transient response, adaptability, and robustness, making it the most reliable solution among the tested approaches. 
Nevertheless, it should be acknowledged that the current evaluation is limited to simulation, and challenges such as the Sim-to-Real transfer gap and the sensitivity of PID tuning parameters may affect real-world deployment. These factors highlight the importance of cautious translation of simulation results to physical platforms. Overall, the findings position hybrid PID + TD3 as a highly promising control strategy for advanced robotic manipulation and industrial automation applications, where precision, robustness, and adaptability are critical. Future research will focus on extending this framework to multi-robot collaboration, real-world implementation on physical robotic systems, and the incorporation of advanced learning paradigms such as meta-learning and attention-based control to further enhance generalization and performance in unstructured environments.