Abstract
In high-precision fields such as advanced manufacturing, semiconductor processing, aerospace assembly, and precision machining, motion control systems often suffer from large tracking errors and low control efficiency caused by complex dynamic environments. To address this, this paper proposes a data-driven feedforward compensation control strategy based on a Parallel Gated Recurrent Unit (GRU)–Transformer network. The method does not require an accurate model of the controlled object; instead, it uses motion error data and controller output data collected under actual operating conditions for network training and real-time prediction, which reduces data requirements. The proposed feedforward control strategy consists of three main parts. First, a Parallel GRU–Transformer prediction model is constructed from real-world data collected by high-precision sensors, enabling precise prediction of system motion errors after a single training session. Second, a nonlinear PD controller takes the errors predicted by the Parallel GRU–Transformer network as input and generates the primary correction force, significantly reducing reliance on the main controller. Finally, the output of the nonlinear PD controller is combined with the output of the main controller to jointly drive the precision motion platform. Verification on a permanent magnet synchronous linear motor motion platform demonstrates that the control strategy with Parallel GRU–Transformer feedforward compensation significantly reduces the tracking error and its fluctuations under different trajectories, lowering both the moving average (MA) and the moving standard deviation (MSD) of the error, enhancing the system's robustness against environmental disturbances, and effectively alleviating the load on the main controller. The proposed method provides new insights and a reliable basis for the widespread application of precision motion control in industrial and research fields.
Keywords: Parallel GRU–Transformer; nonlinear PD controller; feedforward compensation; precision motion platform
MSC: 68T07
1. Introduction
With the continuous development of automated manufacturing and high-precision assembly technologies, precision motion control has been playing an increasingly important role in modern industry [,]. In applications such as semiconductor production and micro/nano-manipulation, positioning accuracy at the micrometer or even nanometer level directly impacts product quality and system performance [,,]. However, the increasing complexity of control systems makes many high-performance control algorithms difficult to directly apply to actual equipment. Therefore, developing a control strategy that is structurally simple, easy to implement, and highly performant has become particularly necessary [,].
Traditional PID control, including adaptive PID and fuzzy PID, has been widely applied in precision motion platforms due to its simple structure and ease of tuning [,,,]. However, under conditions such as high-frequency noise, nonlinearity, and time-varying loads, PID control is often susceptible to disturbances []. To enhance system robustness, researchers have proposed various robust control methods. Sliding mode control (SMC) addresses uncertainties and disturbances by designing a sliding surface, making it suitable for strongly nonlinear systems []. However, SMC suffers from chattering, which affects system stability and equipment lifespan. The adaptive sliding mode control of Nguyen et al. mitigated chattering to some extent and accelerated system trajectory convergence []. H∞ control enhances robustness by minimizing the worst-case gain and is widely applied in precision positioning and tracking [,]. Jose et al. [] designed a multivariable controller for a high-precision 6-DOF magnetic levitation positioner and employed a discrete hybrid H2/H∞ filter as the observer. Chen et al. [] proposed a new observer-based adaptive robust controller (obARC) to address the lack of velocity measurements and compensate for dynamic uncertainties. However, these robust control methods often face limitations such as chattering, sensitivity to modeling errors, and high design complexity. In repetitive tasks, iterative learning control (ILC) and repetitive control have demonstrated good trajectory tracking performance [,,]. Zhang et al. proposed an accelerated convergence-based PD-type ILC to improve trajectory tracking for permanent magnet synchronous linear motors (PMLSMs) []. Zheng et al. integrated adaptive sliding mode control (ASMC) with ILC in parallel, achieving both robustness and repeatability in control without an accurate dynamics model []. However, in actual engineering applications, repeatedly learning each new trajectory remains time-consuming and labor-intensive.
Besides the above strategies, advanced control methods have also been proposed to enhance robustness, convergence speed, and tracking accuracy under uncertainties. Meng et al. developed an adaptive fixed-time stabilization approach, ensuring convergence within a fixed time and avoiding singularity issues []. Lan and Zhao combined Padé approximation-based preview repetitive control with equivalent input disturbance compensation to improve tracking precision and disturbance rejection []. Wang et al. introduced a prescribed performance adaptive robust control scheme for robotic manipulators, keeping tracking errors within predefined bounds despite uncertainties []. While these approaches have proven effective, the rise of artificial intelligence offers new opportunities to further enhance adaptability and performance in complex, nonlinear, and highly dynamic environments. Feng et al. proposed an adaptive sliding mode control (SMC-RBF) based on radial basis function (RBF) neural networks, effectively compensating for system uncertainties and improving dynamic performance []; Hasan et al. designed an adaptive neural network (ANNFOPID) structure combining a nonlinear fractional-order PID controller, utilizing RBF estimation of unknown disturbances to enhance the robustness of the control system []; Yang et al. proposed an adaptive dual-neural network sliding mode control (ADNSMC) by integrating recurrent neural networks (RNNs) with RBF neural networks and combining them with non-singular fast terminal sliding mode control (NFTSMC) to improve accuracy and convergence speed []; Hu et al. utilized neural networks for feedforward compensation to achieve pre-correction of tracking errors [,,]; and Zhou proposed an intelligent gated recurrent unit (GRU) real-time iterative compensation (RIC) position-loop feedforward compensation control method, balancing offline compensation and real-time iterative compensation, effectively reducing residual error []. 
However, these neural network methods often face challenges such as high computational complexity and insufficient real-time performance in high-speed, high-dynamic environments, especially in embedded systems or resource-constrained scenarios, where latency and computational overhead can easily lead to degraded control performance.
This paper proposes a novel data-driven feedforward compensation control strategy that utilizes a Parallel GRU–Transformer network for efficient prediction of motion errors in precision motion control systems. Compared to traditional prediction methods based on single GRU or Transformer models, the proposed Parallel GRU–Transformer network combines the local sequence feature capture capability of a GRU with the global dependency modeling advantage of a Transformer [], enabling accurate and efficient prediction of the next moment’s motion error using two types of simplified input data: motion error and controller output. Subsequently, by feeding the prediction error into a nonlinear PD controller [] to generate a feedforward compensation signal and driving the motion platform together with the main controller’s output, the system completes the compensation action before the actual error occurs, significantly reducing the amplitude of the motion error. This method effectively alleviates the real-time adjustment burden of the feedback controller and further improves the response speed and accuracy of the control system by adapting to the inherent nonlinear characteristics of the system through the nonlinear PD controller. Additionally, the proposed feedforward compensation scheme demonstrates significant versatility and convenience, enabling seamless integration with existing controller architectures. Experimental validation on a permanent magnet synchronous linear motor motion platform confirms that the method effectively reduces system tracking error and variability metrics, such as MA (moving average) and MSD (moving standard deviation), under various operating conditions, thereby enhancing the system’s robustness against external disturbances and control stability.
The main contributions of this paper are as follows:
(1) A Parallel GRU–Transformer prediction model is proposed. By reasonably constructing the training dataset, the model can accurately model the temporal dynamic characteristics of the system without requiring an accurate model of the controlled object. It only uses the motion error and controller output under actual operating conditions as network inputs, requires a small amount of data, and can effectively predict the motion error at future time instants.
(2) An efficient feedforward compensation control strategy based on a nonlinear PD controller is designed. The proposed method can be directly deployed in actual engineering systems and can be efficiently integrated with existing control strategies, significantly simplifying the implementation difficulty in actual industrial sites.
(3) Through experiments conducted on a permanent magnet synchronous linear motor motion platform under different operating conditions, the effectiveness and robustness of the proposed control method under actual operating conditions are verified, demonstrating its excellent industrial practical value.
The structure of this paper is as follows: Section 2 models and analyzes the system and introduces the main characteristics of the motion platform. Section 2.1 discusses the feedforward compensation control strategy based on the Parallel GRU–Transformer neural network prediction, detailing the network structure, training process, feedforward compensation method, and system stability analysis. Section 3 verifies the prediction performance through comparative experiments and evaluates the proposed control strategy under different operating conditions. Finally, Section 4 summarizes the research results.
2. System Modeling
The PMLSM is a highly coupled, multivariable, nonlinear system that requires decoupling analysis. In the $\alpha$-$\beta$ stationary coordinate system, the voltage equation of the stator's two-phase winding is given in (1).
      
        
      
      
      
      
    
Here, $u_\alpha$ and $u_\beta$ are the stator voltages in the $\alpha$-$\beta$ coordinate system and $i_\alpha$ and $i_\beta$ are the stator currents in the $\alpha$-$\beta$ coordinate system; R is the phase resistance; $L_d$ and $L_q$ are the stator inductances; $p$ represents the differential operator; $\omega_e$ is the angular velocity; and $e_\alpha$ and $e_\beta$ are the extended back electromotive forces, expressed as in (2).
      
        
      
      
      
      
    
The extended back electromotive force contains positional information, from which the rotor electrical angular velocity $\omega_e$ and electrical angle $\theta_e$ can be extracted, as given in (3).
      
        
      
      
      
      
    
For ease of control, the $\alpha$-$\beta$ coordinate system is often transformed by rotation into the rotor synchronous rotating coordinate system (d-q coordinates). The mathematical model in d-q rotating coordinates is given in (4).
      
        
      
      
      
      
    
Here,  and  are the stator voltages on the d- and q-axes,  and  are the stator currents on the d- and q-axes, and  and  are the induced electromotive forces on the d- and q-axes. When selecting an appropriate coordinate system such that the magnetic flux  of the permanent magnet is completely on the d-axis,  can be set to 0, .  contains speed information, and the position information of the rotor can be obtained by integrating the speed , represented as (5).
      
        
      
      
      
      
    
The electromagnetic torque and mechanical motion equation of the motor are expressed as (6) and (7).
      
        
      
      
      
      
    
      
        
      
      
      
      
    
      where  is the number of motor poles,  is the stator permanent magnet flux,  is the mechanical angular velocity of the motor,  is the electromagnetic torque,  is the load torque, B is the viscous resistance coefficient, and J is the rotor moment of inertia.
The above model can achieve an accurate description of the electromagnetic and motion characteristics of the PMLSM, laying a theoretical foundation for subsequent control system design. The next section will introduce the detailed design process of the control architecture.
2.1. Control Architecture Based on Parallel GRU–Transformer Feedforward Compensation
2.1.1. Structure of Parallel GRU–Transformer
This paper proposes a Parallel GRU–Transformer neural network architecture that aims to combine the advantages of gated recurrent units (GRUs) and Transformers to simultaneously capture local dynamics and global dependencies in time series. As shown in Figure 1, the architecture consists of two parallel branches, namely the GRU branch and the Transformer branch, which extract complementary features through different mechanisms.
      
    
Figure 1. Structure of Parallel GRU–Transformer.
A GRU is a streamlined variant of the Long Short-Term Memory (LSTM) network [] whose simplified gating mechanism improves training efficiency while retaining strong sequence modeling capabilities. The GRU structure includes only an update gate and a reset gate and omits the explicit memory cell. The update gate controls the extent to which information from the previous hidden state is retained in the current state, as shown in (8).
$$z_t = \sigma\left(W_z \cdot [h_{t-1}, x_t]\right) \tag{8}$$
In this equation, $\sigma$ is the sigmoid activation function, $W_z$ is the learnable weight matrix, and $[h_{t-1}, x_t]$ denotes the concatenation of the previous hidden state and the current input. The reset gate controls the degree of selective forgetting of the previous hidden state, as shown in (9).
$$r_t = \sigma\left(W_r \cdot [h_{t-1}, x_t]\right) \tag{9}$$
Under the control of the reset gate, the candidate hidden state can be represented as (10).
$$\tilde{h}_t = \tanh\left(W_h \cdot [r_t \odot h_{t-1}, x_t]\right) \tag{10}$$
Here, ⊙ denotes element-wise multiplication and $\tanh$ is the hyperbolic tangent activation function. The final hidden state is obtained by weighting and fusing the historical state and the candidate state through the update gate, as shown in (11).
$$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t \tag{11}$$
The Transformer architecture is based on a self-attention mechanism and explicitly introduces sequence position information through position encoding to capture long-range dependencies. The position encodings are calculated as shown in (12) and (13).
$$PE_{(pos,\,2i)} = \sin\left(\frac{pos}{10000^{2i/d_{model}}}\right) \tag{12}$$
$$PE_{(pos,\,2i+1)} = \cos\left(\frac{pos}{10000^{2i/d_{model}}}\right) \tag{13}$$
In this context, $pos$ denotes the sequence position, i denotes the index of the encoding dimension, and $d_{model}$ denotes the feature dimension of the model. The input feature matrix X undergoes linear mapping to obtain the query Q, key K, and value V, which are represented as (14)–(16).
$$Q = XW_Q \tag{14}$$
$$K = XW_K \tag{15}$$
$$V = XW_V \tag{16}$$
Here, $W_Q$, $W_K$, and $W_V$ are the corresponding weight matrices. Then, the attention output is obtained using the scaled dot-product attention mechanism, expressed as in (17).
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \tag{17}$$
In the equation, $\sqrt{d_k}$ is the scaling factor, where $d_k$ is the key dimension; it prevents the dot products from becoming too large, which would cause the gradients to vanish. To further improve the model's expressive power, the Transformer uses a multi-head attention mechanism that projects the input into multiple subspaces, calculates the attention separately in each, and concatenates the results to form (18).
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^{O} \tag{18}$$
Here, the i-th head is calculated as in (19).
$$\mathrm{head}_i = \mathrm{Attention}\left(QW_i^{Q}, KW_i^{K}, VW_i^{V}\right) \tag{19}$$
$W_i^{Q}$, $W_i^{K}$, and $W_i^{V}$ are the projection matrices of the i-th head and $W^{O}$ is the output mapping matrix, which maps the multi-head attention outputs back to the original model feature space.
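As an illustration of the position encoding and attention steps described above, the following is a minimal pure-Python sketch, assuming the standard sinusoidal encoding and single-head scaled dot-product attention. The function names and the toy inputs are hypothetical and are not taken from the authors' implementation.

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal position encoding: sine on even dimensions, cosine on odd ones."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for k in range(d_model):
            # Dimensions 2i and 2i+1 share the same frequency term.
            angle = pos / (10000 ** ((2 * (k // 2)) / d_model))
            pe[pos][k] = math.sin(angle) if k % 2 == 0 else math.cos(angle)
    return pe

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [v / total for v in exps]

def attention(Q, K, V):
    """Single-head scaled dot-product attention on lists of row vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Toy usage (hypothetical sizes): window of 10 steps, 4 encoding dimensions.
pe = positional_encoding(10, 4)
ctx = attention([[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
```

With a zero query, all scores are equal, so the output is the plain average of the value rows; this is a quick sanity check that the softmax weighting behaves as expected.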
In the Parallel GRU–Transformer architecture, the input sequence is fed into two branches, GRU and Transformer, for parallel processing. In the GRU branch, the input sequence is flattened and then enters a GRU layer containing 10 units, followed by a ReLU activation function and a fully connected layer. A specific indexing layer extracts the features at the last moment of the sequence to capture local dynamic information. The Transformer branch fuses the original sequence features through position encoding and uses two self-attention layers (each with four attention heads and 64-dimensional key channels) to capture global dependency features in the sequence. Finally, an indexing layer extracts the feature representations at the end of the sequence.
The features from both branches are then concatenated and fused, and through ReLU activation and fully connected mapping to the target output space, the final prediction results are generated. This parallel structure effectively overcomes the shortcomings of a single GRU in capturing long-range dependencies, while also addressing the limitations of Transformers in perceiving fine-grained local dynamics, providing a complex time series prediction method that is both efficient and robust.
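The GRU branch can be pictured as a recurrent update applied along the input window, followed by taking the state at the last time step. The sketch below is a simplified illustration under several assumptions: biases are omitted, the update gate weights the previous state (one of two common conventions), and the hidden size is 2 rather than the 10 units used in the paper. All names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def gru_step(h_prev, x, W_z, W_r, W_h):
    """One GRU update without biases (assumed convention, see lead-in)."""
    hx = h_prev + x                                    # concatenation [h_{t-1}, x_t]
    z = [sigmoid(a) for a in matvec(W_z, hx)]          # update gate
    r = [sigmoid(a) for a in matvec(W_r, hx)]          # reset gate
    rhx = [ri * hi for ri, hi in zip(r, h_prev)] + x   # [r_t * h_{t-1}, x_t]
    h_cand = [math.tanh(a) for a in matvec(W_h, rhx)]  # candidate hidden state
    # z keeps the previous state, (1 - z) admits the candidate state.
    return [zi * hp + (1.0 - zi) * hc
            for zi, hp, hc in zip(z, h_prev, h_cand)]

def gru_last_state(seq, h0, W_z, W_r, W_h):
    """Run the GRU over a window and return the state at the last step."""
    h = h0
    for x in seq:
        h = gru_step(h, x, W_z, W_r, W_h)
    return h

# Toy usage: each input is [error, controller output]; all-zero weights make
# every gate equal to 0.5 and the candidate state 0, so the state halves per step.
zeros = [[0.0] * 4 for _ in range(2)]
window = [[0.1, 0.3], [0.0, 0.2], [-0.1, 0.1]]
h_last = gru_last_state(window, [0.2, -0.4], zeros, zeros, zeros)
```

Returning only the final state mirrors the indexing layer described above, which keeps the features at the last moment of the sequence.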
2.1.2. Nonlinear PD Controller
Traditional linear PD control often struggles to balance fine control in small-error regions with fast convergence in large-error regions when dealing with systems with large-scale variations and strong disturbances. Therefore, a PD control form based on nonlinear gain scheduling has been proposed in the literature. Its core idea is to apply a nonlinear function $\mathrm{fal}(\cdot)$ to the error and its derivative and to characterize the gain requirements of different error intervals in a segmented or nonlinear manner. This is expressed as (20).
$$u_{npd} = k_p\,\mathrm{fal}(e, \alpha_p, \delta) + k_d\,\mathrm{fal}(\dot{e}, \alpha_d, \delta) \tag{20}$$
where $e$ and $\dot{e}$, respectively, represent the position error and the differentiated position error, $k_p$ and $k_d$ are the gain coefficients, and $\alpha_p$, $\alpha_d$, and $\delta$ determine the shape and threshold of the segmented nonlinearity. The function $\mathrm{fal}$ is represented as (21).
$$\mathrm{fal}(x, \alpha, \delta) = \begin{cases} \dfrac{x}{\delta^{1-\alpha}}, & |x| \le \delta \\ |x|^{\alpha}\,\mathrm{sign}(x), & |x| > \delta \end{cases} \tag{21}$$
The nonlinear PD controller adjusts its effective gain according to the magnitude of the tracking error. In the small-error region ($|e| \le \delta$), the output is proportional to the error, which limits overshoot and improves noise tolerance. In the large-error region ($|e| > \delta$), the output follows a nonlinear power law, enabling rapid suppression of large deviations. Parameters $\alpha_p$ and $\alpha_d$ determine the degree of nonlinearity: smaller values provide smoother control near the origin, while values closer to 1 improve responsiveness for large errors. Gains $k_p$ and $k_d$ balance the contributions from position and velocity feedback, and $\delta$ defines the boundary between the small- and large-error regions, chosen based on sensor resolution and noise level. Compared with a conventional PD controller with fixed gains, the nonlinear PD applies a higher gain only when necessary, achieving faster recovery from large disturbances while maintaining stability and high accuracy near the target.
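The piecewise behavior described above can be sketched in a few lines. This example assumes the nonlinearity is the `fal`-type function widely used in the active disturbance rejection control literature (linear inside the threshold, power law outside); the function names, parameter names, and numerical values are illustrative assumptions, not the authors' settings.

```python
import math

def fal(x, alpha, delta):
    """Assumed piecewise nonlinearity: linear for |x| <= delta, power law otherwise."""
    if abs(x) <= delta:
        return x / (delta ** (1.0 - alpha))
    return math.copysign(abs(x) ** alpha, x)

def nonlinear_pd(e, e_dot, kp, kd, alpha_p, alpha_d, delta):
    """Nonlinear PD output built from the error and its derivative."""
    return kp * fal(e, alpha_p, delta) + kd * fal(e_dot, alpha_d, delta)

# Hypothetical parameters chosen only to exercise both regions.
delta = 0.01
inside = fal(0.005, 0.5, delta)    # linear region
at_edge = fal(delta, 0.5, delta)   # boundary value, equals delta ** alpha
outside = fal(0.04, 0.5, delta)    # power-law region
u = nonlinear_pd(0.04, 0.0, kp=2.0, kd=0.1, alpha_p=0.5, alpha_d=0.5, delta=delta)
```

Under these assumptions the two branches meet at the threshold, so the control signal has no jump when the error crosses it, and the function is odd, so positive and negative errors are treated symmetrically.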
2.1.3. Tracking Error Prediction Based on Parallel GRU–Transformer Neural Network
To train the Parallel GRU–Transformer network, this study collected a 100 s dataset on a high-precision motion platform driven by a permanent magnet synchronous motor. The platform is equipped with a m resolution optical grating displacement sensor, whose output signal is quadrature-decoded by a  sampling FPGA board to obtain high-precision real-time displacement information. The main controller collects the actual position at a frequency of , compares it with the fourth-order polynomial reference trajectory point by point, calculates the tracking error, and generates the controller output through a feedback loop. The complete trajectory is composed of multiple fourth-order polynomial segments, as shown in Figure 2. The load is kept constant during the experiment.
      
    
Figure 2. (a) Snap trajectory. (b) Jerk trajectory. (c) Acceleration trajectory. (d) Velocity trajectory. (e) Position trajectory.
All raw data are first normalized to [−1, 1] using Min–Max normalization after acquisition to ensure that all variables are uniformly distributed within similar numerical intervals, thereby improving the convergence speed and stability of network training. After normalization, the tracking error sequence  and the corresponding control output sequence  for the first N time steps are combined in time order to form a complete input matrix , which is used as the input for the Parallel GRU–Transformer training set. The  at time step N+1 is used as the output. Its format is defined as (22) and (23).
      
        
      
      
      
      
    
      
        
      
      
      
      
    
          where  denotes the tracking error at the ith () discrete time step, whereas  signifies the controller output at the same time step, together encapsulating the dynamic evolution of the system during the historical phase. Combine the error  over N time steps and the controller output  into a matrix  with dimensions . The comprehensive input data matrix  can be derived in the following precise format (24).
      
        
      
      
      
      
    
After multiple experiments, the window length N was determined to be 10. Finally, the normalized data was divided into training and validation sets in a ratio of 8:2 to comprehensively evaluate the performance of the Parallel GRU–Transformer in error prediction and feedforward compensation.
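The data preparation described in this subsection (Min–Max scaling to [−1, 1], windows of N = 10 past samples, and an 8:2 split) can be sketched as follows. The 2 × N layout of each input window, the chronological split, and all function names are assumptions made for illustration; the synthetic signals merely stand in for recorded data.

```python
def minmax_scale(values, lo=-1.0, hi=1.0):
    """Scale a sequence linearly so that its minimum maps to lo and maximum to hi."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        return [0.5 * (lo + hi) for _ in values]  # constant signal: use midpoint
    return [lo + (hi - lo) * (v - v_min) / span for v in values]

def make_windows(err, ctrl, n=10):
    """Pair each window of n (error, control) samples with the next error sample."""
    samples = []
    for k in range(len(err) - n):
        x = [err[k:k + n], ctrl[k:k + n]]  # assumed 2 x n input layout
        y = err[k + n]                     # target: error at step n + 1
        samples.append((x, y))
    return samples

def split(samples, ratio=0.8):
    """Chronological train/validation split (assumed; the paper states only 8:2)."""
    cut = int(len(samples) * ratio)
    return samples[:cut], samples[cut:]

# Synthetic stand-in signals, not measured data.
err = minmax_scale([float(i) for i in range(100)])
ctrl = minmax_scale([2.0 * i for i in range(100)])
samples = make_windows(err, ctrl, n=10)
train, val = split(samples)
```

A chronological split avoids placing overlapping windows from the same time span in both sets; a shuffled split is equally possible if that is what the original experiments used.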
2.1.4. Parallel GRU–Transformer-Based Feedforward Compensation Framework
This section gives the servo control architecture of the PMLSM motion platform, as shown in Figure 3. The total control input of the system consists of the following three parts, expressed as (25).
      
        
      
      
      
      
    
          where  is the feedback controller output,  is the feedforward compensation output, and  is the control quantity of the next time error predicted based on the Parallel GRU–Transformer network and corrected by the nonlinear PD module.  acts on the plant, and the output is the actual position . The error signal  of the system is defined by the difference between the reference trajectory  and the actual output  as in (26).
      
        
      
      
      
      
    
      
    
Figure 3. Servo control architecture based on the proposed Parallel GRU–Transformer feedforward compensation.
The feedback controller adopts a PID controller, whose output is expressed as (27).
      
        
      
      
      
      
    
Feedforward compensation consists of acceleration and velocity feedforward terms, expressed as (28).
      
        
      
      
      
      
    
          where  is the acceleration trajectory,  is the velocity trajectory, and  and  are the acceleration and velocity feedforward gains, respectively, used to quickly compensate for the dynamic changes in high-order motion. The nonlinear PD controller receives the next time error output predicted by the Parallel GRU–Transformer network and performs nonlinear correction on it. The Parallel GRU–Transformer network takes  and  as inputs and generates a prediction error, expressed as (29).
      
        
      
      
      
      
    
          where  represents the next time position error predicted by the Parallel GRU–Transformer network. Based on this, the output of the nonlinear PD controller is expressed as (30).
      
        
      
      
      
      
    
The total system control input  combines , , and  to act on the controlled object, ensuring the stability and accuracy of the system under complex dynamic characteristics. The prediction and tracking error results are presented in Section 3.
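To make the composition of the three control terms concrete, the sketch below sums a PID feedback term, an acceleration/velocity feedforward term, and a nonlinear PD term driven by the predicted error. It is a simplified illustration: the `fal`-type nonlinearity, the backward-difference derivative of the predicted error, the gains, and all names are assumptions, and the predicted error is passed in as a plain number rather than produced by a trained network.

```python
import math

def fal(x, alpha, delta):
    """Assumed piecewise nonlinearity (linear near zero, power law elsewhere)."""
    if abs(x) <= delta:
        return x / (delta ** (1.0 - alpha))
    return math.copysign(abs(x) ** alpha, x)

class PID:
    """Discrete PID with rectangular integration and backward-difference derivative."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_e = 0.0

    def step(self, e):
        self.integral += e * self.dt
        deriv = (e - self.prev_e) / self.dt
        self.prev_e = e
        return self.kp * e + self.ki * self.integral + self.kd * deriv

def control_step(pid, e, acc_ref, vel_ref, e_pred, e_pred_prev, p):
    """Total input = feedback + trajectory feedforward + nonlinear PD compensation."""
    u_fb = pid.step(e)                                  # PID on the measured error
    u_ff = p["ka"] * acc_ref + p["kv"] * vel_ref        # acceleration/velocity feedforward
    de_pred = (e_pred - e_pred_prev) / p["dt"]          # derivative of predicted error
    u_npd = (p["kp_n"] * fal(e_pred, p["alpha_p"], p["delta"])
             + p["kd_n"] * fal(de_pred, p["alpha_d"], p["delta"]))
    return u_fb + u_ff + u_npd

# Hypothetical gains; dt = 2e-4 s corresponds to a 5 kHz control loop.
params = {"ka": 0.5, "kv": 0.1, "kp_n": 2.0, "kd_n": 0.01,
          "alpha_p": 0.5, "alpha_d": 0.5, "delta": 0.01, "dt": 2e-4}
pid = PID(kp=10.0, ki=1.0, kd=0.0, dt=params["dt"])
u = control_step(pid, e=0.0, acc_ref=2.0, vel_ref=1.0,
                 e_pred=0.0, e_pred_prev=0.0, p=params)
```

With zero measured and predicted error, only the trajectory feedforward contributes, which makes the role of each term easy to check in isolation.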
2.1.5. Stability Analysis
According to the previous subsection, the linearized error dynamics equation is expressed as (31).
      
        
      
      
      
      
    
          where A and B are the state matrix and input matrix of the system (linearized near the equilibrium point). Define the candidate Lyapunov function as (32).
      
        
      
      
      
      
    
If and only if ,  and . For any non-zero state,  clearly holds, so  satisfies the positive definiteness condition of a Lyapunov function. Differentiating  yields (33).
      
        
      
      
      
      
    
Substituting into the system error dynamics equation yields (34).
      
        
      
      
      
      
    
Here,  is approximated by  from (31), neglecting higher-order nonlinear terms near the equilibrium point for analytical tractability. Let the state vector be (35).
      
        
      
      
      
      
    
Therefore,  is expressed as (36).
      
        
      
      
      
      
    
By appropriately choosing , , , and , the symmetric matrix Q can be made negative definite, which yields the existence of  such that (37) holds.
      
        
      
      
      
      
    
This guarantees exponential convergence of the tracking error. Although this is not strict finite-time convergence, it provides sufficiently fast decay in practice to meet the application requirements. Here,  represents the lower bound of the convergence rate; a larger  results in faster error decay. The matrix Q is expressed as (38).
      
        
      
      
      
      
    
The  function in the previous subsection has global Lipschitz continuity, so it satisfies global boundedness; i.e., there exists a positive constant  that satisfies (39).
      
        
      
      
      
      
    
Based on the training process of the Parallel GRU–Transformer network, the neural network outputs a one-step-ahead position error prediction, denoted by  (aligned to time t). The nonlinear PD compensator uses  and its discrete-time derivative, as given in (40).
      
        
      
      
      
      
    
          where  is the sampling interval. Since the network is trained on bounded trajectories and operates within the plant’s physical limits, it is reasonable to assume the boundedness for some positive constants , as given in (41).
      
        
      
      
      
      
    
With a globally Lipschitz  nonlinearity (previous subsection), the nonlinear PD input satisfies (42).
      
        
      
      
      
      
    
          where  and  is a positive constant determined by . Therefore, under the globally Lipschitz  nonlinearity, the nonlinear PD compensation control law yields a globally bounded input, which can be regarded as a bounded disturbance to the closed-loop system.
Considering the closed-loop system error state , the overall closed-loop dynamics can be expressed as (43).
      
        
      
      
      
      
    
Since the system is locally asymptotically stable when there is no disturbance (), there exists an Input-to-State Stability (ISS)-type Lyapunov function  satisfying (44).
      
        
      
      
      
      
    
          where  is a  class function and  is a  class function. If the controller parameters are chosen appropriately, the closed-loop system gain is sufficiently small to satisfy the small gain condition; i.e., there exists a constant  satisfying (45).
      
        
      
      
      
      
    
Then, according to the small gain theorem, the overall system is ISS-stable.
3. Experimental Investigation
3.1. Experimental Setup
An experimental platform was constructed to validate the real-time performance and predictive accuracy of the proposed method. It comprises an upper computer, a real-time simulator, an FPGA board, an analog output board, a driver, and a motion platform, as depicted in Figure 4. High-speed data and instructions are exchanged bidirectionally between the host computer and the real-time simulator via the IP protocol. The real-time simulator performs the primary functions of real-time control and algorithmic computation: it implements control strategies based on the gathered sensor data, converts control signals into current via the analog output board, and transmits it to the driver. The driver moves the motion platform through current regulation. A high-precision grating ruler serves as the position sensor on the motion platform; its output signal is gathered and preprocessed by the FPGA board before being relayed to the real-time simulator, thereby closing the control loop. The system incorporates high-bandwidth, low-latency communication and real-time data processing, supporting precise and timely motion control and providing a sound experimental basis for validating the reliability and efficacy of the proposed predictive model. The hardware specifications are presented in Table 1. The controller executes the algorithm at a sampling frequency of 5 kHz, and the standard deviation of the system position noise measured using an accelerometer is 6.03 × 10⁻⁸ m, as shown in Figure 5. The experimental research is divided into two parts: prediction verification and control verification.
      
    
Figure 4. Experimental setup of motion platform.
Table 1. Experimental configuration.
Figure 5. Position noise of the Y-axis.
3.2. Prediction Validation
In this section, we validate the effectiveness of the Parallel GRU–Transformer model in error prediction tasks based on two sets of reference trajectories. During training, the mean squared error (MSE) is selected as the loss function, minimizing the squared differences between predicted and true values to guide the updating of network parameters. The loss function is given in (46).
$$\mathcal{L}_{MSE} = \frac{1}{T}\sum_{t=1}^{T}\left(\hat{e}_t - e_t\right)^2 \tag{46}$$
where $\hat{e}_t$ is the predicted output of the model, $e_t$ is the true error, and T is the data length of the current batch. To improve training efficiency and maintain good convergence performance, the Adam optimizer is used with a fixed number of 1000 epochs and a batch size of 128, and training is run on an Intel Core Ultra 9 CPU. Dropout with a rate of 0.2 is applied to prevent overfitting, and L2 regularization with a coefficient of 0.0001 is used to penalize large weights. The Adam optimizer uses an initial learning rate of 0.001, , , and a weight decay coefficient of 0.0001. An exponential decay schedule is applied every 500 epochs to reduce the learning rate by a factor of 0.1. The momentum is set to 0.96 to help accelerate convergence. In the testing phase, to measure the accuracy and relative error level of the prediction results, this study uses two indicators, the coefficient of determination $R^2$ [] and the symmetric mean absolute percentage error (sMAPE) [], defined in (47) and (48).
$$R^2 = 1 - \frac{\sum_{t=1}^{T}\left(e_t - \hat{e}_t\right)^2}{\sum_{t=1}^{T}\left(e_t - \bar{e}\right)^2} \tag{47}$$
      
        
      
      
      
      
    
where $\bar{e}$ is the average of the true errors. The closer $R^2$ is to 1, the better the fit of the model to the error trend; the smaller the sMAPE, the lower the deviation of the prediction from the true value.
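The loss and the two test metrics can be computed as in the following sketch. The sMAPE variant used here (mean of absolute error divided by the average of the absolute true and predicted values, in percent) is one common definition and is an assumption; the paper's exact form may differ. Function names and sample values are illustrative.

```python
def mse(y_true, y_pred):
    """Mean squared error between true and predicted sequences."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def smape(y_true, y_pred):
    """Symmetric MAPE in percent (assumed variant); zero-denominator terms are skipped."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        denom = (abs(t) + abs(p)) / 2.0
        if denom > 0:
            total += abs(p - t) / denom
    return 100.0 * total / len(y_true)

# Toy values, not experimental data.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 4.0]
scores = (mse(y_true, y_pred), r2_score(y_true, y_pred), smape(y_true, y_pred))
```

Because tracking errors frequently cross zero, any percentage-type metric needs a rule for near-zero denominators; skipping such terms is only one of several possible choices.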
Case 1 (Fourth-Order Polynomial Trajectory): A fourth-order polynomial motion trajectory was chosen as the reference input to represent a system with segmented acceleration and deceleration. Although the overall trajectory is relatively smooth, substantial transitions in acceleration and jerk occur during key intervals, providing informative temporal features for the predictive model. This trajectory exemplifies the acceleration and deceleration switching typical of industrial processes, allowing the model’s predictive capability to be evaluated in both steady and transitional phases. Figure 6a,b illustrate the displacement trajectory and the tracking error during the motion, respectively. During the segmented acceleration and deceleration intervals, the system error exhibits fluctuations of corresponding amplitude. As shown in Figure 6c, the feedback controller output adjusts dynamically in response to variations in the error to sustain high-precision positioning as far as practicable. Figure 6d compares the values predicted by the Parallel GRU–Transformer model with the actual tracking error, and Figure 6e shows a magnified view of a local region. Table 2 compares the results with mainstream prediction models such as LSTM, GRU, Transformer, and Parallel LSTM–Transformer, as well as the GRU–Transformer architectures of Zheng et al. and Zhou et al., which are similar to our proposed structure [,].
      
    
Figure 6. (a) Reference displacement trajectory. (b) Tracking error. (c) Feedback controller output. (d) Comparison between predicted and true values. (e) Magnified plot of (d).

Table 2. $R^2$ and sMAPE metrics for different models.
  
Case 2 (Random Multi-Sine Trajectory): To further examine the model’s adaptability under random and high-frequency disturbances, three sine functions with different frequencies and phases are superimposed to create the reference trajectory, yielding a complex input sequence with multi-scale fluctuations. The expression is given in (49). The experiments show that the Parallel GRU–Transformer continues to forecast the error curve accurately, with a strong correlation to the actual error, as illustrated in Figure 7; Table 3 compares it with state-of-the-art prediction models.
      
        
      
      
      
      
    
      
    
Figure 7. (a) Reference displacement trajectory. (b) Tracking error. (c) Feedback controller output. (d) Comparison between predicted and true values. (e) Magnified plot of (d).

Table 3. $R^2$ and sMAPE metrics for different models.
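The construction of the Case 2 reference, a superposition of three sine functions sampled at the controller rate, can be sketched as follows. The amplitudes, frequencies, and phases below are hypothetical placeholders chosen only for illustration and are not the values used in the experiments; the function names are likewise assumptions.

```python
import math

# Hypothetical components: (amplitude [m], frequency [Hz], phase [rad]).
COMPONENTS = [
    (0.010, 0.5, 0.0),
    (0.005, 1.5, math.pi / 4),
    (0.002, 4.0, math.pi / 2),
]


def multi_sine(t, components=COMPONENTS):
    """Evaluate the superposition of sine components at time t (seconds)."""
    return sum(a * math.sin(2.0 * math.pi * f * t + phi) for a, f, phi in components)


def sample_trajectory(duration_s, sampling_frequency_hz=5_000.0, components=COMPONENTS):
    """Sample the reference at the controller rate and return a list of positions."""
    n = int(round(duration_s * sampling_frequency_hz))
    dt = 1.0 / sampling_frequency_hz
    return [multi_sine(k * dt, components) for k in range(n)]


if __name__ == "__main__":
    reference = sample_trajectory(2.0)
    print("samples:", len(reference))
    print("peak magnitude:", max(abs(v) for v in reference))
```

Drawing the frequencies and phases from a random generator instead of fixing them would give a different trajectory on each run, which is one way to probe how well a trained predictor generalizes.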
  
The experimental findings indicate that the Parallel GRU–Transformer network achieves high predictive accuracy on two distinct trajectory types: a fourth-order polynomial trajectory and a random multi-sine trajectory. The trained network generalizes across these trajectory types and sustains stable error estimation under high-amplitude or high-frequency transitions, providing a promising basis for precise feedforward compensation and online dynamic correction. Furthermore, we conducted direct comparisons with the GRU–Transformer architectures proposed by Zheng et al. and Zhou et al., which adopt deeper encoder blocks and larger hidden dimensions, resulting in an inference latency of around . While these models perform well in general scenarios, our approach achieves higher accuracy on the present dataset with a substantially lower inference latency of only , making it more suitable for real-time embedded deployment in precision motion control.
3.3. Control Validation
The preceding prediction results indicate that the trained neural network can forecast the error signal of the motion platform at the next time step. Building on this predictive capability, an additional feedforward compensation path is introduced into the closed-loop control framework to achieve dynamic compensation. The feedback loop employs a conventional PID controller, whose strong regulation capacity maintains high precision under fluctuating operating conditions. The error predicted by the neural network for the next time step is fed to the nonlinear PD controller, which adds a feedforward correction to the feedback output and thereby compensates promptly for potential deviations during motion.
The following common performance indices will be used to evaluate the quality of the control algorithms:
(1) The maximum value of the error and the absolute mean (AM), expressed as (50) and (51).
      
        
      
      
      
      
    
      
        
      
      
      
      
    
(2) The maximum value of the moving average (MA) and the absolute mean of the MA, expressed as (52) and (53). The MA is a dynamic metric that gives the average of the data within a sliding window of predetermined size, emphasizing the overall trend.
      
        
      
      
      
      
    
      
        
      
      
      
      
    
(3) The maximum value of the moving standard deviation (MSD) and the absolute mean of the MSD, expressed as (54) and (55). The MSD is the standard deviation of the data inside a sliding window and reflects the extent of data variability; a larger value indicates more significant oscillations in the data.
      
        
      
      
      
      
    
      
        
      
      
      
      
    
where T is the total running time of the relevant experimental section, M is the number of sampling points, which follows from T and the sampling time, N is the size of the sliding window, k indexes the error values in the time series, and i is the discrete sampling point index.
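The three groups of indices can be computed from a recorded error sequence as sketched below. This is an illustration under stated assumptions rather than the exact definitions used in the experiments: the window is taken as trailing (the N most recent samples), the standard deviation divides by N, and the function names and sample data are hypothetical. A centered window or an N − 1 divisor would change the numbers slightly but not the interpretation.

```python
import math


def moving_average(errors, window):
    """Trailing-window mean: entry i averages the `window` samples ending at i."""
    return [sum(errors[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(errors))]


def moving_std(errors, window):
    """Trailing-window standard deviation (divides by the window size)."""
    out = []
    for i in range(window - 1, len(errors)):
        chunk = errors[i - window + 1:i + 1]
        mean = sum(chunk) / window
        out.append(math.sqrt(sum((e - mean) ** 2 for e in chunk) / window))
    return out


def max_abs(values):
    """Largest magnitude in the sequence."""
    return max(abs(v) for v in values)


def abs_mean(values):
    """Mean of the magnitudes in the sequence."""
    return sum(abs(v) for v in values) / len(values)


def tracking_indices(errors, window):
    """Return max and absolute-mean values of the error, its MA, and its MSD."""
    if not 1 <= window <= len(errors):
        raise ValueError("window must be between 1 and len(errors)")
    ma = moving_average(errors, window)
    msd = moving_std(errors, window)
    return {
        "max_error": max_abs(errors), "AM": abs_mean(errors),
        "max_MA": max_abs(ma), "AM_MA": abs_mean(ma),
        "max_MSD": max_abs(msd), "AM_MSD": abs_mean(msd),
    }


if __name__ == "__main__":
    # Made-up alternating error, used only to exercise the functions.
    print(tracking_indices([1.0, -1.0, 1.0, -1.0], window=2))
```

In this toy sequence the MA is zero everywhere while the MSD is not, which shows why the two indices are reported together: the MA captures slow drift, and the MSD captures the oscillation that averaging hides.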
Based on the actual needs of the laboratory, this study set the window size to  and selected two control benchmarks, PID feedback control and model-based feedforward compensation combined with PID feedback control, against which the proposed Parallel GRU–Transformer-based feedforward compensation method is compared and evaluated. The effect of the Parallel GRU–Transformer-based feedforward compensation on the main controller’s output is also examined.
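S1: PID feedback. This method employs a typical parallel PID controller, whose transfer function is given in (56).

$$C(s) = K_p + \frac{K_i}{s} + K_d s \tag{56}$$

where $K_p$, $K_i$, and $K_d$ are the proportional, integral, and derivative gains, respectively.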
The PID parameters are configured as , , and , which were determined through extensive experimental tuning to achieve an optimal balance between tracking accuracy, stability, and robustness under various operating conditions.
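For reference, a parallel PID law of this kind can be discretized for a fixed-rate controller as in the minimal sketch below. It assumes backward-Euler integration and a plain finite-difference derivative with no filtering or anti-windup, and the class name and the gains in the example are hypothetical placeholders rather than the tuned values used on the platform.

```python
class ParallelPID:
    """Discrete parallel PID: u = Kp*e + Ki*integral(e) + Kd*de/dt."""

    def __init__(self, kp: float, ki: float, kd: float, ts: float):
        self.kp, self.ki, self.kd, self.ts = kp, ki, kd, ts
        self.integral = 0.0    # running integral of the error
        self.prev_error = 0.0  # error at the previous sample

    def update(self, error: float) -> float:
        """Advance one sample and return the controller output."""
        self.integral += error * self.ts                   # backward-Euler integration
        derivative = (error - self.prev_error) / self.ts   # finite difference
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


if __name__ == "__main__":
    # Placeholder gains; ts matches the 5 kHz sampling frequency.
    pid = ParallelPID(kp=1.0, ki=10.0, kd=0.001, ts=1.0 / 5000.0)
    for e in (1e-6, 8e-7, 5e-7):
        print(pid.update(e))
```

In practice the derivative term is usually low-pass filtered, since differencing a position signal at a high sampling rate amplifies sensor noise.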
S2: Model-based feedforward compensation with PID feedback. The feedforward structure employs velocity and acceleration feedforward compensation derived from the system’s inverse model; the compensation coefficients are tuned to optimize dynamic performance and thereby enhance trajectory tracking of the controlled object.
S3: Parallel GRU–Transformer-based feedforward compensation with PID feedback. The feedforward compensation control algorithm proposed in this paper is used, with the same velocity and acceleration feedforward compensation as in S2. To balance high sensitivity and rapid convergence in both the small- and large-error domains, the parameter ranges were determined from the prior literature and preliminary simulations, followed by a grid search evaluated on multiple reference trajectories to ensure stability, accuracy, and actuator smoothness. The resulting nonlinear PD controller parameters are as follows: the proportional gain coefficient  is 10,000, the differential gain coefficient  is 200, and the nonlinear amplitude adjustment parameters are  and , with a segmentation threshold  of .
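The role of the amplitude adjustment parameters and the segmentation threshold can be illustrated with a piecewise power function of the kind often used in nonlinear PD and active disturbance rejection control. The sketch below assumes such a function, here called `fal`; the nonlinear function actually used in the proposed controller is defined in its design section and may differ. The gains 10,000 and 200 are taken from the text, while the exponents, the threshold, and all function names are hypothetical placeholders.

```python
import math


def fal(x: float, alpha: float, delta: float) -> float:
    """Piecewise power function: linear for |x| <= delta, |x|**alpha beyond it."""
    if abs(x) <= delta:
        return x / (delta ** (1.0 - alpha))  # linear segment avoids infinite gain at zero
    return math.copysign(abs(x) ** alpha, x)


def nonlinear_pd(e_pred, e_pred_prev, ts, kp=10_000.0, kd=200.0,
                 alpha_p=0.75, alpha_d=1.25, delta=1e-6):
    """Feedforward correction from the predicted error and its finite-difference rate.

    alpha_p, alpha_d, and delta are assumed values for illustration only.
    """
    e_rate = (e_pred - e_pred_prev) / ts
    return kp * fal(e_pred, alpha_p, delta) + kd * fal(e_rate, alpha_d, delta)


def total_command(u_feedback, e_pred, e_pred_prev, ts):
    """Sum of the main controller output and the feedforward correction.

    The sign of the correction depends on how the error is defined; a plain sum is assumed.
    """
    return u_feedback + nonlinear_pd(e_pred, e_pred_prev, ts)


if __name__ == "__main__":
    ts = 1.0 / 5000.0
    print(total_command(0.2, e_pred=2e-6, e_pred_prev=1.5e-6, ts=ts))
```

With an exponent below one, the effective gain rises as the error shrinks toward the threshold, which gives high sensitivity to small errors, while the linear segment inside the threshold keeps the gain bounded near zero.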
The control results are presented below for the same fourth-order polynomial and random multi-sine trajectories used in the prediction validation.
Case 1 (Fourth-Order Polynomial Trajectory): Figure 8 illustrates the tracking error together with its moving average (MA) and moving standard deviation (MSD) under the different control methods, and Table 4 lists the performance indices. The results show that S3 is superior to S2 and S1 in all indices, with an  of  m² for S3, which was nearly 1/7 lower than S2’s  m². Figure 9a illustrates that the feedforward compensation output derived from the Parallel GRU–Transformer predictions takes over the primary control effort from the PID controller during most intervals, thereby markedly reducing the workload of the main controller and the error magnitude. M1 represents the feedforward compensation output based on the Parallel GRU–Transformer, and M2 represents the PID controller output.
      
    
Figure 8. (a) Tracking error curve. (b) MA curve. (c) MSD curve.

Table 4. Tracking performance indices.

Figure 9. (a) Comparison of PID output and feedforward compensation output based on Parallel GRU–Transformer under fourth-order polynomial trajectory. (b) Comparison of PID output and feedforward compensation output based on Parallel GRU–Transformer under random multi-sine trajectory.
  
Case 2 (Random Multi-Sine Trajectory): Comparative experiments were conducted to evaluate tracking performance on the random multi-sine trajectory. Figure 10 displays the error, MA, and MSD curves under the different methods, and Table 5 summarizes the performance indices. The data show that S3 retains a significant advantage, with an  value of only , which is less than half of S2’s . As shown in Figure 9b, the Parallel GRU–Transformer-based feedforward output still supplies most of the control effort in this case, effectively reducing the output burden on the PID controller.
      
    
Figure 10. (a) Tracking error curve. (b) MA curve. (c) MSD curve.

Table 5. Tracking performance indices.
  
4. Conclusions
In conclusion, the nonlinear PD feedforward compensation strategy based on Parallel GRU–Transformer prediction can significantly reduce the tracking error as well as the moving average (MA) and moving standard deviation (MSD) of motion errors under two completely different trajectory conditions, without requiring an accurate model of the controlled system. Additionally, the strategy effectively reduces the workload of the main controller. Due to the low dimensionality of network input features, the training process is straightforward and efficient. On our experimental device, the trained model has a small computational footprint, and inference completes within , fully within the control-cycle budget, indicating that the method can be deployed without significant additional resources while meeting real-time requirements. These advantages suggest that the method has a certain level of robustness and good generalization capability, making it suitable for high-precision motion control tasks in fields such as semiconductor equipment and precision robotics, with broad prospects for practical engineering use and potential for extension to other industrial applications.
Author Contributions
Conceptualization, J.W.; Methodology, Y.W. and B.L.; Software, Y.W.; Validation, J.W.; Data curation, K.G.; Writing — original draft, Y.W.; Writing — review & editing, J.X.; Visualization, K.G.; Supervision, B.L. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Wang, Z.; Zhou, R.; Hu, C.; Zhu, Y. Online iterative learning compensation method based on model prediction for trajectory tracking control systems. IEEE Trans. Ind. Inform. 2022, 18, 415–425.
- Iwasaki, M.; Seki, K.; Maeda, M. High-precision motion control techniques. IEEE Ind. Electron. Mag. 2012, 6, 32–40.
- Li, L.; Liu, Y.; Li, L.; Tan, J. Kalman-filtering-based iterative feedforward tuning in presence of stochastic noise: With application to a wafer stage. IEEE Trans. Ind. Inform. 2019, 15, 5816–5826.
- Kuang, Z.A.; Sun, L.T.; Gao, H.J.; Tomizuka, M. Practical fractional-order variable-gain supertwisting control with application to wafer stages of photolithography systems. IEEE/ASME Trans. Mechatron. 2022, 27, 214–224.
- Li, L.; Zhao, H.Y.; Liu, Y. Self-tuning nonlinear iterative learning for a precision testing stage: A set-membership approach. IEEE Trans. Ind. Inform. 2023, 19, 7995–8006.
- Zhao, H.; Li, L.; Cui, N.; Liu, Y.; Tan, J. Iterative control tuning: With application to MIMO precision motion systems. IEEE/ASME Trans. Mechatron. 2024, 29, 1–12.
- Guc, A.F.; Yumrukcal, Z.; Ozcan, O. Nonlinear identification and optimal feedforward friction compensation for a motion platform. Mechatronics 2020, 71, 102408.
- Khodayari, M.H.; Balochian, S. Modeling and control of autonomous underwater vehicle (AUV) in heading and depth attitude via self-adaptive fuzzy PID controller. J. Mar. Sci. Technol. 2015, 20, 559–578.
- Wu, B.; Han, X.; Hui, N. System identification and controller design of a novel autonomous underwater vehicle. Machines 2021, 9, 109.
- Anderlini, E.; Parker, G.G.; Thomas, G. Control of a ROV carrying an object. Ocean Eng. 2018, 165, 307–318.
- Chin, C.S.; Lau, M.W.S.; Low, E.; Seet, G.G.L. Robust controller design method and stability analysis of an underactuated underwater vehicle. Int. J. Appl. Math. Comput. Sci. 2006, 16, 345–356.
- Bingul, Z.; Gul, K. Intelligent-PID with PD feedforward trajectory tracking control of an autonomous underwater vehicle. Machines 2023, 11, 300.
- Wu, L.; Liu, J.; Vazquez, S.; Mazumder, S.K. Sliding mode control in power converters and drives: A review. IEEE/CAA J. Autom. Sin. 2022, 9, 392–406.
- Nguyen, T.H.; Nguyen, T.T.; Nguyen, V.Q.; Le, K.M.; Tran, H.N.; Jeon, J.W. An adaptive sliding-mode controller with a modified reduced-order proportional integral observer for speed regulation of a permanent magnet synchronous motor. IEEE Trans. Ind. Electron. 2022, 69, 7181–7191.
- Doyle, J.C.; Francis, B.A.; Tannenbaum, A. Feedback Control Theory; Macmillan: New York, NY, USA, 1998.
- Zhou, K.; Doyle, J.C. Essentials of Robust Control; Prentice Hall: Upper Saddle River, NJ, USA, 1998.
- Silva-Rivas, J.C.; Kim, W. Multivariable control and optimization of a compact 6-DOF precision positioner with hybrid H2/H∞ and digital filtering. IEEE Trans. Control Syst. Technol. 2013, 21, 1641–1651.
- Chen, Z.; Zhou, S.; Shen, C.; Lyu, L.; Zhang, J.; Yao, B. Observer-based adaptive robust precision motion control of a multi-joint hydraulic manipulator. IEEE/CAA J. Autom. Sin. 2024, 11, 1213–1226.
- Tien, S.; Zou, Q.; Devasia, S. Iterative control of dynamics-coupling-caused errors in piezoscanners during high-speed AFM operation. IEEE Trans. Control Syst. Technol. 2005, 13, 921–931.
- Eielsen, A.A.; Gravdahl, J.T.; Leang, K.K. Low-order continuous-time robust repetitive control: Application in nanopositioning. Mechatronics 2015, 30, 231–243.
- Kim, K.S.; Zou, Q. A modeling-free inversion-based iterative feedforward control for precision output tracking of linear time-invariant systems. IEEE/ASME Trans. Mechatron. 2013, 18, 1767–1777.
- Zhang, T.; Yan, H.; Jia, D.; Zhang, H.; Zhao, B.; Lu, X.; Liu, Y. An accelerated iterative learning control approach for X-Y precision plane motion stage. In Proceedings of the 2021 6th International Conference on Automation, Control and Robotics Engineering (CACRE), Dalian, China, 15–17 July 2021.
- Zheng, T.; Xu, X.; Lu, X.; Hao, L.; Xu, F. Learning adaptive sliding mode control for repetitive motion tasks in maglev rotary table. IEEE Trans. Ind. Electron. 2022, 69, 1836–1846.
- Meng, Q.; Ma, Q.; Shi, Y. Adaptive fixed-time stabilization for a class of uncertain nonlinear systems. IEEE Trans. Autom. Control 2023, 68, 6929–6936.
- Lan, Y.; Zhao, J. Improving track performance by combining Padé-approximation-based preview repetitive control and equivalent-input-disturbance. J. Electr. Eng. Technol. 2024, 19, 3781–3794.
- Wang, F.; Chen, K.; Zhen, S.; Chen, X.; Zheng, H.; Wang, Z. Prescribed performance adaptive robust control for robotic manipulators with fuzzy uncertainty. IEEE Trans. Fuzzy Syst. 2024, 32, 1318–1330.
- Feng, H.; Song, Q.; Ma, S.; Ma, W.; Yin, C.; Cao, D.; Yu, H. A new adaptive sliding mode controller based on the RBF neural network for an electro-hydraulic servo system. ISA Trans. 2022, 129, 472–484.
- Hasan, M.W.; Abbas, N.H. An adaptive neural network with nonlinear FOPID design of underwater robotic vehicle in the presence of disturbances, uncertainty, and obstacles. Ocean Eng. 2023, 279, 114451.
- Yang, X.; Zhao, Z.; Li, Y.; Yang, G.; Zhao, J.; Liu, H. Adaptive neural network control of manipulators with uncertain kinematics and dynamics. Eng. Appl. Artif. Intell. 2024, 133, 107935.
- Hu, C.; Ou, T.; Chang, H.; Zhu, Y.; Zhu, L. Deep GRU neural network prediction and feedforward compensation for precision multiaxis motion control systems. IEEE/ASME Trans. Mechatron. 2020, 25, 1377–1388.
- Hu, C.; Ou, T.; Zhu, Y.; Zhu, L.; Chang, L. GRU-type LARC strategy for precision motion control with accurate tracking error prediction. IEEE Trans. Ind. Electron. 2021, 68, 812–820.
- Ou, T.; Hu, C.; Zhu, Y.; Zhang, M.; Zhu, L. Intelligent feedforward compensation motion control of maglev planar motor with precise reference modification prediction. IEEE Trans. Ind. Electron. 2021, 68, 7768–7777.
- Zhou, R.; Hu, C.; Ou, T.; Wang, Z.; Zhu, Y. Intelligent GRU-RIC position-loop feedforward compensation control method with application to an ultraprecision motion stage. IEEE Trans. Ind. Inform. 2024, 20, 5609–5621.
- Zhao, J.; Wang, Z.; Wu, Y.; Burke, A.F. Predictive pretrained transformer (PPT) for real-time battery health diagnostics. Appl. Energy 2025, 377, 124746.
- Nguyen, V.T.; Duong, D.N.; Phan, D.H.; Bui, T.L.; HoangVan, X.; Tan, P.X. Adaptive Nonlinear PD Controller of Two-Wheeled Self-Balancing Robot with External Force. Comput. Mater. Contin. 2024, 81, 2337.
- Dao, F.; Zeng, Y.; Qian, J. Fault diagnosis of hydro-turbine via the incorporation of bayesian algorithm optimized CNN-LSTM neural network. Energy 2024, 290, 130326.
- El-Shorbagy, H.I.; Belal, F. Eco-scale, blueness, ComplexMoGAPI, and AGREEprep comparison of developed UPLC-fluorescence method with a UPLC-PDA method for remdesivir determination in human plasma. Sustain. Chem. Pharm. 2025, 44, 101965.
- Khan, S.; Muhammad, Y.; Jadoon, I.; Awan, S.E.; Raja, M.A.Z. Leveraging LSTM-SMI and ARIMA architecture for robust wind power plant forecasting. Appl. Soft Comput. 2025, 170, 112765.
- Zheng, W.; Zheng, K.; Gao, L.; Zhangzhong, L.; Lan, R.; Xu, L.; Yu, J. GRU–Transformer: A Novel Hybrid Model for Predicting Soil Moisture Content in Root Zones. Agronomy 2024, 14, 432.
- Zhou, Y.; Lei, Z.; Mao, N.; Liao, W.; Lian, X.; Wu, C. Remaining Useful Life Prediction of Li-Ion Batteries Based on GRU–Transformer. In Proceedings of the 2025 IEEE 14th Data Driven Control and Learning Systems (DDCLS), Wuxi, China, 9–11 May 2025; pp. 2058–2062.
 
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).