PMSM Speed Control Based on Particle Swarm Optimization and Deep Deterministic Policy Gradient under Load Disturbance

: Proportional integral-based particle swarm optimization (PSO) and deep deterministic policy gradient (DDPG) algorithms are applied to a permanent-magnet synchronous motor to track speed control. The proposed methods, based on notebooks, can deal with time delay challenges, imprecise mathematical models, and unknown disturbance loads. First, a system identiﬁcation method is used to obtain an approximate model of the motor. The load and speed estimation equations can be determined using the model. By adding the estimation equations, the PSO algorithm can determine the sub-optimized parameters of the proportional-integral controller using the predicted speed response; however, the computational time and consistency challenges of the PSO algorithm are extremely dependent on the number of particles and iterations. Hence, an online-learning method, DDPG, combined with the PSO algorithm is proposed to improve the speed control performance. Finally, the proposed methods are implemented on a real platform, and the experimental results are presented and discussed. communicating to the actuator affected the tuning performance. To solve the mentioned challenges, the DDPG method was proposed. The sub-optimized solutions of K p and K i determined by the PSO algorithm were added into the DDPG training, which caused the training to converge faster and more safely. Through the use of Tensorﬂow Lite, the computational time challenge could be solved. The experimental results show that the proposed DDPG method was able to effectively decrease the settling time, overshoot, and undershoot under the load disturbance. The proposed technique can be used in the ﬁeld of machining. In the cutting process, the speed of the motor usually decreases when the external load suddenly becomes large. To obtain a smooth surface and high quality of the parts, the fast compensation method of the motor speed can be utilized.


Introduction
Motors are widely utilized as the main power source in modern industries, and can be roughly classified into two types: direct current (DC) and alternating current (AC) motors. DC motors operate on the principle of basic magnetism, and the speed and torque are easy to control; however, DC motors have the disadvantages of high operation and maintenance costs. Furthermore, DC motors cannot operate under explosive and hazardous conditions because sparking occurs at the brush. Currently, AC motors are more widely utilized in industrial applications. AC motors can be classified into two parts: induction motors [1] and synchronous motors [2]. This study analyzes permanent magnet synchronous motors (PMSMs). Because of the advantages of high power density, high reliability, and low motor volume, PMSMs are used in computer numerical control (CNC) machines, electric vehicle drive systems, and robots. Field-oriented control (FOC) [3] is a popular strategy for controlling AC motors. FOC decouples the torque and flux by transforming the stationary phase currents to a rotating d-q frame. By using the FOC method, the AC motor control challenge can be simplified to a DC motor control challenge.
The control accuracy of the motor directly affects the performance of the system. The literature [4] indicates that the rotation speed and propulsive force are two important parameters in an air-borne bolting system. The breakage of the drill pipe and the inability of the drill bit to cut the coal adequately can be caused by the unreasonable rotation speed and propulsive force. In [5], it was found that, to produce precision parts, precise rotational accuracy and a constant machine tool spindle speed are required. However, inevitable changes in the spindle speed caused by the cutting force will lead to low-quality surfaces. In addition, the rotation speed of the pin tool is an important controlled variable in friction stir welding machines [6]. By controlling the rotating speed, a constant vertical force is obtained during the welding process.
Generally, the proportional derivative (PD), proportional-integral (PI), and proportionalintegral-derivative (PID) are the most common controllers used to design the control system [7,8]. In addition, a few studies have proposed alternative algorithms for motor speed control. The literature [9] proposes an on-line tuning fuzzy proportional derivative controller based on the equivalent errors to deal with the contouring control of robot manipulators. The results show that the controller can effectively deal with highly non-linear dynamic systems following an analytic/non-analytic path. A hybrid fuzzy-PI controller is proposed to address the challenges of chattering effects and large execution times [10]. Nonlinear controllers such as feedback linear control [11], sliding mode control [12], and adaptive control [13] are presented. In addition, model prediction control has been successfully used for the current and torque control of motors [14,15]. However, the aforementioned algorithms require precise knowledge of the nonlinear model of the motor, which is difficult to obtain. This study adopts the particle swarm optimization (PSO) and deep deterministic policy gradient (DDGP) algorithms to address the motor speed tracking challenge. In 1995, the PSO algorithm was proposed by Kennedy and Eberhart [16]. The PSO algorithm is inspired by the intelligent collective behavior of some animals, and is initialized with a group of random particles, where each individual particle represents a single solution. The particles move through the search space to obtain the optimized solution. The movement direction and velocity of the particles are updated according to the local and global best solutions. Numerous studies have indicated that the PSO algorithm is a powerful optimization method used in various applications. In [17], a learning model is used to determine the distance between the mobile sensor node and the anchor node. A PSO-based artificial neural network is proposed to improve the estimation accuracy. In contrast to the conventional optimization method of feed-forward backpropagation, PSO is utilized to update the weights of the network. In [18], a PSO-based PID controller is proposed to address the position control of the hydraulic system. The PSO algorithm is utilized to tune the parameters of the PID controller, and the results indicate that the rising time, settling time, and integral time absolute error can be reduced. The literature [19] reveals that the PSO algorithm can also be implemented in the field of control. In [19], a PSO-IAC algorithm, integrated by the adaptive inertia weight and constriction factor, is proposed to increase the convergence velocity. The results indicate that the challenge of reaching the avoidance problem can be addressed. A PSO, combined with a radial basis function neural network, is proposed to deal with the pitch control of a wind turbine [20]. In addition, a graphical approach is utilized to visualize the 2D or 3D boundaries of the PID controller. Under the delay time effects, the stochastic inertia weight PSO can effectively handle pitch control.
Recently, machine learning has become a popular technique. Generally, unsupervised learning [21], supervised learning [22], and reinforcement learning [23] are the three main types of machine learning strategies. First, unsupervised learning is typically utilized in the fields of data clustering and dimensionality reduction. It does not require a complete and clean labeled dataset, such as principle component analysis [24], autoencoder [25], and t-SNE [26]. Supervised learning deals mainly with classification and regression problems. It requires well-labelled dataset, such as a convolutional neural network [27], long shortterm memory [28], and random forest [29]. In addition, in the literature [30], a two-layer recursive neural network was used to approximate a motor model, and the network was considered to be a motor speed predictor.
Reinforcement learning is a framework for learning the relationship between states and actions. Ultimately, the agent can maximize the expected reward and learn how to complete a mission. Q-learning [31] and deep Q network (DQN) [32] are common value-based reinforcement learning algorithms. The policy usually takes the largest action with the largest action value. For the DQN, the Q table is replaced by a deep neural network to estimate the action value. Experience replay is utilized to break the correlations between the samples, which prevents the agent from overfitting to the sequence of training. In [33], unlike the conventional control method, the complex nonlinear model of the PMSM does not need to be known. A DQN-based controller is proposed to control the direct torque of the PMSM; however, DQN can only deal with discrete and low-dimension action spaces. In contrast to value-based learning, policy-based learning uses a stochastic policy that gives the probability of action in a specific state. The policy gradient [34] is a basic policy-based learning algorithm. The advantage of policy-based learning is that it can effectively deal with high-dimensional and continuous action spaces; however, the learning methods tend to converge to the local optima. To improve the performance of reinforcement learning, DDPG [35], TD3 [36], and SAC [37], combining Q-learning and policy gradient algorithms, are proposed. In [38], DDPG is used to solve the road-keeping challenge. The agent can determine the steering angle of the vehicle by observing the distance between the vehicle and the road boundaries. In [39], an optimal torque distribution control for multi-axle electric vehicles with in-wheel motors is proposed. The DDPG algorithm can optimize the drive torque for each motor to reduce the energy consumption of the vehicle.
The contributions of this paper are as follows: 1. This paper proposes an online-learning DDPG with the PSO method to control the motor speed under a load disturbance.

2.
The PSO algorithm is used to find a sub-optimized solution of the PI controller using an approximate motor model. To safely train the model online and converge faster, the sub-optimized solutions are added into the policy function of the DDPG as a constraint.

3.
The proposed network is a model-free method; hence, a precise mathematical model of the motor is not required for speed control. 4.
The proposed model can deal with the delay problem as well.
The experimental results demonstrate that the proposed method can effectively reduce the settling time and overshoot of the speed response when a load disturbance occurs. Furthermore, the results indicate that the proposed method performs better than the conventional PSO and PI controllers.
The remainder of this paper is organized as follows. Section 2 presents the structure of the motor testing platform. Section 3 describes the methodology, in which the formulation and implementation of the system and reinforcement learning solution are discussed in detail. In Section 4, the experimental results obtained from a real platform are presented. The experimental results indicate that the proposed method can effectively handle speed tracking control. Section 5 concludes this study.

Motor Testing Platform
In this study, the motor testing platform illustrated in Figure 1 was built. The motor testing platform comprised two PMSMs and one speed sensor. The analyzed PMSM was ECM-A3L-0807, developed by Delta Electronics. The rated speed and torque of the motor are 2.39 Nm and 3000 rpm, respectively. The speed sensor used was DATAFLEX ® 16, which was manufactured by KTR Systems GmbH. The speed sensor outputs a voltage ranging from 0 to 10 V, and the maximum measured speed can be set from 10 to 10,000 rpm. The Hall sensor, ACS711EX, can handle bidirectional currents from −31 to +31 A with a 100 kHz bandwidth. The data acquisition unit used was USB-2405, manufactured by ADLINK Technology. It utilizes the 24-bit Sigma-Delta ADC with a built-in 165 anti-aliasing filter and four simultaneous sampling analog input channels up to 128 kS/s. As illustrated in Figure 1, the two motors were mounted on the motor testing platform. Motor I was set in speed control mode and was controlled at a constant speed, Motor II was set in torque control mode, and it could provide instant and inversed torque as a disturbance. This study utilized a notebook as the computational platform, and it communicated to the actuator through RS-485. The speed sensor was connected between the two motors to measure the rotating speed of Motor I. Three Hall sensors were individually clamped to the three-phase wires of Motor I. To acquire a more stable current signal, the power supplier was utilized to feed a power of 5 V to the Hall sensors. Four signals were acquired by USB-2405, including the three-phase current signals and the speed signal of Motor I. The sampling rate was 4000 Hz. When the direction of the torque was opposite to the direction of rotation, the energy generated returned to the servo drive from the load. A regenerative resistor with 60 Ohm and 1000 Watt was used to consume the energy. Important peripheral devices are illustrated in Figure 2. The computer used for training in this work was a notebook equipped with an NVIDIA GeForce GTX 1050 Ti and an Intel ® Core™ I7-8750H. The notebook had an 8G memory. This work used Python as a development tool owing to its convenient libraries, such as Keras and TensorFlow. C++ was used for human interface design. The drawings in this article were produced in MATLAB owing to the convenient drawing functions and esthetics of its figures.

Methods
This section describes the proposed speed control methods. Three methods were analyzed in this study including the fixed method, PSO algorithm, and DDPG. For the fixed method, the Delta actuator has a function that can provide the parameters of PI controller based on motor status and the settings in the actuator. However, the value does not change when the external load happens. For the PSO algorithm, the approximate model of the motor should be determined using the system identification method. Subsequently, the load and speed estimation equations are derived; however, the approximate models of the PI controller and motor cannot fully describe the real platform. Using both estimation equations, PSO can solely determine the sub-optimized values of K p and K i for the PI controller. Hence, the proposed method, which combines the DDPG and PSO, was used to learn the optimized solution online. The advantage of DDPG is that the method is a model-free algorithm. The performance of the proposed method is discussed in Section 4. Figure 3 illustrates a control block diagram of the motor speed control system. C(s) and G(s) are the PI current controller and motor model, respectively. w * and w are the speed command and output speed, respectively. t c is the torque command obtained by multiplying the torque constant K t by the current i. t L is the external load disturbance. Using the system identification methods proposed in [40,41], the estimation model is a second-order transfer function G(s), as follows:

System Description
The PI controller has the following form: where K p and K i are the proportional and integral gains.

Load and Speed Estimation Equation
In the analysis of the methods, unknown external load was seen as a disturbance to the system, which can cause the motor speed response to change. The load estimation equation was derived to estimate the unknown disturbance. Furthermore, a speed estimation equation was derived. Both estimation equations were adopted for predicting the speed response and to enable the PSO algorithm to determine the sub-optimized solution.

Load Estimation
According to Equation (1), the external load can be expressed as follows: taking the inverse Laplace transform and assuming that the initial condition is zero. Then, the following differential equation can be obtained: Subsequently, the load estimation equation can be obtained by discretizing Equation (4).
where T s is the sampling time. Figure 3 illustrates that the transfer function I(s)/E(s) = K p s + K p K i /s. Assume that the state X(s) = E(s)/s, and the differential equation is .

Speed Estimation
x(t) = e(t). I(s) can be expressed as: Taking the inversed Laplace transform, the current response is expressed as follows: For the speed estimation equation, Equation (4) can be rewritten in the form ..
Finally, the difference equations of Equation (7) is expressed as: and the speed estimation equation is

Particle Swarm Optimization Algorithm
In this study, the PSO algorithm was used to determine the parameters of PI controller. When a load is added into the motor system and the speed of the motor is significantly changed, the PSO algorithm starts. Figure 4 illustrates the operation of the PSO algorithm.
where X n (i) represents the n th particle position and V n (i) is the n th particle velocity. m and d are the number of particles and dimension of the searching space, respectively. The individual best position P n and global best position P g are expressed as follows: P g (i) = p g1 (i), p g2 (i), · · · , p gd (i) , n = 1, . . . , m,  [42][43][44]. In this study, the IAE was adopted as the fitness function, which is calculated as follows: where T I AE indicates the IAE value. The load disturbance is first estimated by the load estimation equation. Afterward, N sampling points are predicted by the speed estimation equation for calculating the fitness function. Then, T I AE is calculated by subtracting the command speed and the estimated speed. By operating the following equations, the updated velocity and position of the particle can be obtained: where w p is the inertia weight, which is a constant value. C 1 and C 2 are the individual and global coefficients, respectively. R 1 and R 2 are random numbers generated from a uniform distribution in the interval [−1, 1]. When the number of iterations is larger than the number of maximum iterations, the PSO algorithm stops. In this study, the parameter settings of the PSO algorithm are shown in Table 1. The dimension of the searching space is 2 for K p and K i . Furthermore, the number of the predicted sampling point is 300, about 0.075 s. Two settings of the maximum iteration number are discussed in Section 4.2.

Deep Deterministic Policy Gradiend with Particle Swarm Optimization
This study proposes a DDPG model to implement the online learning of speed tracking control. By combining the sub-optimized solutions of the PSO algorithm, the DDPG model can converge faster.

State-Space
The state space denotes observations from the environment. To train the model efficiently, the state variables should be strongly related to actions. According to the load estimation and speed estimation equations shown in Equations (5) and (10), the variations in the speed and current are important. In addition, the current values of K p and K i affect the compensatory performance of the current. To ensure the scales of each feature are similar, the values of K p and K i should be normalized to K pn and K in . The normalization process is as follows: K * indicates the values and of K p and K i . S n * represents the space size of the K p and K i . In this study, S np and S ni were 250 and 100, respectively. K pn , K in , and the amount of change of these two signals are measured and processed as the input state vectors.
Once the speed is continuously lower than S e % of the command speed for N e , with more than 9 sampling points, the tuning starts. The speed W r (rad/s) and current I r (A) signals in the N e continuous sampling points can be expressed as follows: Figure 6 illustrates the measured signals for W r and I r . By subtracting the first variable from the others and combining the two vectors, the input state vector s t can be obtained. s t = w r1,2 , w r1,3 , . . . , w r1,10 , i r1,2 , i r1,3 , . . . , i r1,10 , K pn , K in (21)

Action Space
The main actions controlling the speed response are the parameters of K p and K i . The available setting ranges of K p and K i for the actuator utilized in this study are [0, 500] and [0, 100], respectively. However, by observing the sub-optimized solution solved by the PSO method, the boundary of K p can be restricted to [251,500], and the setting range of K i is kept the same.

Reward Function Design
This study proposes a reward function that enables the agent to track the speed command. It encourages the agent to decrease the settling time and minimize the change in speed while the load disturbance occurs. The IAE of the speed response is a key measurement of how well the motor speed is maintained. Furthermore, to decrease the training time and protect the mechanism during online training, the sub-optimized solutions of K ps and K is are considered. Assume that the actions determined by the actor network are K pa and K ia . In addition, the two search boundaries of the two actions are defined as [D pl ,D pu ] and [D il , D iu ]. The reward function R, based on the IAE and PSO, is presented as follows: where r 1 is a constant adopted to scale up the reward, and the neural network is allowed to converge more easily. The value of r 1 depends on the size of the action space. When the size of the action space is large, the value of r 1 should increase, otherwise, its value should decrease. r 2 is the decay rate, which is a negative constant. The normalized process is usually used in the reward function design [45][46][47], and is calculated by I w /I n . I w is the IAE value obtained by subtracting the command speed and the measured output speed. The number of the calculated sampling points is 300. I n is the average IAE result under a fixed K p and K i . The fixed values of K p and K i are 226 and 36, respectively, which are calculated from the Delta actuator under the auto-tuning mode. r 3 is also a constant, which makes r ∼ = 0.0 when I/I n is 0.5. The design of r 3 leads the reward function to contain both positive and negative value. D pu and D pl are calculated as K ps ± d p . Similarly, D iu and D il are calculated as K ps ± d i . If action a * ,t is out of the boundaries of [D * l ,D * u ], a small punishment is given. Based on numerous experimental results and observations, the values of the parameters are shown in Table 2. To determine the actions and evaluate the value of actions under a certain state, a critic network Q(s, a θ Q ) and an actor µ(s|θ µ ) were designed. The architectures of the networks are constructed by the fully connected layers, which are illustrated in Figure 7. After the experimental tuning, the proposed actor has one input state, as shown in Equation (21), including the values of K pn and K in and variations in the speed and current signals. Subsequently, two fully connected layers are connected. Finally, two output layers with the Tanh activation function generate the actions K pa and K ia , which are in the range of −1 to 1. By scaling into the range of each action space, as mentioned in Section 3.4.2, the final actions can be obtained. In addition, the proposed critic network has two input vectors including the state vector s t and the action vector. First, the two input vectors are connected to two fully connected layers separately, and are then concatenated together. Subsequently, one fully connected layer is connected. Finally, one linear output layer outputs the action value. All the fully connected layers used in the hidden layers are used with the ReLU [48] activation function. Adam optimization with the mean square error function is used to train the critic network. The mean of value given by the critic network is used to train the actor. Based on the tuning results, a fixed learning rate of 0.003 is assigned. Tables 3 and 4 list the hyperparameter settings of the proposed critic and actor networks.  The details of the reinforcement motor speed control method are presented in Algorithm 1. First, the experience memory M with the size of 100, and the critic, actor, and target networks are initialized. Afterward, every 5 s, motor II changes the disturbance load in the range of 0.0 to 1.0 Nm. If significant change happens on the motor speed, the state s t is fed into the actor and determines the parameter settings of PI controller. The exploration noise t is a random number, which is in the range of −5.0-5.0 after scaling. The reward is calculated after the actions are implemented on the motors. Then, the critic, actor, and target networks are updated. τ denotes the changed rate of the target network. The design of τ is intended to change the weight of the target network slowly, which is 0.05 in this paper.

Algorithm 1 DDPG for Speed Tracking Control
1: Initialize experience memory M 2: Randomly initialize critic network Q(s, a θ Q ) and actor µ(s|θ µ ) with weights θ Q and θ µ 3: Initialize target network Q and µ with weights θ Q ← θ Q , θ µ ← θ µ 4: repeat 5: Randomly select disturbance load in the range of 0.0~1.0 Nm 6: if significant change in motor speed then 7: Observe the state s t , and calculate sub-optimized solutions and boundaries K ps , K is , D pl , D pu , D il , and D iu 8: Select the actions a t = µ(s t |θ µ ) + t according to the current policy and exploration noise 9: if a * ,t is out of the boundaries of [D * l , D * u ] then a. Store transition (s i , a i , r i ) b. Randomly select action a * ,t in the boundary of [D * l , D * u ] 10: end if 11: Execute actions a t and observe reward r t 12: Store transition (s i , a i , r i ) 13: Sample a random minibatch of N m transitions (s i , a i , r i ) from M 14: Set y i = r i 15: Update critic by minimizing the loss 16: Update the actor policy using the sampled policy gradient:

Experimental Results
In this section, the results of the experiment are discussed. The flowchart of the proposed online tuning method is shown in Figure 8. First, the approximate model of the motor was obtained by the system identification. To verify the stability of the system, the Routh-Hurwitz [49] stability criterion was used. Afterward, the searching regions of K p and K i could be determined. By using the Hall sensor and speed sensor, the current signal and the motor speed could be acquired. When a significant change happened in the motor speed, the tuning algorithm started. In this study, three tuning methods, including the fixed method, PSO algorithm, and DDPG, were compared. Finally, the desired parameters of K p and K i could be obtained and updated in the actuator.

System Description and Stability
The motor system was considered to be a two-order transfer function, which is shown in Equation (1). By measuring the three-phase current and the output speed, and using the MATLAB system identification toolbox, the coefficients of A, B, C, and D were found to be 4.4720 × 10 −4 , 1.0181, 3.9129, and 886.21519, respectively. To confirm the performance of the system identification process, motor under settings of K p = 226 and K i = 36 were tested. Figure 9a illustrates the transient and steady state response under the command speed of 500 rpm. In addition, Figure 9b shows the speed response to load disturbance. The results show that the settling times of the approximate model and the real motor were similar. However, different amounts of overshoot and undershoot were found to exist. In Figure 9a, it can be seen that the overshoots of the simulation and real motor were 15.0% and 8.4%, respectively. In Figure 9b, when adding a load to the motor, the undershoot of the simulation and real motor were found to be 8.4% and 13.4%, respectively. However, when releasing the load, the overshoot of the simulation and real motor were observed to be 7.7% and 12.7%, respectively.  In order to implement the online learning method, the stability of the system needed to be considered. Inappropriate tuning parameters would cause the system to be unstable. According to the Routh-Hurwitz criterion, the stable boundaries of K p and K i that cause the system to be stable were calculated and are shown in Figure 10. The positive area below the blue curve is the stable region. As mentioned in Section 3.4.2, the available setting ranges of K p and K i of the actuator were [0, 500] and [0, 100], respectively. The available operating area, the red boundary, is shown in Figure 10. All the operating area was in the stable region. Hence, the settings of the actuator were all allowed.

Online-Tuning Results
The experiment was divided into four stages representing the load conditions of 0→0.717 Nm, 0.717→0 Nm, 0→0.717 Nm, and 0.717→0 Nm, respectively. Figure 11 shows the comparison of the four methods. As shown in Figure 11, using fixed values of K p and K i , the settling time, undershoot and overshoot were about 100 ms, 13.4%, and 12.7%, respectively. To improve the performance of the time response, PSO tuning method is first discussed. Two parameter settings of PSO algorithm are shown in Table 5.
In Case I, when twenty continuous sampling points of the rotating speed was lower than 97% of the command speed, the PSO tuning method started. To calculate the fitness equation T I AE as shown in Equation (15), 300 sampling points of the speed response after the starting point were predicted by Equations (5) and (10). Subsequently, the suboptimized solutions could be obtained and sent into the actuator.
First, for comparison with the fixed method, the values of K p and K i were initialized to 226 and 36. In stage I, the settling time could be improved from 100 ms to 58 ms. However, the undershoot did not undergo significant change, and was about 13%. In stage II, the settling time was decreased to 39 ms, which was 61 ms better than that of the fixed method. Moreover, the overshoot was decreased to 6.8%. To confirm the performance of the motor encountering the same load conditions, stages III and IV were conducted. In stage III, the settling time and undershoot were 34 ms and 7.6%, respectively. Because the values of K p and K i had been tuned, a better ability to compensate the current is shown. In stage IV, the settling time was 34 ms, and the overshoot was 6.26% which was 0.54% better than that in stage II. The results suggest that the performance of the time response was improved. However, the tuning results were affected by the computational consumption and delay time. In stage I, the real tuning point was operated after the lowest point, which was far from the start point. The delay time caused the solutions determined by the PSO algorithm to be sub-optimized solutions. In Case I, the PSO algorithm took about 2 ms to yield the solutions. In addition, the delay time of communicating with the actuator was about 10 ms. The results show that the time consumption is an important issue.
In order to minimize the influence of computational time and delay time, some parameter settings were changed. In Case II, the threshold S e , error number N e , and the iteration number were modified to 98.5% and 5. By changing the settings, the calculation time of the PSO algorithm was reduced to 1 ms. A lower value of S e led the tuning method to be more sensitive to the speed error; however, it made the start point move forward by about 2.5 ms.
In stage I, the settling time was about 28 ms, which was 22 ms better than that of Case I. However, the change of the settings did not make great improvement on the undershoot. In stage II, the settling time was decreased to 20 ms. Furthermore, the overshoot was decreased to 6.4%. In stage III, the settling time and undershoot were 19 ms and 8.3%, respectively. In stage IV, the settling time was 22 ms, and overshoot was 6.26%. The results show that, by reducing the processing time, some parts of the performance could be improved.
To deal with the delay time problem and imprecise mathematical model, DDPG model-free reinforcement learning was used to train online. By observing the experimental results of Cases I and II, the values of S e and N e were set to 98.5% and 10 in the Cases of DDPG. In addition, the TensorFlow Lite technique was utilized to accelerate the prediction process without losing accuracy. The prediction time could be reduced to 0.03 ms, which was about 300 times smaller than that of Case II. As illustrated in Figure 11, when a load was added to the motor, the DDPG method could provide the best current compensation. The settling times in stages I and II were 18 and 16 ms, which were the smallest values in each Case. When releasing the load, in stages II and IV, the settling times were 20 ms and 18 ms, respectively. Table 6 list the values of settling time, undershoot, and overshoot of each method. The fixed value settings obtained from the Delta actuator performed the worst. For the undershoot and overshoot, the other three methods had similar performance. However, for the settling time, the proposed method had the smallest values.  Figure 11. Comparison of the four tuning methods.  39, respectively. Generally, in stage I, the tuning results of Case II were better than those of Case I. Because the starting point was moved forward and the calculation time was decreased, the solutions of K p and K i could operate at a more suitable point. In stages II and III, the average IAE values were similar. In stage IV, the tuning results of Case I were better than those of Case II. Furthermore, in Case I, the standard deviations of each stage were 169.69, 205.41, 277.11, and 89.92, respectively. In Case II, the standard deviations of each stage were 246.02, 169.64, 429.51, and 322.51, respectively. The solutions of Case II were more separated, which was caused by the smaller number of iterations. However, some solutions determined by Case II were better than Case I. The average IAE values of the proposed DDPG showed the best performance. Furthermore, according to the standard deviation of the IAE value, the solutions determined by the proposed DDPG were more converged, which is shown in Figure 12. Table 7 lists the average and standard deviation of the IAE value of the four methods. The experimental results show that the DDPG method combines the advantages of Cases I and II. The proposed method can learn the unknown model of the motor system and determine better tuning solutions for the load disturbance.

Conclusions
This study proposed a DDPG method to effectively track the speed control under an unknown load disturbance. A precise mathematical model of the motor system is not required when using the model-free reinforcement learning algorithm. Three methods, including the fixed method, PSO algorithm, and DDPG, were discussed. The fixed method showed the worst performance. The fixed values of K p and K i , 226 and 36, were provided by the Delta actuator based on the motor status and settings. It took the longest time to return back to the speed command, and had the largest overshoot and undershoot. The tuning results of the PSO algorithm show that the speed response was improved when the load disturbance occurred. However, the computational time and time delay caused by communicating to the actuator affected the tuning performance. To solve the mentioned challenges, the DDPG method was proposed. The sub-optimized solutions of K p and K i determined by the PSO algorithm were added into the DDPG training, which caused the training to converge faster and more safely. Through the use of Tensorflow Lite, the computational time challenge could be solved. The experimental results show that the proposed DDPG method was able to effectively decrease the settling time, overshoot, and undershoot under the load disturbance. The proposed technique can be used in the field of machining. In the cutting process, the speed of the motor usually decreases when the external load suddenly becomes large. To obtain a smooth surface and high quality of the parts, the fast compensation method of the motor speed can be utilized.