1. Introduction
The interior permanent magnet synchronous motor (IPMSM) has been widely applied due to its excellent characteristics, such as simple structure, high efficiency, high power density, and fast response [1,2]. Currently, IPMSM drive systems typically employ the field-oriented control (FOC) strategy, which can independently regulate the torque and flux linkage of the IPMSM [3,4]. The outer speed loop is used to control the motor speed, generating reference torque to ensure that the actual motor speed tracks the reference value. The inner current loop controls the flux linkage to ensure that the stator current vector tracks the reference value. The inner current loop plays a crucial role in the IPMSM control system as its performance directly impacts torque control, dynamic response, efficiency, stability, and system reliability. Optimizing the inner current loop control can significantly enhance the performance of the IPMSM control system [5,6,7].
In addition to proportional–integral (PI) control [4], many new algorithms have been proposed to improve the performance of the current loop, such as model predictive current control (MPCC) [8], adaptive linear neuron (ADALINE) control [9], sliding mode control (SMC) [10], active disturbance rejection control (ADRC) [11], and the complex vector decoupling method [12]. Among these control methods, MPCC has become one of the most commonly used current control strategies due to its fast dynamic response [13]. Finite Control Set Model Predictive Current Control (FCS-MPCC) directly determines the optimal voltage vector by minimizing a predefined cost function [14]. Continuous Control Set Model Predictive Current Control (CCS-MPCC) calculates the reference voltage [15]. The switching signals are then generated by Space Vector Pulse Width Modulation (SVPWM). Compared with FCS-MPCC, CCS-MPCC has a fixed switching frequency and produces smaller current ripples. However, the conventional CCS-MPCC relies on an accurate model of the system. The IPMSM control system is a nonlinear and strongly coupled system with unavoidable and unmeasurable disturbances such as external loads and potential parameter variations during operation [16,17,18]. Moreover, the inverter output voltage contains nonlinearities due to factors such as conduction voltage drop, dead time, and high-frequency modulation [19,20]. These non-ideal factors are often neglected in the modeling of IPMSM control systems. Model mismatches may lead to steady-state errors, degrade system control performance, and even affect the stability of the system.
Among the various anti-disturbance control methods, the disturbance observer (DO)-based control method has gained widespread attention and become a research focus due to its simple structure, independence from an accurate system model, and lack of additional hardware cost. Disturbance-observer-based compensation preserves the original structure of the control algorithm. Study [21] proposed a multiple-model adaptive disturbance observation method to eliminate steady-state current errors. Study [22] proposed an adaptive observer based on the model reference adaptive system (MRAS), which enhances the control performance, disturbance rejection capability, and robustness of the control system. Study [23] presented a novel super-twisting algorithm based on an extended state observer (ESO) to improve dynamic response performance. Study [24] introduced an adaptive internal model observer (AIMO) to estimate and compensate the disturbances caused by parameter variations. Although these disturbance observation methods can effectively suppress system disturbances, they have limitations, including challenges in parameter tuning and suboptimal performance in observing nonlinear disturbances. The Sliding Mode Observer (SMO), as a nonlinear observer with simple parameter tuning, has received increasing attention and is employed in many IPMSM control systems [25,26,27,28]. However, due to its discontinuous switching characteristics, the conventional SMO is prone to chattering [26]. The terminal SMO incorporates nonlinear terms into the conventional sliding mode surface, enabling the observation error to converge to zero within finite time. Its dynamic performance surpasses that of the conventional SMO, which achieves only asymptotic state convergence under a linear sliding mode surface [27]. Additionally, the terminal SMO eliminates switching terms, effectively mitigating chattering. However, the incorporation of nonlinear terms introduces negative exponential components into the sliding mode control function, which may lead to singularity issues [28].
In addition to disturbance issues, since MPCC calculates the reference voltage vector by minimizing the cost function, the design of the cost function affects the control performance of the system [29,30]. Study [31] proposed a simplified cost function for the Z-source inverter (ZSI) MPC algorithm, which allows all the states to be controlled at their set points by introducing only a weighting factor for the capacitor voltage and reduces the computational complexity of MPC. Study [32] addressed the coupling problem arising from the trade-offs between multiple objectives in the cost function and removed the penalty on the variation of control actions from the cost function by introducing the event-triggered (ET) mechanism into MPC. Study [33] established a multi-layer distributed mathematical model considering the equipment adjustment cost that adjusts the optimization objectives based on different stages of system operation to ensure the optimal adjustment cost of the system. However, the above methods do not consider the time factor in the control algorithm. MPCC performs an optimization calculation based on the predictive model in each cycle to obtain the optimal reference voltage vector sequence within a finite prediction horizon and applies the first vector to the system, ensuring that the system can respond in real time to dynamic changes and achieve optimal performance under different operating conditions. In the predictive model based on the state equations of the IPMSM, the state and operating cost of the system in the next cycle are determined only by the current state and action, independent of past states and actions. Therefore, the state transition and optimal decision of MPCC can be considered to exhibit the Markov property [34,35]. According to Markov decision process (MDP) theory, in optimization problems aimed at long-term reward, the decision has a time preference: future costs and current costs have different impacts on the decision, and future rewards gradually decay according to a discount factor [36,37].
In this article, a Markov decision MPCC algorithm for an IPMSM based on lumped disturbance compensation is proposed. An IPMSM model is given by incorporating all the non-ideal factors into lumped disturbances. In order to track and compensate the lumped disturbances, a terminal sliding mode disturbance observer (SMDO) based on a recursive integral sliding mode surface is designed to eliminate the impact of disturbances and improve the dynamic response of the system. In the sequential decision optimization of conventional MPCC, a discount factor is introduced according to MDP theory to reduce state fluctuations during operation without affecting the dynamic response. Moreover, compensating for lumped disturbances eliminates uncertainties in the system, transforms the optimization calculations into a deterministic decision process, and simplifies the Markov decision MPCC algorithm design. The rest of the article is organized as follows: In Section 2, an IPMSM model considering lumped disturbances is derived, and a recursive integral SMDO is designed to estimate the disturbances of the model. In Section 3, a Markov decision MPCC algorithm considering the time factor in the optimization calculation of conventional MPCC is proposed to enhance the stationarity of the system. In Section 4, experiments are conducted, and the results are presented and analyzed. Finally, conclusions are drawn in Section 5.
  2. Lumped Disturbance Observer Design for IPMSM
  2.1. Modeling of IPMSM with Parameters Variation and Unknown Disturbances
Under ideal conditions, the state equation of the IPMSM can be denoted as (1):
where $\omega_e$, $T_e$, $T_L$, $J$, $n_p$, and $B$ are the angular velocity, electromagnetic torque, load torque, rotational inertia, number of pole pairs, and damping coefficient, respectively; $u_d$, $u_q$, $i_d$, $i_q$, $L_d$, and $L_q$ are the terminal voltages, stator currents, and inductances on the d- and q-axis; $R_s$ and $\psi_f$ are the stator resistance and magnet flux linkage.
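For orientation, a commonly used form of the ideal d-q frame IPMSM state equation, written with the symbols defined above, is sketched here. The arrangement of terms is an assumption (in particular the 3/2 torque constant of the amplitude-invariant transformation and the mechanical equation expressed in the electrical angular velocity) and may differ in detail from (1).

```latex
\begin{aligned}
\frac{d\omega_e}{dt} &= \frac{n_p}{J}\left(T_e - T_L\right) - \frac{B}{J}\,\omega_e,\\
T_e &= \frac{3}{2}\,n_p\left[\psi_f i_q + (L_d - L_q)\, i_d i_q\right],\\
\frac{di_d}{dt} &= \frac{1}{L_d}\left(u_d - R_s i_d + \omega_e L_q i_q\right),\\
\frac{di_q}{dt} &= \frac{1}{L_q}\left(u_q - R_s i_q - \omega_e L_d i_d - \omega_e \psi_f\right).
\end{aligned}
```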
In the actual operating condition, the parameters will be perturbed by temperature, load, and other factors. Meanwhile, the undesirable factors of the three-phase voltage source inverter will also lead to the distortion of the voltages. Since the load torque TL is an external input and the damping coefficient B of the system is unknown, they are also considered as part of the disturbances.
Substituting the torque equation in (1) into the mechanical motion equation and considering all the above undesirable factors as lumped disturbances, the state equation of the IPMSM in actual operating conditions is as shown in (2):
where $\sigma_\omega$ is the lumped disturbance of torque, and $\sigma_d$ and $\sigma_q$ are the lumped disturbances of the voltages on the d- and q-axis.
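A minimal Python sketch of a disturbed model of this kind is given below. It assumes that the torque disturbance adds to the electromagnetic torque and that the voltage disturbances add to the terminal voltages, with the load torque and damping absorbed into the torque disturbance; the sign conventions, the class `IpmsmParams`, the function `ipmsm_derivatives`, and all numerical values are hypothetical and are not taken from (2) or Table 1.

```python
from dataclasses import dataclass


@dataclass
class IpmsmParams:
    # Hypothetical nominal values, chosen only for illustration.
    Rs: float = 0.05      # stator resistance [ohm]
    Ld: float = 0.3e-3    # d-axis inductance [H]
    Lq: float = 0.6e-3    # q-axis inductance [H]
    psi_f: float = 0.06   # magnet flux linkage [Wb]
    n_p: int = 4          # number of pole pairs
    J: float = 0.01       # rotational inertia [kg m^2]


def ipmsm_derivatives(x, u, sigma, p):
    """Time derivatives of x = (omega_e, i_d, i_q).

    u = (u_d, u_q) are the terminal voltages and
    sigma = (sigma_w, sigma_d, sigma_q) are the lumped disturbances.
    Assumption: sigma_w adds to the torque, sigma_d and sigma_q add to the
    voltages; load torque and damping are contained in sigma_w.
    """
    omega_e, i_d, i_q = x
    u_d, u_q = u
    sigma_w, sigma_d, sigma_q = sigma
    T_e = 1.5 * p.n_p * (p.psi_f * i_q + (p.Ld - p.Lq) * i_d * i_q)
    d_omega = p.n_p / p.J * (T_e + sigma_w)
    d_id = (u_d - p.Rs * i_d + omega_e * p.Lq * i_q + sigma_d) / p.Ld
    d_iq = (u_q - p.Rs * i_q - omega_e * (p.Ld * i_d + p.psi_f) + sigma_q) / p.Lq
    return d_omega, d_id, d_iq
```

Calling `ipmsm_derivatives((0.0, 0.0, 10.0), (0.0, 0.0), (0.0, 0.0, 0.0), IpmsmParams())` returns a positive speed derivative, since a positive q-axis current produces positive torque in this convention.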
Rewrite (2) in compact form as (3):
In (3), $x$ is the state vector, $u$ is the input vector, and $\sigma$ is the lumped disturbance vector; $A(x)$ is the state transition matrix, which depends on the system states, and $B$ is the input transition matrix.
        
  2.2. Design of Lumped Disturbances Observer Based on Recursive Integral Sliding Mode Surface
The sliding mode lumped disturbances observer is constructed based on (3) as follows:
where $\hat{x}$ and $\hat{\sigma}$ are the observations of the system states and lumped disturbances, respectively, and the additional input is the observer control term.
        
Define the observation errors as the differences between the system states and their observations; then (7) is derived according to (3) and (5):
To ensure that the observation errors converge to 0 in finite time and that the observer control term is non-singular, an integral fast terminal sliding mode surface is selected, as shown in (9):
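where $\alpha_\omega$, $\alpha_d$, $\alpha_q$, $\beta_\omega$, $\beta_d$, $\beta_q$, $\gamma_\omega$, $\gamma_d$, and $\gamma_q$ are sliding mode observer coefficients. All the parameters are positive, and $\gamma_\omega$, $\gamma_d$, and $\gamma_q$ are less than 1.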
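A minimal discrete-time sketch of one channel of such a surface is given below. It assumes the generic integral fast terminal form $s = e + \int (\alpha e + \beta |e|^{\gamma}\,\mathrm{sgn}(e))\,dt$ with a rectangular integration rule; this form, the helper `sig`, and the class `IntegralFastTerminalSurface` are assumptions for illustration and may differ from (9).

```python
import math


def sig(e, gamma):
    """Signed power |e|**gamma * sign(e); continuous at e = 0 for gamma > 0."""
    return math.copysign(abs(e) ** gamma, e)


class IntegralFastTerminalSurface:
    """One scalar channel of s = e + integral(alpha*e + beta*sig(e, gamma))."""

    def __init__(self, alpha, beta, gamma, dt):
        self.alpha = alpha
        self.beta = beta
        self.gamma = gamma
        self.dt = dt
        self.integral = 0.0

    def update(self, e):
        # Rectangular (forward Euler) accumulation of the integrand.
        self.integral += (self.alpha * e + self.beta * sig(e, self.gamma)) * self.dt
        return e + self.integral
```

Only the error itself and a positive power of it are evaluated, so no numerical differentiation and no negative power of the error appear in the computation.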
When the observer satisfies the arrival condition of the sliding surface $s$ and is in the sliding mode, the derivative of $s$ is 0, as shown in (10):
Based on (7) and (10), (11) can be obtained:
In order to satisfy the arrival condition, the observer control term is selected as follows:
However, (12) only guarantees that the state observation errors converge to 0 in finite time. The observer can only observe the system states and cannot realize the estimation of the lumped disturbances σ.
In this case, (13) can be obtained:
Therefore, the recursive sliding surface $s_\sigma$ based on $s$ is introduced to estimate the lumped disturbances $\sigma$ as follows:
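where $\alpha_{\sigma\omega}$, $\alpha_{\sigma d}$, $\alpha_{\sigma q}$, $\beta_{\sigma\omega}$, $\beta_{\sigma d}$, $\beta_{\sigma q}$, $\gamma_{\sigma\omega}$, $\gamma_{\sigma d}$, and $\gamma_{\sigma q}$ are the sliding mode observer coefficients of $s_\sigma$. As with $s$, all the parameters are positive, and $\gamma_{\sigma\omega}$, $\gamma_{\sigma d}$, and $\gamma_{\sigma q}$ are less than 1.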
Substituting (13) into (14), the following can be obtained:
The derivative of $s_\sigma$ is also 0 when the observer is in sliding mode. Therefore, the observation result of the lumped disturbances is selected as shown in (16):
The traditional fast terminal SMDO selects (10) as the sliding surface. When calculating the observer control term and the observation of the lumped disturbances, it is necessary to calculate the derivative of the observation errors and the $\gamma$ power of the observation errors to satisfy the sliding mode arrival condition. Because the controller can only handle discrete data, derivatives are typically computed using differencing methods that are sensitive to noise. Moreover, since $0 < \gamma < 1$, the derivatives of the nonlinear parts yield negative powers of the observation errors. Consequently, when the observation errors approach 0, the control functions of the observer become infinitely large, leading to singularity issues.
Compared with the traditional terminal SMDO, the proposed recursive integral SMDO does not require derivative calculations when solving for the observer control term and the disturbance estimate, thus avoiding the noise sensitivity and singularity problems while tracking the system states and lumped disturbances in finite time.
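The singularity can be made concrete with a short numerical sketch. Differentiating the signed power $|e|^{\gamma}\,\mathrm{sgn}(e)$ with respect to $e$ gives $\gamma |e|^{\gamma-1}$, which grows without bound as $e \to 0$ when $0 < \gamma < 1$. The function name `terminal_term_slope` and the value $\gamma = 0.5$ are chosen only for illustration.

```python
def terminal_term_slope(e, gamma):
    """d/de of |e|**gamma * sign(e), valid for e != 0: gamma * |e|**(gamma - 1)."""
    return gamma * abs(e) ** (gamma - 1.0)


# With gamma = 0.5 the slope grows by a factor of 10 for every factor of 100
# by which the error shrinks (roughly 5, 50, 500 for these three errors).
slopes = [terminal_term_slope(e, 0.5) for e in (1e-2, 1e-4, 1e-6)]

# At e == 0.0 Python raises ZeroDivisionError, because 0.0 is raised to a
# negative power; this is the numerical face of the singularity.
```

An integral surface avoids this term because the error is integrated rather than differentiated.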
The Lyapunov function $V$ is defined as a quadratic form of the sliding variable, and $V$ and its time derivative $\dot{V}$ are given as (17):
According to (13), (18) is obtained:
When the control period is sufficiently small, it can be considered that 
. At this time,
        
Since the observer parameters are all positive, $\dot{V}$ is negative definite. Therefore, the observer is asymptotically stable, and it will reach and move along the sliding surface within a finite time. The observation errors of both the system states and lumped disturbances converge to 0.
The trajectory of the observation errors is denoted as (20):
By solving (20), the time to arrive at zero observation error from any initial state can be derived as follows:
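As a cross-check on a result of this kind, the sketch below assumes scalar fast terminal error dynamics $\dot{e} = -\alpha e - \beta |e|^{\gamma}\,\mathrm{sgn}(e)$, which may differ from (20). For these dynamics the substitution $z = |e|^{1-\gamma}$ gives a linear equation in $z$ and the closed-form reaching time $t = \ln\big((\alpha |e_0|^{1-\gamma} + \beta)/\beta\big) / \big(\alpha (1-\gamma)\big)$. The function names and the parameter values in the usage note are hypothetical.

```python
import math


def reaching_time(e0, alpha, beta, gamma):
    """Closed-form time for e to reach 0 under e' = -alpha*e - beta*|e|^gamma*sgn(e)."""
    z0 = abs(e0) ** (1.0 - gamma)
    return math.log((alpha * z0 + beta) / beta) / (alpha * (1.0 - gamma))


def simulate_time_to_zero(e0, alpha, beta, gamma, dt=1e-5, tol=1e-8):
    """Forward Euler integration until |e| <= tol; returns the elapsed time."""
    e, t = e0, 0.0
    while abs(e) > tol:
        e -= dt * (alpha * e + beta * math.copysign(abs(e) ** gamma, e))
        t += dt
    return t
```

For example, with `e0 = 1.0`, `alpha = 2.0`, `beta = 1.0`, and `gamma = 0.5`, the closed form evaluates to `math.log(3.0)`, and the Euler simulation agrees to within its discretization error. A larger `alpha` speeds up convergence far from zero, while `beta` and `gamma` dominate close to zero.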
  3. IPMSM Control Strategy Based on Lumped Disturbance Compensation
The IPMSM employs a dual-loop control strategy consisting of speed and current control loops. By converting the reference electromagnetic torque output from the speed loop into reference current inputs, the terminal voltages are ultimately obtained. All the disturbances are considered to be matched disturbances that can be directly compensated within the system.
  3.1. Conventional IPMSM Control Strategy
In the conventional dual-loop control strategy for the IPMSM, the speed loop adopts PI control to ensure that there is no steady-state error in speed control. The torque disturbance estimated by the observer is compensated through feedforward control to improve the performance of the speed loop control. Then, the constraint of the minimum modulus of the current vector is incorporated into the torque equation in (1), and the corresponding current vector is calculated to achieve Maximum Torque per Ampere (MTPA) control. Because online computation involves a large computational load, the relationship between the MTPA currents and the torque is calculated offline from the nominal values of the motor parameters, and the preset MTPA curves are obtained through curve fitting to enhance the real-time performance of the system.
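The offline MTPA calculation can be sketched as follows. The sketch assumes the torque expression $T_e = \tfrac{3}{2} n_p [\psi_f i_q + (L_d - L_q) i_d i_q]$ and uses the closed-form d-axis current that maximizes torque for a given current magnitude; a lookup table stands in for the fitted curve. The function names and all parameter values in the usage note are hypothetical, not the values of Table 1.

```python
import math


def mtpa_currents(i_s, psi_f, Ld, Lq):
    """Return (i_d, i_q) maximizing torque for stator current magnitude i_s >= 0."""
    delta = Ld - Lq
    if delta == 0.0 or i_s == 0.0:
        return 0.0, i_s  # no reluctance torque: all current on the q-axis
    # Root of 2*delta*i_d**2 + psi_f*i_d - delta*i_s**2 = 0 on the MTPA branch.
    i_d = (-psi_f + math.sqrt(psi_f ** 2 + 8.0 * delta ** 2 * i_s ** 2)) / (4.0 * delta)
    i_q = math.sqrt(max(i_s ** 2 - i_d ** 2, 0.0))
    return i_d, i_q


def torque(i_d, i_q, psi_f, Ld, Lq, n_p):
    """Electromagnetic torque under the assumed 3/2 convention."""
    return 1.5 * n_p * (psi_f * i_q + (Ld - Lq) * i_d * i_q)


def build_mtpa_table(i_max, steps, psi_f, Ld, Lq, n_p):
    """Offline table of (torque, i_d, i_q) rows for current magnitudes 0..i_max."""
    table = []
    for k in range(steps + 1):
        i_d, i_q = mtpa_currents(i_max * k / steps, psi_f, Ld, Lq)
        table.append((torque(i_d, i_q, psi_f, Ld, Lq, n_p), i_d, i_q))
    return table
```

For instance, `build_mtpa_table(100.0, 50, 0.06, 0.3e-3, 0.6e-3, 4)` produces rows with increasing torque that could be interpolated or curve-fitted at run time, so that the controller only evaluates a cheap function of the reference torque.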
The speed loop control diagram based on disturbance compensation is shown in Figure 1. In the figure, the desired electrical angular velocity is obtained from the set value of the motor speed $n_{ref}$ using a ramp-up and ramp-down method. The reference value of the electromagnetic torque is synthesized from the output of the speed controller and the observed torque disturbance. The reference current vector is calculated from the preset MTPA curve.
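A ramp-up and ramp-down reference of the kind described can be sketched as a simple rate limiter, followed by a conversion from mechanical speed in rpm to electrical angular velocity. The function names, the rate limit, and the update period are hypothetical; the source does not state the ramp rate that was used.

```python
import math


def ramp_reference(target, previous, max_rate, dt):
    """Move the reference from `previous` toward `target` by at most max_rate*dt."""
    step = max_rate * dt
    delta = target - previous
    if delta > step:
        return previous + step
    if delta < -step:
        return previous - step
    return target


def rpm_to_electrical_rad_s(n_rpm, n_p):
    """Electrical angular velocity [rad/s] for mechanical speed n_rpm and n_p pole pairs."""
    return 2.0 * math.pi * n_p * n_rpm / 60.0
```

Calling `ramp_reference` once per speed-loop period with the set value as `target` yields a piecewise-linear reference that reaches the set value and then holds it.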
The current loop employs model predictive control to discretize the voltage equation in (1) and incorporate the coupling terms into the voltage disturbances, as shown in (22):
In (22),
        
where $T_{pwm}$ represents the current loop control period.
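A one-step prediction of this kind can be sketched with a forward Euler discretization of the voltage equations, in which the cross-coupling and back-EMF terms are moved into an additive voltage term `f_dq` together with the observed voltage disturbances. The choice of forward Euler, the function names, and the numerical values in the usage note are assumptions and may differ from (22).

```python
def coupling_terms(omega_e, i_dq, Ld, Lq, psi_f):
    """Cross-coupling and back-EMF voltages (f_d, f_q) of the d-q voltage equations."""
    i_d, i_q = i_dq
    return omega_e * Lq * i_q, -omega_e * (Ld * i_d + psi_f)


def predict_currents(i_dq, u_dq, f_dq, Rs, Ld, Lq, T_pwm):
    """Forward Euler prediction of (i_d, i_q) one control period ahead.

    f_dq lumps the coupling terms and the observed voltage disturbances,
    so each axis reduces to an independent first-order model.
    """
    i_d, i_q = i_dq
    u_d, u_q = u_dq
    f_d, f_q = f_dq
    i_d_next = (1.0 - Rs * T_pwm / Ld) * i_d + T_pwm / Ld * (u_d + f_d)
    i_q_next = (1.0 - Rs * T_pwm / Lq) * i_q + T_pwm / Lq * (u_q + f_q)
    return i_d_next, i_q_next
```

With a 20 kHz control frequency, `T_pwm` would be `5e-5` s. Applying `predict_currents` repeatedly over the horizon gives the predicted state sequence used by the optimizer.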
The tracking error vector of the current loop is
        
Define the quadratic performance index as
        
where $N_p$ represents the prediction horizon, $Q$ is the error weight matrix, and $R$ is the control weight matrix. Both $Q$ and $R$ are chosen as symmetric positive definite matrices.
In the $k$th control cycle, the state sequence within the next $N_p$ steps is predicted based on (22). The optimization problem is then solved to minimize the performance index within the prediction horizon and obtain the optimal control sequence. In the $(k+1)$th control cycle, the operation of the $k$th cycle is repeated with the updated motor current vector to perform rolling optimization.
  3.2. IPMSM Model Predictive Current Control Strategy Based on Markov Decision Process
According to (22), $i_{dq}[k+i+1]$ depends only on $i_{dq}[k+i]$ and $u_{dq}[k+i]$. Therefore, after incorporating disturbance compensation, the model predictive process of the current loop becomes a deterministic sequential decision Markov chain. The immediate cost function $r$ for each cycle is a quadratic function of the tracking error $e$ and the control voltage $u_{dq}$, as shown in (27):
According to (26), the system performance index is obtained by accumulating the cost of each step within the subsequent $N_p$ steps. Due to the use of rolling optimization, only the first element $u_{dq}[k]$ of the control sequence is selected each time, and the rest of the control sequence is discarded without being applied to the motor. A large prediction step size will cause the system to overly consider future trajectory changes, leading to a decrease in tracking accuracy; conversely, a small step size will prevent the system from adequately responding to sudden changes in the reference value. According to the theory of Markov processes, the impact of the cost in future cycles ($i > 0$) on the control strategy differs from the impact of the cost in the current cycle ($i = 0$). When the cost of a subsequent period is incorporated into the previous period, it must be multiplied by a discount rate. Therefore, the Markov decision discount index can be used to optimize the control strategy instead of (26). The performance index is rewritten as (28):
Equation (28) represents the discounted total cost over Np steps starting from the kth cycle. In the equation, β is the discount factor matrix, which is a diagonal matrix. Additionally, all eigenvalues of β lie in the range of [0, 1]. Discounting future costs allows the controller to focus more on the cost in the current cycle without excessively considering costs that have not yet occurred. This allows the controller to consider future trajectory trends and accelerate the response while reducing the impact of prediction errors in future moments.
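The effect of discounting can be sketched on a single axis. The sketch below is a simplification, not the matrix form of (28): it assumes a scalar model $x[i+1] = a\,x[i] + b\,u[i]$, a scalar discount factor, and the cost $\sum_{i=0}^{N-1} \beta^{i} (q\,x[i+1]^2 + r\,u[i]^2)$, and it omits the reference offset. The optimal feedback gains follow from a backward dynamic programming recursion; all names and values are hypothetical.

```python
def discounted_cost(x0, controls, a, b, q, r, beta):
    """Sum over i of beta**i * (q * x[i+1]**2 + r * u[i]**2)."""
    x, total = x0, 0.0
    for i, u in enumerate(controls):
        x = a * x + b * u
        total += beta ** i * (q * x * x + r * u * u)
    return total


def discounted_gains(horizon, a, b, q, r, beta):
    """Backward recursion for the discounted cost above.

    Returns ([K_0, ..., K_{N-1}], p_0), where u[i] = -K_i * x[i] is optimal
    and p_0 * x0**2 is the minimal discounted cost.
    """
    p_next, gains = 0.0, []
    for _ in range(horizon):
        w = q + beta * p_next                  # weight on the next state
        gains.append(w * a * b / (r + w * b * b))
        p_next = r * w * a * a / (r + w * b * b)
    gains.reverse()                            # stored last stage first
    return gains, p_next
```

Lowering `beta` shrinks the weight `w` placed on later stages, so the first-step gain `K_0` becomes smaller and the controller acts less aggressively on predicted future errors. With a horizon of one step the discount factor has no effect.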
Let $Q_a$ and $R_a$ be defined as follows:
Equation (28) can be written as follows:
        where
        
Due to control delay, $u_{dq}[k]$ will be applied to the system in the $(k+1)$th cycle, and the reference current vector can be assumed to remain unchanged during the calculation process. According to Equations (22) and (25),
        
where $I$ represents the second-order identity matrix.
When the decision is optimal, meaning that $u_{dq}$ is the optimal control voltage vector, the discounted expected total cost of the system is minimized.
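Taking the partial derivative of the discounted total cost with respect to the control voltage vector and substituting (32) into the result yields Equation (34):
The second partial derivative of the discounted total cost with respect to the control voltage vector is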
        
Since $Q$, $R$, and $\beta$ are all symmetric positive definite matrices, $Q_a$ and $R_a$ are also symmetric positive definite matrices; therefore, (35) is positive definite. When the result of Equation (34) is 0, the discounted total cost takes its minimum value. Thus, the optimal decision path of the MDP under the discount model can be obtained as follows:
The $k$th cycle control vector $u_{dq}[k]$ of the optimal control sequence is combined with the observed voltage disturbance vector and the coupling term vector to synthesize the final control voltage $u_{dq\_ref}[k]$, as shown in Figure 2.
Due to control delay, $u_{dq\_ref}[k]$ affects the motor in the $(k+1)$th cycle. Meanwhile, the discrete-form disturbance observer can not only observe the lumped disturbances but also estimate the system states of the next cycle. Therefore, the estimated states can be used to compute the coupling terms and reduce the impact of control delay, as shown in (37):
        where 
, and
        
  4. Experimental Results
An experimental setup was constructed to validate the proposed algorithm, as illustrated in Figure 3. This setup includes an interior permanent magnet synchronous motor (IPMSM), an asynchronous induction motor (ACIM), and a torque sensor, all connected via diaphragm couplings. The IPMSM is employed to test the proposed algorithm, while the ACIM provides the load torque. The parameters of the IPMSM are detailed in Table 1.
The control algorithm for the IPMSM is implemented using a TI TMS320F28377D DSP and a Xilinx XC6SLX16 FPGA. CPU1 of the DSP communicates with the PC and executes the motor control algorithm, while CPU2 implements the disturbance observer. The FPGA manages the peripheral interfaces, including the resolver-to-digital converter (R/D converter) ADI AD2S1210, the analog–digital converter (ADC) ADI AD7606, the digital–analog converter (DAC) ADI AD5348, and the external I/O interface, and also implements the hardware protection logic. Communication between the DSP and FPGA is handled through the external memory interface (EMIF) of the DSP. Hall sensors are used to acquire voltage and current data from the IPMSM, while a resolver is employed to measure the speed and rotor position of the IPMSM. The pulse width modulation (PWM) switching frequency is set at 20 kHz, and the DC bus voltage is 200 V. All the data during the experiments are output through the DAC and recorded by the Yokogawa DL850 ScopeCorder. The experimental setup is shown in Figure 4.
Experiments are carried out to validate the effectiveness of lumped disturbances compensation and the performance of model predictive current control based on the MDP. The same PI parameters for the speed loop and weight matrices for the current loop are used during all the experiments. After starting the IPMSM, the speed ramps up to 1400 rpm. Subsequently, the ACIM provides 20 N·m torque load to the IPMSM. Then, the load inverter is shut down to decrease the load torque to 0 N·m rapidly. Finally, the speed of the IPMSM is ramped down to 0 rpm, completing one experimental cycle.
Figure 5 and Figure 6 show the experimental results without disturbance compensation.
From Figure 5, it can be observed that the speed loop with the PI controller achieves a zero steady-state error in speed tracking. Since the influence of disturbances was not considered, the model predictive control method based on the voltage equations in the motor state equation under ideal conditions (1) cannot eliminate steady-state errors in the system. Therefore, there exists a deviation between the operating point of the current loop and the reference value. Moreover, these static errors increase with higher speed and greater load torque.
Figure 6 shows the speed waveforms during acceleration, deceleration, loading, and unloading processes without lumped disturbance compensation. During the acceleration and deceleration processes, it takes 2 s of adjustment for the speed to track the reference value. Additionally, during the acceleration process, there is overshoot in the speed, and the actual speed stabilizes 1.5 s after the reference speed reaches the set value. During the loading process, the speed drops by 55 rpm. After 6 s of adjustment, the speed returns to 1400 rpm. During the unloading process, due to the abrupt shutdown of the loading inverter, the load torque quickly drops to 0 N·m. As a result, the speed increases by 233 rpm. The speed returns to the set value after 4 s of adjustment.
Figure 7 and Figure 8 show the observation results of the recursive integral SMDO.
Comparing the waveforms of the actual system states with the estimated system states in Figure 7 shows that the recursive integral SMDO can estimate the system states with high accuracy.
Figure 8 shows the lumped disturbances observation results of the recursive integral SMDO. It can be observed that absolute values of torque disturbance and voltage disturbance on the d-axis increase with higher speed and greater load torque. At the beginning of acceleration, the voltage disturbance on the q-axis moves towards the negative direction. As the speed and torque increase, the disturbance moves towards the positive direction. The direction of voltage disturbances is opposite during acceleration and deceleration processes. The operating point of the current loop is influenced by both the control voltage and disturbance voltage, ultimately leading to a deviation between the actual output torque of the IPMSM and the reference torque. The PI controller in the speed loop will adjust the reference torque and modify the reference value of current vector to eliminate the steady-state error of motor speed and achieve system equilibrium. This results in differences in the variation of the reference currents shown in Figure 5 compared to when using zero steady-state error control methods, such as PI control.
Figure 9, Figure 10 and Figure 11 show the experimental results with the conventional control strategy based on disturbances compensation.
By comparing the waveforms in Figure 9 and Figure 5, it can be seen that the static errors of the currents disappear with lumped disturbances compensation. At the same time, the fluctuations of motor speed and d-axis current become smaller. Due to the torque disturbance compensation in the speed loop, the fluctuation of the reference torque becomes larger, and since the electromagnetic torque is mainly provided by the q-axis current, the fluctuation of the q-axis current also becomes larger. However, the compensation of voltage disturbances also reduces the high-frequency harmonics in the q-axis current.
Figure 10 shows the dynamic performance of motor speed using the traditional control method based on lumped disturbances compensation. Compared with the performance shown in Figure 6, the motor speed can track the reference value faster during acceleration and deceleration processes, with reduced overshoot of the acceleration process. When the load torque changes, the motor speed experiences a smaller decrease (20 rpm) and increase (150 rpm), and the adjustment time is shorter.
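The reported speed deviations can be put on a relative scale with a few lines of arithmetic. The sketch uses only the figures stated in the text (55 rpm and 233 rpm without compensation, 20 rpm and 150 rpm with compensation, at a set speed of 1400 rpm); the function name is hypothetical.

```python
def relative_reduction(before, after):
    """Fractional reduction of a deviation, e.g. 0.25 for a 25 % reduction."""
    return (before - after) / before


SET_SPEED_RPM = 1400.0

loading_reduction = relative_reduction(55.0, 20.0)      # speed drop on loading
unloading_reduction = relative_reduction(233.0, 150.0)  # speed rise on unloading

# Deviations with compensation, as a fraction of the set speed.
loading_fraction = 20.0 / SET_SPEED_RPM
unloading_fraction = 150.0 / SET_SPEED_RPM

print(f"{loading_reduction:.1%} {unloading_reduction:.1%}")  # prints "63.6% 35.6%"
```

On these numbers, compensation reduces the speed drop on loading by roughly 64 % and the speed rise on unloading by roughly 36 %.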
 Figure 11 shows the currents’ tracking performance during the unloading process, where the currents change most rapidly. At this time, the currents can still track the set values without static errors.
Compared with the conventional control strategy, introducing the Markov decision MPCC into the current loop will not affect the dynamic performance and disturbances compensation effectiveness of the system. However, the decision based on the Markov discount model can reduce the fluctuations in motor speed and currents, leading to the system being more stationary. Figure 15 shows the comparison of state fluctuation amplitudes at 1400 rpm under loaded and unloaded conditions for the three experiments.
In Figure 15, the blue lines represent the system state fluctuation amplitude when using the traditional control method without lumped disturbances compensation; the orange lines represent the system state fluctuation amplitude when using the traditional control method with lumped disturbances compensation; the purple lines represent the system state fluctuation amplitude when the Markov decision discount index is introduced into the MPCC in the current loop.
  5. Conclusions and Future Studies
This article proposed a Markov decision model predictive control of an IPMSM based on lumped disturbances compensation. An IPMSM model that integrates the non-ideal factors of both the IPMSM and the inverter into lumped disturbances in the speed and current loops is considered. A terminal SMDO based on a recursive integral sliding mode surface is utilized to simultaneously estimate all the lumped disturbances. This method mitigates the noise sensitivity and singularity issues inherent in traditional terminal SMDOs and ensures that the estimated values of system states and disturbances converge to their real values within finite time. Compensation based on the observed lumped disturbances does not require the original control structure to be changed. Furthermore, due to the characteristics of model predictive current control, a discounted cost criterion based on the MDP is introduced to enhance the control performance of the system. The experimental results are presented to validate the effects of lumped disturbances compensation and the Markov decision model predictive current control. After introducing the lumped disturbances compensation, the speed of the IPMSM can track the reference value more quickly during acceleration and deceleration processes with smaller overshoot. When the load torque changes, the speed variation becomes smaller, and the speed returns to the reference value faster. Additionally, the static errors of the currents disappear, which indicates that the influence of system disturbances has been eliminated and that the control accuracy of the system has been improved. Replacing the traditional current control method with Markov decision model predictive current control does not affect the dynamic performance of the system, but results in smaller fluctuations in system states. The operation of the IPMSM is smoother, and the performance of the system is further enhanced.
The proposed Markov decision model predictive current control method assumes that the reference current vector is constant during the optimization calculation. The disturbance feedforward compensation and state feedback control parts are designed separately and then superimposed, so the rolling optimization does not include the disturbance compensation term and the final solution is no longer optimal. Further research will consider the trajectory of the reference speed and optimize the speed loop to output the reference current vector sequence within the prediction horizon of the current loop. Simultaneously, disturbance effects and variable constraints (such as state constraints) will be considered during the current loop optimization calculations to achieve optimal model predictive disturbance attenuation control with variable constraints and further enhance the system performance.