The Temperature Prediction of Permanent Magnet Synchronous Machines Based on Proximal Policy Optimization

Abstract: Accurate temperature prediction plays an important role in the thermal protection of permanent magnet synchronous machines (PMSMs). A temperature prediction method for PMSMs based on proximal policy optimization is proposed. In the proposed method, the actor-critic framework of reinforcement learning is introduced to model the temperature prediction mechanism, and the correlations between the input features are analyzed to select appropriate input features. Finally, a simplified proximal policy optimization algorithm is introduced to optimize the predicted temperature of PMSMs. Experimental results show that the proposed method is more accurate and reliable than an exponential weighted moving average (EWMA) method, a recurrent neural network (RNN), and long short-term memory (LSTM).


Introduction
Temperature prediction of permanent magnet synchronous machines (PMSMs) has been a research focus in the field of motor protection. In recent years, researchers have made many attempts to predict the temperature of PMSMs [1], since temperature is an important factor in the operation of PMSMs. Most researchers have focused on thermal models of the motor. For example, a temperature equivalent model based on hardware-in-the-loop (HIL) simulation was proposed to predict the motor temperature [2], but this method has a high computational complexity. An equivalent thermal transfer model with two heat nodes for a permanent magnet synchronous motor was also proposed [3]. The thermal effect of the current and the stator frequency was considered, and the predicted results verified the rationality of this transfer model. Mohamed et al. [4] constructed a lumped parameter thermal network (LPTN) to calculate the temperatures of important components inside PMSMs. The air temperature between the permanent magnets was considered in this model. However, the computational complexity of the model is high. Wallscheid et al. [5] proposed a dynamic measurement method by introducing a magnetic flux observer into the time-dependent dispersion model of PMSMs. However, this approach is not universal because it depends strongly on the machine speed. Wallscheid et al. [6] examined the prediction performance of flux observers in PMSMs, and the results showed a worst-case Euclidean norm of less than 10 K. Lan et al. [7] established a thermal network with 38 nodes by analyzing the temperature fields of PMSMs, which accurately described the temperature of each component inside the motor. However, the acquisition of overheat spots lacked optimization. Sciascera et al. [8] built a variable heat model of an LPTN to improve the prediction accuracy of the traditional LPTN at a low computational complexity. In addition, this model provides an effective way to fine-tune the model parameters. Liu et al. [9] investigated a signal injection method for estimating the temperature of the motor stator windings, but the temperature estimation results under motor overload were not given. Du et al. [10] established a finite element model of the electromagnetic fields of the motor. The model obtained the temperature distribution of the major components inside the motor under the rated working condition by calculating the motor loss and the coefficient of thermal conductivity.
In summary, the above models aim to establish empirical formulas for the motor temperature. However, both the modeling process and the factors adopted depend on prior experience. In this work, temperature prediction is treated as a time series problem, and the temperature change of the motor components is fitted dynamically with additional degrees of freedom, owing to the dynamic tuning capability of the proposed reinforcement learning method based on proximal policy optimization (PPO-RL).
The development of artificial intelligence technology has shown great potential in the field of temperature prediction. Xu et al. [11] proposed a novel deep-learning-based indoor temperature prediction method for public buildings, which verified the prediction accuracy in the direction of indoor temperature change and its disadvantage in the horizontal direction. Liu et al. [12] analyzed the time dependence of ocean temperatures at multiple depths and proposed a time-dependent ocean temperature prediction method. The test results showed a better predictive performance than both support vector regression (SVR) and a multilayer perceptron regressor (MLPR). Wallscheid et al. [13] verified the feasibility of LSTM for temperature prediction. However, the introduction of memory blocks in LSTM makes the topological relationships complex, thus increasing the computational complexity.
In order to provide an accurate prediction method, we propose a method based on correlation analysis (CA) and proximal policy optimization (PPO) [14]. It selects the input features by correlation analysis and optimizes the model training process with a PPO algorithm. The remainder of the paper is organized as follows: The dataset and the correlation analysis process are described in Section 2. The rationale of our proposed method is presented in Section 3. The predictive model is validated and compared with other predictive networks in Section 4. Finally, a conclusion is given in Section 5.

Dataset and Correlation Analysis
In order to improve the prediction accuracy, an effective data processing method is proposed in this paper. The specific process is shown in Figure 1. The benchmark data are first sampled, and a correlation analysis is then conducted on the sampled data using the Pearson correlation coefficient (PCC) and the p-value. After the correlation analysis, the data features that are significantly negatively correlated with the predicted target are discarded. Meanwhile, additional features, namely the voltage magnitude $u_s$, the current magnitude $i_s$, and the electric apparent power $S_{el}$, are added to the processed sampled data to enrich the dataset and improve the prediction accuracy.

Data Description
The benchmark data used in the experiment came from the Kaggle data science competition platform. The data were measured and collected by the University of Paderborn in Germany, and the benchmark data were normalized. The definitions of the parameters and symbols of the column labels in the benchmark data are shown in Table 1. The temperatures $\vartheta_{sy}$, $\vartheta_{st}$ and $\vartheta_{sw}$ were chosen as the prediction targets for the experiment. The data contain 990,000 records. The experiment consisted of 52 measurement sessions, and each measurement session can be distinguished by $S_{id}$. All records were measured at a sampling frequency of 2 Hz on a test bench equipped with a three-phase permanent magnet synchronous motor.

The benchmark data involve the main thermal process of a PMSM. In our work, 30,000 samples were taken from the benchmark data at random. Of these, 300 samples were selected as the test set, and the rest were used as the training set.
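The sampling and splitting step can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `sample_and_split`, the fixed seed, and the choice of holding out the last 300 drawn indices are assumptions, since the text does not state how the 300 test samples were picked.

```python
import numpy as np

def sample_and_split(n_total, n_sample=30_000, n_test=300, seed=0):
    """Return (train_idx, test_idx): row indices into the benchmark data."""
    rng = np.random.default_rng(seed)
    # Draw n_sample distinct row indices uniformly at random.
    sampled = rng.choice(n_total, size=n_sample, replace=False)
    # Assumption: the last n_test drawn indices form the test set.
    return sampled[:-n_test], sampled[-n_test:]

train_idx, test_idx = sample_and_split(n_total=990_000)
print(len(train_idx), len(test_idx))  # 29700 300
```

Sampling without replacement guarantees that no record appears in both sets.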

Correlation Analysis
Equipment failures often occur during continuous data acquisition. They cause partial distortion in the benchmark data and interfere with the prediction. Therefore, Pearson correlation coefficient analysis [15] was adopted to observe the correlation between different features, and the p-value [16] was used to measure the significance of the correlation.
The general expression of the Pearson correlation coefficient is as follows:

$$\rho_{x,y} = \frac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y} = \frac{E[(x - \mu_x)(y - \mu_y)]}{\sigma_x \sigma_y} \qquad (1)$$

where $\sigma_x$ and $\sigma_y$ are the standard deviations of variables $x$ and $y$, respectively. Additionally, $\mathrm{cov}(x, y)$ is the covariance of the two variables, and $\mu_x$ and $\mu_y$ are the average values of variables $x$ and $y$, respectively. In general, if the covariance of $x$ and $y$ is larger than 0, the variables $x$ and $y$ are positively correlated. If the covariance of $x$ and $y$ is equal to 0, the variables $x$ and $y$ are linearly uncorrelated, which does not by itself imply that they are independent. Otherwise, the variables $x$ and $y$ are negatively correlated.
The correlations between data features are discussed, and the significance level given by the p-value is also evaluated. By convention, a correlation between two features is regarded as statistically significant when the p-value is less than 0.05, and as highly significant when the p-value is less than 0.01.
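As an illustration of the two quantities discussed above, the following sketch computes the Pearson coefficient and an approximate two-sided p-value. The helper names are hypothetical, and the p-value uses a large-sample normal approximation of the usual t-test for a correlation coefficient, which is an assumption made here to keep the example dependency-free; it is only reasonable for large sample counts such as the 30,000 samples used in this work.

```python
import math
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient cov(x, y) / (sigma_x * sigma_y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / math.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

def approx_p_value(r, n):
    """Two-sided p-value for H0: no correlation (normal approximation)."""
    if abs(r) >= 1.0:
        return 0.0
    # Test statistic of the t-test for a correlation coefficient.
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    # For large n the t distribution is close to the standard normal.
    return math.erfc(abs(t) / math.sqrt(2.0))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
r_pos = pearson_r(x, [2.0, 4.0, 6.0, 8.0, 10.0])   # perfectly positive
r_neg = pearson_r(x, [10.0, 8.0, 6.0, 4.0, 2.0])   # perfectly negative
```

A feature would then be flagged as significantly negatively correlated with a target when its coefficient is negative and its p-value falls below the chosen threshold.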
In order to evaluate the correlation between the monitoring targets and the benchmark data, the Pearson correlation values of each feature are analyzed through the heat map of the sampled data in Figure 2. The correlation coefficients between $\vartheta_{sy}$ and the motor torque $T_m$, the current d-component $i_d$, and the current q-component $i_q$ of PMSMs are all negative. The joint distribution density diagrams between $\vartheta_{sy}$ and the above three features are shown in Figure 3. The correlation coefficients between $\vartheta_{st}$ and the voltage d-component $u_d$, the motor torque $T_m$, the current d-component $i_d$, and the current q-component $i_q$ are also shown in Figure 2. The values of these correlation coefficients are all less than 0. Figure 4 shows the joint distribution of $\vartheta_{st}$ with the above features to further show the degree of correlation. In the same way, it can be seen in Figure 2 that the correlation coefficients of $\vartheta_{sw}$ with the voltage d-component $u_d$ and the current d-component $i_d$ are negative, so these features are negatively correlated with the target. The joint distribution density diagrams of the target feature $\vartheta_{sw}$ are shown in Figure 5.
On the basis of the sampled dataset, some additional features are considered in this paper. These features are the voltage magnitude $u_s$ and the current magnitude $i_s$, both based on their dq-components, and the electric apparent power $S_{el}$. They are calculated as follows:

$$u_s = \sqrt{u_d^2 + u_q^2} \qquad (2)$$

$$i_s = \sqrt{i_d^2 + i_q^2} \qquad (3)$$

$$S_{el} = u_s * i_s \qquad (4)$$

where $u_d$ and $u_q$ are the d-component and q-component of the voltage, respectively, $i_d$ and $i_q$ are the d-component and q-component of the current, respectively, and $*$ represents the product operation.
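The additional features can be computed column-wise as in the sketch below. It assumes that the apparent power is the plain product of the two magnitudes with no constant scaling factor; the function name `add_magnitude_features` is hypothetical.

```python
import numpy as np

def add_magnitude_features(u_d, u_q, i_d, i_q):
    """Return (u_s, i_s, s_el) computed element-wise from the dq-components."""
    u_s = np.hypot(u_d, u_q)   # voltage magnitude sqrt(u_d^2 + u_q^2)
    i_s = np.hypot(i_d, i_q)   # current magnitude sqrt(i_d^2 + i_q^2)
    s_el = u_s * i_s           # apparent power (assumed unscaled product)
    return u_s, i_s, s_el

u_s, i_s, s_el = add_magnitude_features(
    np.array([3.0]), np.array([4.0]), np.array([6.0]), np.array([8.0])
)
print(u_s[0], i_s[0], s_el[0])  # 5.0 10.0 50.0
```

Because the benchmark data are normalized, a constant factor in the apparent power would only rescale the feature.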

Reinforcement Learning
In order to accurately predict the temperature of the main components of the PMSMs, the Actor-Critic framework of reinforcement learning (RL) [17] is introduced into the predictive network. The general structure of the Actor-Critic learning framework is shown in Figure 6. The objective function of the Actor training can be dynamically adjusted by the feedback function. Therefore, the feedback of the Critic network to the Actor network is particularly important in the prediction process. In addition, the Nadam algorithm is used in the gradient optimization process.

Proximal Policy Optimization
The PPO algorithm is one of the policy gradient methods for RL, proposed by OpenAI in 2017, and it is often applied in the control of intelligent agents. The algorithm is comparatively easy to tune with respect to its hyper-parameters during the training of agents. In each iteration, it attempts to minimize the objective function and recalculates the updated strategy. The objective function of the PPO algorithm is defined by Formula (5):

$$L^{CLIP}(\theta) = \hat{E}_t\left[\min\left(r_t(\theta)\hat{A}_t,\ \mathrm{clip}\left(r_t(\theta),\, 1 - \varepsilon,\, 1 + \varepsilon\right)\hat{A}_t\right)\right] \qquad (5)$$

where $\varepsilon$ is a constant, and $\hat{A}_t$ is the feedback of the Critic network. Furthermore, $r_t(\theta)$ is the ratio of the new strategy to the old strategy, and it is calculated by Formula (6):

$$r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{old}}(a_t \mid s_t)} \qquad (6)$$

where $\pi_\theta(a_t \mid s_t)$ is the updated new policy, $\pi_{\theta_{old}}(a_t \mid s_t)$ is the corresponding old policy, and $a_t$ and $s_t$ are the action and state values at time $t$, respectively. As shown in Formula (5), the objective function $L^{CLIP}(\theta)$ includes two main parts. The first part is the product of the strategy ratio $r_t(\theta)$ and the feedback value $\hat{A}_t$. The second part is the product of the feedback value $\hat{A}_t$ and the ratio $r_t(\theta)$ after clipping to the interval $[1 - \varepsilon, 1 + \varepsilon]$. Finally, Formula (5) takes the minimum of the two parts.
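The two-part structure of the clipped objective can be sketched numerically as follows. This is a generic illustration of the clipped surrogate rather than the simplified variant used in this paper; the function name and the value eps = 0.2 are assumptions, since the hyper-parameter table is not reproduced here.

```python
import numpy as np

def clipped_surrogate(ratio, advantage, eps=0.2):
    """Mean over time steps of min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    unclipped = ratio * advantage                              # first part
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage  # second part
    return float(np.mean(np.minimum(unclipped, clipped)))

# With a positive feedback value, ratios above 1 + eps are cut off at 1.2,
# while ratios below 1 - eps keep their (smaller) unclipped product.
value = clipped_surrogate([0.5, 1.0, 1.5], [1.0, 1.0, 1.0])

# With a negative feedback value, the clipped product is the smaller one
# when the ratio falls below 1 - eps.
value_neg = clipped_surrogate([0.5], [-1.0])
```

The clipping bounds how far a single update can move the new strategy away from the old one.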
The definition of the strategy ratio $r_t(\theta)$ is given in Formula (7), where $out_t$ represents the output value at time $t$, and $y_t$ represents the real output value. Additionally, the output $out_t$ is given by the Actor network, and the loss function of the Critic network is selected as the feedback $\hat{A}_t$. The strategy ratio $r_t(\theta)$ and $\hat{A}_t$ are defined as follows: where $N$ denotes the number of all predicted values.

Model Construction and Prediction
The temperature prediction model of the PMSM is shown in Figure 7. The Actor network and the Critic network each include an input layer and an output layer, and $h_i$ ($i = 1, 2, \ldots, 5$) are the hidden layers.
The hidden layers in the model are defined as follows: where $x_t$ is the input data matrix at time $t$, and $*$ is an element multiplication sign. $w_i$, $b_i$ and $h_i$ represent the weight, the bias, and the output of each hidden layer of the network, respectively ($i = 1, 2, \ldots, 5$). The corresponding weights and bias of the network output layer are $w_{out}$ and $b_{out}$, respectively, and the final predicted value of the network at time $t$ is $out_t$. Further, $\theta$ and $\theta_{out}$, respectively, represent the parameter vectors before and after the policy update. After the completion of data processing and model construction, the loss objective function of the training model is determined by Formula (5). The Actor network and the Critic network share five hidden layers in this model, and the numbers of neurons in these layers are 512, 256, 128, 64, and 32, respectively. Moreover, the ReLU function is used as the activation function in each hidden layer.
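A forward pass through such a shared architecture might look like the sketch below. It is a plain NumPy illustration under several assumptions that are not stated in the text: each hidden layer is taken to be an affine map followed by ReLU, the Actor and Critic each get one linear output unit on top of the shared layers, the weights are drawn from [-1, 1] and scaled by the fan-in, and the input is a flat feature vector. All names are hypothetical.

```python
import numpy as np

HIDDEN_SIZES = (512, 256, 128, 64, 32)

def init_params(n_in, rng):
    """Create the shared hidden layers and two linear output heads."""
    sizes = (n_in,) + HIDDEN_SIZES
    trunk = [
        (rng.uniform(-1.0, 1.0, (a, b)) / np.sqrt(a), np.zeros(b))
        for a, b in zip(sizes[:-1], sizes[1:])
    ]
    last = HIDDEN_SIZES[-1]
    heads = {
        name: (rng.uniform(-1.0, 1.0, (last, 1)) / np.sqrt(last), np.zeros(1))
        for name in ("actor", "critic")
    }
    return trunk, heads

def forward(x, trunk, heads):
    """Return (actor_out, critic_out) for a batch x of shape (batch, n_in)."""
    h = x
    for w, b in trunk:
        h = np.maximum(0.0, h @ w + b)   # affine map followed by ReLU
    actor_w, actor_b = heads["actor"]
    critic_w, critic_b = heads["critic"]
    return (h @ actor_w + actor_b)[:, 0], (h @ critic_w + critic_b)[:, 0]

rng = np.random.default_rng(0)
trunk, heads = init_params(n_in=10, rng=rng)
actor_out, critic_out = forward(rng.normal(size=(4, 10)), trunk, heads)
print(actor_out.shape, critic_out.shape)  # (4,) (4,)
```

Sharing the hidden layers means that both heads are computed from the same 32-dimensional representation of the input.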
The model uses input sequences with a step size of 5 as the input data. During iterative training, the training objective $L^{CLIP}(\theta)$ is calculated according to the value of $\hat{A}_t$, and $r_t(\theta)$ is updated at each step.
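One way to build such input sequences is a sliding window over the time-ordered records, sketched below. It assumes that "step size 5" means a window of five consecutive records and that the target is the temperature immediately after the window; the function name `make_windows` is hypothetical.

```python
import numpy as np

def make_windows(features, targets, window=5):
    """Pair each window of `window` consecutive rows with the next target."""
    features = np.asarray(features, dtype=float)
    targets = np.asarray(targets, dtype=float)
    n = len(features) - window
    if n <= 0:
        raise ValueError("need more rows than the window length")
    # X[k] holds rows k .. k+window-1; y[k] is the target at row k+window.
    X = np.stack([features[k:k + window] for k in range(n)])
    y = targets[window:]
    return X, y

features = np.arange(16.0).reshape(8, 2)   # 8 time steps, 2 features
targets = np.arange(8.0) * 10.0
X, y = make_windows(features, targets)
print(X.shape, y.tolist())  # (3, 5, 2) [50.0, 60.0, 70.0]
```

Each window can be flattened before it is fed to a fully connected network.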
In order to accelerate the convergence of the objective function and approach a minimum more quickly, the Nadam algorithm is used to optimize the training process. Compared with Adam, the Nadam algorithm introduces a correction value $\hat{g}_t$ of the gradient $g_t$ at time $t$, which is defined by Formula (12). In addition, the update $\Delta\theta_t$ is calculated by Formula (13). Finally, the predicted output values are obtained from the trained model.
Here, $u_i$ is the momentum factor of the first moment estimation at time $i$, $\eta$ is the learning rate of the Nadam algorithm, $\hat{n}_t$ is the correction value of the second raw moment estimation of the gradient at time $t$, and $\xi$ is a positive number close to but not equal to zero.
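For orientation, a single Nadam update in the commonly cited formulation by Dozat can be sketched as follows. This is not necessarily identical to Formulas (12) and (13): the momentum factor is held constant here (`mu`) instead of varying with time, and the decay rate `nu` and all default values are assumptions.

```python
import math

def nadam_step(theta, grad, m, n, t, eta=0.1, mu=0.9, nu=0.999, xi=1e-8):
    """Return (new_theta, m, n, delta) after one Nadam update at step t >= 1."""
    m = mu * m + (1.0 - mu) * grad            # first moment estimate
    n = nu * n + (1.0 - nu) * grad * grad     # second raw moment estimate
    g_hat = grad / (1.0 - mu ** t)            # corrected gradient
    m_hat = m / (1.0 - mu ** (t + 1))         # corrected first moment
    n_hat = n / (1.0 - nu ** t)               # corrected second moment
    # Nesterov-style combination of the current gradient and the momentum.
    m_bar = (1.0 - mu) * g_hat + mu * m_hat
    delta = -eta * m_bar / (math.sqrt(n_hat) + xi)
    return theta + delta, m, n, delta

# One step on f(theta) = theta^2 starting from theta = 1 (gradient 2).
theta, m, n, delta = nadam_step(theta=1.0, grad=2.0, m=0.0, n=0.0, t=1)
```

The small constant `xi` only guards against division by zero when the second moment estimate is close to zero.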

Experimental Environment and Parameter Definition
The experimental environment consisted of an Intel(R) Core(TM) i5-8250U 3.4 GHz quad-core processor with 16 GB of memory. The operating system was 64-bit Windows 10, the programming language was Python 3.7.5, and the deep learning framework was TensorFlow 1.13.1. The hyper-parameters considered during the experiment are listed in Table 2. Most of the parameters in the table are self-explanatory. For the hyper-parameters not specifically mentioned, the following applies: when initializing the weights of the prediction network, the simplest method is to assign random values from the interval [−1, 1]. In addition, other initialization schemes for the weights can be considered, such as sampling from a standard normal distribution or a uniform distribution.

Model Evaluation
The goal of this paper is to predict the temperature of the PMSMs at the next moment. Therefore, the root mean square error (RMSE) [18] and the mean absolute error (MAE) [19] are used to evaluate the temperature prediction. As shown in Equations (14) and (15), the RMSE and MAE are calculated as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{j=1}^{N}\left(R_j - P_j\right)^2} \qquad (14)$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{j=1}^{N}\left|R_j - P_j\right| \qquad (15)$$

where $R_j$ represents the measured temperature of the target, $P_j$ represents the predicted temperature of the target, and $N$ denotes the number of test data. In order to comprehensively evaluate the prediction performance of different methods, the Euclidean norm $L_2$ [20] and the worst-case error $L_\infty$ [21] are introduced to measure how closely the prediction approximates the target. These evaluation indexes are defined as follows:

$$L_2 = \sqrt{\sum_{j=1}^{N}\left(R_j - P_j\right)^2} \qquad (16)$$

$$L_\infty = \max_i \sum_j \left|e_{ij}\right| \qquad (17)$$

where $R_j$, $P_j$ and $N$ represent the same quantities as in the RMSE, and $\sum_j |e_{ij}|$ is the sum of the absolute values of all errors in row $i$.
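The four indicators can be computed from a vector of measured and predicted values as in the sketch below. The function name `evaluate` is hypothetical, and the worst-case error is taken as the largest absolute error, which is what a row-wise maximum reduces to when the errors form a single column.

```python
import numpy as np

def evaluate(measured, predicted):
    """Return (rmse, mae, l2, l_inf) for two equally long 1-D sequences."""
    err = np.asarray(measured, dtype=float) - np.asarray(predicted, dtype=float)
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    l2 = float(np.sqrt(np.sum(err ** 2)))    # equals sqrt(N) * rmse
    l_inf = float(np.max(np.abs(err)))       # worst-case absolute error
    return rmse, mae, l2, l_inf

rmse, mae, l2, l_inf = evaluate([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
print(rmse, mae, l2, l_inf)  # 1.0 0.5 2.0 2.0
```

Because $L_2$ is the RMSE scaled by the square root of the number of test samples, the two indicators always rank methods identically on a fixed test set.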

Experimental Results and Analysis
In order to evaluate the overall performance of our proposal and the comparative methods on the sampled dataset, the trend prediction results for $\vartheta_{sy}$, $\vartheta_{st}$ and $\vartheta_{sw}$ are shown in Figures 8-10, respectively. In these figures, the x-coordinates represent the prediction period of the test data, and the y-coordinates are the prediction targets. The curves predicted by the proposed method fit the real curves best over the prediction period. Although the curves of LSTM and the RNN follow the real target curves at first, they deviate considerably at the end. Moreover, the curves fitted by the EWMA method lag clearly behind the real curves. The corresponding evaluation indicators for $\vartheta_{sy}$, $\vartheta_{st}$ and $\vartheta_{sw}$ are provided in Table 3, Table 5 and Table 7, respectively, including the RMSE, the MAE, the Euclidean norm $L_2$ and the infinity norm $L_\infty$.
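The delay of an exponentially weighted moving average can be seen in a small sketch. The smoothing factor `alpha = 0.5` and the function name `ewma` are assumptions, since the configuration of the EWMA baseline is not given in the text.

```python
import numpy as np

def ewma(series, alpha=0.5):
    """Exponentially weighted moving average of a non-empty 1-D sequence."""
    series = np.asarray(series, dtype=float)
    smoothed = np.empty_like(series)
    smoothed[0] = series[0]
    for t in range(1, len(series)):
        # Blend the new observation with the previous smoothed value.
        smoothed[t] = alpha * series[t] + (1.0 - alpha) * smoothed[t - 1]
    return smoothed

ramp = np.arange(4.0)              # steadily rising signal 0, 1, 2, 3
print(ewma(ramp).tolist())         # [0.0, 0.5, 1.25, 2.125]
```

On a steadily rising signal the smoothed value stays below the latest observation, which matches the lag described for the EWMA curves.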
The quantitative evaluation indicators of the temperature prediction of $\vartheta_{sy}$ with the four prediction methods are given in Table 3. According to Table 3, the prediction errors of the model proposed in this paper are the lowest among the four methods. In the optimal case, the RMSE and the $L_2$ of PPO-RL decreased by 0.1540 and 2.6624, respectively, compared with the LSTM network. In order to compare the computational cost of the four methods, the calculation time of each method on the training set and the test set was recorded after 30 iterations. It can be seen in Table 4 that the calculation time of PPO-RL for $\vartheta_{sy}$ is relatively high on the training set. By contrast, its calculation time on the test set is low, 0.38 min less than that of the LSTM.
The quantitative evaluation indicators of the temperature prediction of the stator tooth, $\vartheta_{st}$, with the four prediction methods are given in Table 5. As shown, the PPO-RL method proposed in this paper achieved an excellent performance. The RMSE and MAE of PPO-RL decreased by 0.0117 and 0.0424, respectively, and its Euclidean norm $L_2$ reached the minimum value. As can be seen in Table 6, LSTM has the lowest calculation time for $\vartheta_{st}$ on the training set, while PPO-RL has the lowest calculation time for $\vartheta_{st}$ on the test set compared with LSTM, the RNN, and EWMA.
The quantitative evaluation indicators of the temperature prediction of $\vartheta_{sw}$ with the four prediction methods are given in Table 7. It can be seen in the table that the PPO-RL model has a lower prediction error and obtains a higher prediction accuracy. It is worth noting that the LSTM network has a lower error than the RNN and the EWMA method in the prediction experiment for $\vartheta_{st}$, whereas its errors in the prediction of $\vartheta_{sy}$ and $\vartheta_{sw}$ are high. Table 8 shows the calculation time of the four methods on the training set and the test set for $\vartheta_{sw}$. PPO-RL has the lowest calculation time on the test set for $\vartheta_{sw}$, while the LSTM has the highest, 0.79 min more than PPO-RL.

Conclusions
This paper reviews the research status and shortcomings of traditional thermal network methods and machine learning methods for PMSM temperature prediction. Based on the problems found in the literature review, a temperature prediction method for PMSMs based on proximal policy optimization is proposed. This method obtains a better performance by adjusting the network structure and minimizing the objective function of PPO.
The prediction performance of the proposed method was compared with that of three other classical methods (EWMA, the RNN, and LSTM) to validate its applicability and validity. The results further show that the performance of an LSTM neural network is uncertain with regard to the test samples, which increases the difficulty of finding the global optimum in the training process. In future research work, the improvement of the real-time performance of this method should be considered.