An Improved Autoencoder and Partial Least Squares Regression-Based Extreme Learning Machine Model for Pump Turbine Characteristics

Abstract: Complete characteristic curves of a pump turbine are fundamental for improving the modeling accuracy of the pump turbine in a pump turbine governing system. In view of the difficulty of modeling the "S" characteristic region of the complete characteristic curves of the pump turbine, a novel Autoencoder and partial least squares regression based extreme learning machine model (AE-PLS-ELM) was proposed to describe the pump turbine characteristics. First, a mathematical model was formulated to describe the flow and moment characteristic curves, and the improved Suter transformation was employed to transfer the original curves into WH and WM curves. Second, the ELM-Autoencoder technique and the partial least squares regression (PLSR) method were introduced into the architecture of the original ELM network: the ELM-Autoencoder technique was employed to obtain the initial weights of the Autoencoder based extreme learning machine (AE-ELM) model, and the PLSR method was exploited to avoid the multicollinearity problem of the Moore-Penrose generalized inverse. Lastly, the effectiveness of the proposed AE-PLS-ELM model was verified using real data from a pumped storage unit in China. The results demonstrate that the AE-PLS-ELM model can obtain better modeling accuracy and generalization performance than traditional models and, thus, can be exploited as an effective and sufficient approach for modeling pump turbine characteristics.


Introduction
As the demand for electricity and the requirements for developing a low-carbon economy continue to grow, the driving force for energy development will gradually shift to renewable and clean energy such as photovoltaic power and wind power [1][2][3]. Due to their fast start-stop speed and flexible working conditions, pumped storage units (PSUs) can quickly respond to the requirements of the power system for frequency and phase modulation, peak load shifting, rotation, and accident reserve, which can enhance the grid's ability to absorb wind power and photoelectricity [4,5]. The pump turbine governing system (PTGS) is a complex hydraulic-mechanical-electrical-magnetic coupling system, of the pump turbine. The results show that the proposed model can obtain higher fitting accuracy and better generalization performance than a single BP neural network model.
Although the neural network model can fit and extend the complete characteristic curves and facilitate the calculation of the flow and moment characteristic curves of the pump turbine, the convergence speed of the traditional neural network model is slow and it easily falls into local minima. As a novel single hidden layer feedforward neural network, the extreme learning machine (ELM) obtains the input weights and hidden layer biases through random initialization [22]. The output weights are directly obtained by calculating the generalized inverse matrix of the hidden layer output matrix [23,24]. The convergence rate is far faster than that of the traditional BP neural network. The ELM has been widely used in pattern recognition, statistical prediction, classification, and regression [25]. However, because of the random initialization strategy of the input weights, the common ELM model fails to make full use of the inherent characteristics of the training data. In addition, the Moore-Penrose generalized inverse used in the ELM model may produce ill-conditioned solutions; multicollinearity can therefore exist in the hidden layer output matrix, which affects the fitting and generalization performance of the model [26]. Therefore, there is still much room for improvement in describing pump turbine characteristics using ELM.
To improve the modeling accuracy of the pump turbine in the simulation of PTGS, an Autoencoder and partial least squares regression based extreme learning machine model (AE-PLS-ELM) is proposed to model the pump turbine of a PSU. With the strong fitting ability of the AE-PLS-ELM model, the non-linear mapping relationship of the full characteristic curves of pump turbines is represented. The flow and moment characteristic curves of pump turbines are transformed into neural network models, which can be used for real-time simulation. On the basis of curve pretreatment with the improved Suter transformation, two AE-PLS-ELM models are used to model the characteristic curves. The Autoencoder (AE) technique and the partial least squares regression (PLSR) algorithm are introduced to improve the performance of the ELM model. The rest of this paper is arranged as follows. Section 2 describes the model of the pump turbine based on characteristic curves, Section 3 proposes the AE-PLS-ELM model, Section 4 provides the specific modeling process of the pump turbine characteristics based on the proposed AE-PLS-ELM model, Section 5 employs a numerical example to verify the performance of AE-PLS-ELM, Section 6 provides an additional test problem, and Section 7 gives the conclusions.

Nonlinear Modeling of the Pump Turbine
The most common method for the pump turbine nonlinear modeling is through the complete characteristic curves [14]. The main idea of the nonlinear modeling based on complete curves is to first extract certain discrete data points from the practical curves, and then the extracted data points are fitted or extended to obtain the modeling curves [14]. The mathematical model of the flow and moment characteristics to express the pump turbine characteristics is as follows [2].
where M_11 represents the unit moment, Q_11 denotes the unit flow, N_11 is the unit speed, and a denotes the guide vane opening. In this study, the complete characteristic curves of a pump turbine at a hydropower station in China are employed as a case study. The practical complete characteristic curves are shown in Figure 1 [2]. As can be seen in Figure 1, the original complete curves still have a significant twist and curl when the unit speed is greater than 80, which causes the multi-value phenomenon in the "S" characteristic area. For example, three flow (Figure 1a) and moment (Figure 1b) data points appear when the value of the unit speed is 90 r/min and the value of the guide vane opening is 10. The interpolation error is large and the derivative is discontinuous when multiple values exist, which may cause an iteration error to occur in the PTGS. To avoid the multi-value phenomenon of the original curves, this study introduces the improved Suter transformation [20] to pre-process the original complete curves. The original flow and moment characteristic curves are changed to WH and WM curves, respectively, through the improved Suter transformation. The converting equations of the improved Suter transformation are expressed as follows.
where a, q, h, and m denote the relative speed, relative flow, relative water head, and relative moment, respectively, and x and y denote the relative flow angle and relative opening, respectively. s_2 > |M_11max|/M_11r, s_1 = 0.5~1.2, C_y = 0.1~0.3, and C_h = 0.4~0.6. The WH(x,y) and WM(x,y) curves based on the improved Suter transformation are given in Figure 2 [2].

An Autoencoder and Partial Least Squares Regression Based Extreme Learning Machine Model
An accurate pump-turbine model is the key to the modeling and simulation of PTGS. In this study, an autoencoder and partial least squares regression-based extreme learning machine model (AE-PLS-ELM) is proposed for the nonlinear modeling of pump turbine characteristics. The AE-PLS-ELM model is introduced in this section.


Extreme Learning Machine
ELM is a new type of single hidden layer feedforward neural network. It randomly generates the connection weights and biases between the input layer and the hidden layer. The connection weights and biases do not need to be adjusted during the training process. Once the number of hidden neurons is determined, the optimal solution can be obtained. Compared with the traditional training methods, ELM has the advantages of fast learning speed and good generalization performance [27].
Suppose that there are N samples (x_k, y_k), k = 1, 2, . . . , N, where x_k ∈ R^p and y_k ∈ R^q. The mathematical expression for an ELM model with L hidden neurons is as follows.
where ŷ_k denotes the simulated output of the kth sample, a_i denotes the connection weights between the input layer and the ith hidden neuron, β_i denotes the connection weights between the ith hidden neuron and the output layer, b_i is the bias of the ith hidden neuron, and g(·) is the activation function. Equation (3) can be reformulated below.

Hβ = Ŷ
where H is known as the hidden layer output matrix and β is the output weight matrix. Based on the Moore-Penrose generalized inverse matrix theory, β can be calculated as β = H†Y, where H† is the generalized inverse matrix of H and Y is the target output matrix. To improve the stability and generalization of the ELM network, Huang et al. [28] added a positive constant 1/C to the diagonal of H^T H or HH^T, so that β can be calculated as β = (I/C + H^T H)^(−1) H^T Y.
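To make the procedure concrete, the following is a minimal NumPy sketch of ELM training with the ridge-corrected output weights described above. The sigmoid activation and the toy target surface are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def elm_fit(X, Y, L=100, C=1e6, seed=0):
    """Basic ELM: random input weights and biases, then ridge-regularized
    least-squares output weights beta = (I/C + H^T H)^(-1) H^T Y."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    a = rng.uniform(-1.0, 1.0, size=(p, L))   # input weights (random, fixed)
    b = rng.uniform(-1.0, 1.0, size=L)        # hidden biases (random, fixed)
    H = 1.0 / (1.0 + np.exp(-(X @ a + b)))    # sigmoid hidden layer output matrix
    beta = np.linalg.solve(np.eye(L) / C + H.T @ H, H.T @ Y)
    return a, b, beta

def elm_predict(X, a, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ a + b)))
    return H @ beta

# illustrative usage on a toy surface (not the pump turbine data)
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
Y = (np.sin(np.pi * X[:, 0]) * np.cos(np.pi * X[:, 1]))[:, None]
a, b, beta = elm_fit(X, Y)
err = float(np.sqrt(np.mean((elm_predict(X, a, b, beta) - Y) ** 2)))
```

Because the input-side parameters are fixed after random generation, training reduces to a single linear solve, which is the source of ELM's speed advantage over iterative BP training.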

The ELM-Autoencoder Technique
The convergence speed of ELM is fast and its generalization capability is good compared with traditional BP neural networks. However, the initial parameters of ELM are independent of the modeling data. Thus, the characteristics and internal relations of the modeling data cannot be effectively reflected. To obtain better initial parameters for ELM, the Autoencoder technique, which has been widely employed in deep learning, is introduced [29]. The traditional Autoencoder, developed by Rumelhart et al. in 1986, is an unsupervised learning method based on the BP algorithm. Its purpose is to approximate an identity function such that the output data is the same as the input data [30]. The ELM-Autoencoder technique based on the ELM algorithm is introduced in this study to avoid the repeated iterative training of the BP network [31]. The Autoencoder based extreme learning machine (AE-ELM) proposed in this study is implemented mainly in two stages. First, the ELM-Autoencoder is employed to establish the mapping from X to X (X is the input data) using the ELM algorithm. The output weights of the ELM-Autoencoder are taken as the initial weights of the AE-ELM model. Second, the AE-ELM model, which takes X and Y (Y is the output data) as training data, is trained with the initial weights obtained in the first stage.
The ELM-Autoencoder is a type of unsupervised network whose input weights and hidden biases should be orthogonal. Assume the number of input neurons is N_p and the number of hidden neurons is N_L. In this study, the number of input neurons is smaller than the number of hidden neurons; therefore, a sparse ELM-Autoencoder architecture is adopted [32]. Given a set of N data samples, i.e., (x_k, y_k) for k = 1, 2, . . . , N, the hidden layer output of the ELM-Autoencoder can be expressed as:
where h(x_i) ∈ R^(N_L) denotes the hidden layer output with respect to the ith input, a^T a = I, b^T b = I, I is the identity matrix, a = [a_1, a_2, . . . , a_N] denotes the orthogonal weights connecting the input layer and the hidden layer, b = [b_1, b_2, . . . , b_N] denotes the orthogonal biases of the hidden nodes, and g(·) is the activation function. The output weight β is optimized to minimize the squared loss of the training error. The optimization problem can be expressed as the equation below.
where X denotes the input data, H denotes the hidden layer outputs, and C denotes a penalty factor on the training error. Setting the first derivative of O_β with respect to β to zero, the final hidden layer output weights can be calculated as β = (I/C + H^T H)^(−1) H^T X.
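The first stage of the AE-ELM can be sketched as follows; the QR-based orthogonalization and the value of the regularization constant are illustrative assumptions consistent with the orthogonality conditions a^T a = I and b^T b = I stated above.

```python
import numpy as np

def elm_autoencoder_weights(X, L=40, C=1e4, seed=0):
    """Stage one: ELM-Autoencoder. Orthogonal random input weights and a
    normalized bias vector map X to a hidden representation; the output
    weights beta that reconstruct X from the hidden layer are returned
    (transposed) as the data-dependent initial input weights of AE-ELM."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    # orthonormal columns give a^T a = I (sparse case: L > p)
    a, _ = np.linalg.qr(rng.standard_normal((L, p)))
    b = rng.standard_normal(L)
    b /= np.linalg.norm(b)                       # normalized bias vector
    H = 1.0 / (1.0 + np.exp(-(X @ a.T + b)))     # N x L hidden layer output
    # beta = (I/C + H^T H)^(-1) H^T X: regularized reconstruction weights
    beta = np.linalg.solve(np.eye(L) / C + H.T @ H, H.T @ X)
    return beta.T                                # p x L initial input weights

# illustrative usage on random data
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(150, 2))
W_init = elm_autoencoder_weights(X, L=40)
```

In the second stage, `W_init` replaces the purely random input weights of the ELM, so the hidden layer reflects the structure of the training inputs.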

Partial Least Squares Regression
PLSR, developed by Wold, is a multivariate statistical analysis method that takes advantage of both principal component analysis and the least squares method [33]. Compared with the least squares method, PLSR can deal with the multicollinearity problem and allows regression modeling when the number of samples is smaller than the number of independent variables. In addition, the PLSR model considers information from both the independent and response variables, which makes it easier to separate system information from noise.
Given a set of N observed data samples composed of p input and q output variables, i.e., S = (x_i, y_i) for i = 1, 2, . . . , N, where x_i ∈ R^p denotes the independent variables and y_i ∈ R^q denotes the response variables, the objective of PLSR is to model the linear relationship between the p independent variables and the q response variables. The modeling process of PLSR can be described as follows [26]. First, the independent matrix X = [x_1, x_2, . . . , x_p] (n × p) and the response matrix Y = [y_1, y_2, . . . , y_q] (n × q) are normalized to zero mean and unit variance. The normalized matrices of X and Y are denoted as E_0 and F_0, respectively. Second, the PLSR method is applied to extract the first pair of score vectors u_1 and v_1. The score vectors u_1 and v_1 are linear combinations of the independent and response variables, respectively, and should carry as much of their variation information as possible, while the correlation between u_1 and v_1 should be as strong as possible. Then the regression model between Y and u_1 is deduced. Third, the residual matrices E_1 and F_1 are calculated to replace E_0 and F_0, respectively, and the next iteration proceeds until the residual matrix meets the stopping criteria.
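The component-extraction loop described above can be sketched as a NIPALS-style PLS2 with SVD-based weight extraction. For brevity this sketch only mean-centers the data (the paper also scales to unit variance); the variable names are illustrative.

```python
import numpy as np

def pls_regression(X, Y, n_components):
    """NIPALS-style PLS2: extract score pairs that maximize the covariance
    between X and Y, deflate, and accumulate a linear model on centered data."""
    E, F = X - X.mean(0), Y - Y.mean(0)
    Ws, Ps, Qs = [], [], []
    for _ in range(n_components):
        # X-weight w: dominant left singular vector of E^T F
        # (the eigenvector of E^T F F^T E with the largest eigenvalue)
        w = np.linalg.svd(E.T @ F, full_matrices=False)[0][:, 0]
        t = E @ w                          # X score vector
        p = E.T @ t / (t @ t)              # X loading
        q = F.T @ t / (t @ t)              # regression of Y residual on the score
        E = E - np.outer(t, p)             # deflate X residual
        F = F - np.outer(t, q)             # deflate Y residual
        Ws.append(w); Ps.append(p); Qs.append(q)
    W, P, Q = map(np.column_stack, (Ws, Ps, Qs))
    coef = W @ np.linalg.solve(P.T @ W, Q.T)   # centered X -> centered Y
    return coef, X.mean(0), Y.mean(0)

def pls_predict(Xnew, coef, x_mean, y_mean):
    return (Xnew - x_mean) @ coef + y_mean

# illustrative usage: exact recovery of a noise-free linear relationship
rng = np.random.default_rng(2)
X = rng.standard_normal((100, 5))
Y = X @ rng.standard_normal((5, 2))
coef, xm, ym = pls_regression(X, Y, n_components=5)
pred = pls_predict(X, coef, xm, ym)
```

With as many components as independent variables, PLS coincides with ordinary least squares; using fewer components is what suppresses the multicollinear, noisy directions.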

The Proposed AE-PLS-ELM Model
In the Autoencoder based extreme learning machine (AE-ELM) model, the Moore-Penrose generalized inverse based on least squares is exploited to calculate the output weights, which makes it possible to replace the least squares method with the PLSR method. The employment of PLSR in the AE-ELM model can avoid the multicollinearity problem of the Moore-Penrose generalized inverse, especially when the hidden layer output matrix is highly correlated and contains noise [34]. Based on the descriptions of AE-ELM and PLSR, the key to establishing the AE-PLS-ELM model is to model the linear relationship between the hidden layer output matrix H and the output layer matrix Y using the partial least squares regression (PLSR) method. As described in Section 3.2, the hidden layer output matrix H is an N × L matrix with L hidden layer output variables, and the output layer matrix Y is an N × q matrix. The detailed modeling process of the AE-PLS-ELM model is given as follows.
Step 2: Calculate the hidden layer output matrix of the training data H, according to Equation (3).
Step 3: The hidden layer output matrix H is taken as the independent matrix of the PLSR model.
Step 4: Extract the first pair of score vectors of E_0 and F_0, denoted as u_1 and v_1, respectively. u_1 and v_1 can be denoted below.
(1) u_1 and v_1 should contain the maximum variation information of E_0 and F_0.
(2) u_1 and v_1 should have the maximum relevance. The deduction of ω_1 and c_1 can be transformed into the following optimization problem.
According to the Lagrange multiplier method, the following equation holds.
From Equation (16), it can be deduced that: Note that θ_1 is the objective function of the optimization problem, and Equations (19) and (20) hold.
From Equations (19) and (20), it can be deduced that ω_1 is the eigenvector of E_0^T F_0 F_0^T E_0 and θ_1^2 is the corresponding eigenvalue; ω_1 and c_1 can thus be obtained according to Equations (19) and (20). After obtaining ω_1 and c_1, the score vectors u_1 and v_1 can be calculated according to Equation (13).
Step 5: Establish the linear regression models between E_0 and u_1, and between F_0 and v_1, according to the least squares method.
Step 6: If F_1 satisfies the stopping criteria, Equation (22) is the final regression model and the iteration stops. Otherwise, replace E_0 and F_0 with E_1 and F_1, respectively, and skip to Step 3 to obtain the second pair of score vectors.
where α_2 and γ_2 are regression vectors and can be denoted as:
Step 7: Repeat Steps 4-5 until r principal components are calculated. The remaining m − r components are small and are considered as noise, and the residuals E_r and F_r are very small. E_0 and F_0 can be expressed by the equation below.
The relationship between u_k and v_k can be expressed by the equation below [1].
Then F_0 can be translated into: where Û = E_0 W. The regression equation can then be denoted as: According to the description above, the hidden layer output weight vector β̂ can be expressed by the equation below.
where W denotes the component matrix and B denotes the diagonal matrix. Based on the above modeling process, the structure of the proposed AE-PLS-ELM model is shown in Figure 3.
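Putting the pieces together, the following is a compact end-to-end sketch of the AE-PLS-ELM idea: AE-derived input weights, then PLS regression instead of the Moore-Penrose generalized inverse for the output weights. The activation function, regularization constant, component count, and toy data are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_ae_pls_elm(X, Y, L=40, n_components=15, seed=0):
    """AE-PLS-ELM sketch: (1) an ELM-Autoencoder yields data-dependent
    input weights; (2) the hidden layer output matrix H is formed with
    those weights; (3) PLS regression links H to Y in place of the
    Moore-Penrose generalized inverse."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    # stage 1: ELM-Autoencoder with orthogonal random weights, mapping X -> X
    a0, _ = np.linalg.qr(rng.standard_normal((L, p)))
    b = rng.standard_normal(L)
    b /= np.linalg.norm(b)
    H0 = sigmoid(X @ a0.T + b)
    W_in = np.linalg.solve(np.eye(L) * 1e-4 + H0.T @ H0, H0.T @ X).T  # p x L
    # stage 2: hidden layer output with the AE-derived input weights
    H = sigmoid(X @ W_in + b)
    # stage 3: PLS regression of Y on H (SVD-based weight extraction)
    E, F = H - H.mean(0), Y - Y.mean(0)
    Ws, Ps, Qs = [], [], []
    for _ in range(n_components):
        w = np.linalg.svd(E.T @ F, full_matrices=False)[0][:, 0]
        t = E @ w
        pl = E.T @ t / (t @ t)
        ql = F.T @ t / (t @ t)
        E = E - np.outer(t, pl)
        F = F - np.outer(t, ql)
        Ws.append(w); Ps.append(pl); Qs.append(ql)
    W, P, Q = map(np.column_stack, (Ws, Ps, Qs))
    beta = W @ np.linalg.solve(P.T @ W, Q.T)        # L x q output weights
    return W_in, b, beta, H.mean(0), Y.mean(0)

def predict_ae_pls_elm(X, model):
    W_in, b, beta, h_mean, y_mean = model
    return (sigmoid(X @ W_in + b) - h_mean) @ beta + y_mean

# illustrative usage on a toy nonlinear target
rng = np.random.default_rng(3)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
Y = (X[:, 0] ** 2 + X[:, 0] * X[:, 1])[:, None]
model = fit_ae_pls_elm(X, Y)
pred = predict_ae_pls_elm(X, model)
train_rmse = float(np.sqrt(np.mean((pred - Y) ** 2)))
```

Truncating the PLS expansion at r < L components is what discards the small, noise-like directions of H that would otherwise destabilize the generalized inverse.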


Modeling Process of the Pump Turbine Based on AE-PLS-ELM
The proposed AE-PLS-ELM model is used to model the pump turbine characteristics of a PSU. The flow and moment characteristic curves of the pump turbine are preprocessed using the improved Suter transformation method. Two independent AE-PLS-ELM models are used to model the preprocessed complete curves. The preprocessed complete curves are then converted into target variables to construct a neural network model that can be used for real-time simulation. The input of the AE-PLS-ELM model is the relative flow angle x and the relative vane opening y. The full characteristic curves based on the improved Suter transformation (Figure 2) are employed as data samples.
The specific steps of the modeling process of the pump turbine characteristics based on AE-PLS-ELM are as follows.
Step 1: Apply the improved Suter transformation to the complete characteristic curves of the pump turbine provided by the power station to obtain the corresponding preprocessed WH and WM curves.
Step 2: Extract data points from the curves. Convert the relative flow angle x, the relative vane opening y, and the extracted data points into input and output sample pairs.
Step 3: Divide the above sample data into training data and test data. Since the dimensions and magnitudes of the data samples are different, the input and output data are normalized to facilitate the modeling and calculation process.
Step 4: Set the Sigmoid function as the activation function of the hidden layer and determine the range of the number of hidden layer nodes according to the Kolmogorov empirical formula. The optimal number of hidden nodes is selected by trial calculations of the modeling error.
Step 5: Import the normalized training data to the AE-PLS-ELM model for training, and a well-trained AE-PLS-ELM model is obtained.
Step 6: Import the normalized test data to the well-trained AE-PLS-ELM model and de-normalize the output data to obtain the test output.
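Steps 3 and 6 rely on normalizing the samples before training and de-normalizing the model output afterwards. The paper does not specify the exact scaling scheme, so the min-max scaling to [0, 1] below is an assumption; only the round-trip structure matters.

```python
import numpy as np

def normalize(data):
    """Min-max scale each column to [0, 1] (Step 3); the column ranges are
    kept so that model outputs can be de-normalized later (Step 6)."""
    lo, hi = data.min(axis=0), data.max(axis=0)
    return (data - lo) / (hi - lo), lo, hi

def denormalize(scaled, lo, hi):
    """Invert the min-max scaling to recover the physical units."""
    return scaled * (hi - lo) + lo

# illustrative usage with two columns of different magnitudes
samples = np.array([[1.0, 10.0], [2.0, 30.0], [4.0, 20.0]])
scaled, lo, hi = normalize(samples)
restored = denormalize(scaled, lo, hi)
```

Scaling per column matters here because the relative flow angle and relative vane opening have different dimensions and magnitudes, as noted in Step 3.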

Numerical Experiments and Analysis
In this section, a PSU in China is used as the research object to carry out the nonlinear modeling of the pump turbine [8]. A total number of 1125 data points are extracted from the pre-processed WH and WM characteristic curves using the improved Suter transformation (Figure 2). Thus, 1125 data pairs are generated for constructing an AE-PLS-ELM model. In addition, 90% of the data pairs (1012 points) are employed as the training samples and the remaining 10% (112 points) as the test sample.

Parameters Setting
To highlight the effectiveness of the proposed model, four conventional data-driven techniques, namely the Bagtree, support vector regression (SVR), the BP neural network [35], and the ELM, are employed as a control group to simulate the complete characteristic curves of the pump turbine. The Bagtree model is constructed using MATLAB's "Bag" function. The parameters of the SVR model, including the penalty factor C and the kernel parameter σ, are obtained using the grid search (GS) algorithm. The search range of C is set as [2^−8, 2^8], and the search range of σ is set as [2^−5, 2^5]. The "trainlm" algorithm is employed in training the BP neural network. The maximum number of iterations is set as 500 and the target error is 1e-5. The number of hidden nodes is selected using a trial-and-error method.

Comparative Analysis of the Results
To evaluate the performance of the different models, four evaluation indices are employed: the root mean square error (RMSE), the mean absolute error (MAE), the mean absolute percent error (MAPE) [23,36], and the Nash-Sutcliffe efficiency (NSE) [37]. The four indices are defined as follows.
where f_s(i) and f_o(i) denote the simulated and observed values of the ith sample point, respectively, f̄_o denotes the mean observed value, and N denotes the size of the data set.
The 3D spatial surfaces for the WH and WM characteristics based on the AE-PLS-ELM model are shown in Figures 4 and 5, respectively. As can be seen from Figures 4 and 5, the WH and WM spatial surfaces based on the AE-PLS-ELM model are smooth and uniform. In addition, the complete characteristic curves are continuously differentiable, which makes it easy to ensure the convergence of the water hammer calculation process. The AE-PLS-ELM model can also be used to densify and extend the WH and WM characteristic surfaces according to the practical requirements of research and engineering applications. Therefore, the transition between different opening lines on the surface is smoother, and it is convenient for operators to obtain the pump turbine characteristics under different working conditions.
The training and test results of the Bagtree, SVR, BP, ELM, and AE-PLS-ELM models for the WH characteristic are shown in Table 1. It can be seen from the table that the five models have good training and test accuracy and can model the WH characteristics of the PSU accurately. The AE-PLS-ELM model performs the best in terms of the four indices in both the training and test periods, which indicates that the AE-PLS-ELM model can effectively enhance the modeling accuracy of the pump turbine for the WH characteristic curve. Taking the RMSE value of the test period as an example, the RMSE value of ELM is 0.00297, which is lower than that of the Bagtree, SVR, and BP models. The neural network models BP and ELM perform better than the Bagtree and SVR models; the performance of ELM is slightly better than BP, and the Bagtree model performs the worst.
Comparing the training and test results of the ELM and AE-PLS-ELM models, it can be found that the performance of the AE-PLS-ELM model is significantly better than that of the ELM model. For the training samples, the RMSE, MAE, MAPE, and NSE values of the AE-PLS-ELM model were 0.00229, 0.00130, 0.00599, and 0.99988, respectively, which were improved by 21.03%, 37.20%, 72.75%, and 0.007% compared with the 0.00290, 0.00207, 0.02198, and 0.99981 obtained by the ELM model. For the test samples, the RMSE, MAE, MAPE, and NSE values of the AE-PLS-ELM model were 0.00217, 0.00142, 0.00697, and 0.99988, respectively, which were improved by 26.93%, 39.32%, 71.75%, and 0.011% compared with the 0.00297, 0.00234, 0.02467, and 0.99977 obtained by the ELM model. In a word, the proposed AE-PLS-ELM model overcomes the instability and multicollinearity of the single ELM model and can improve the generalization ability and fitting accuracy of the ELM for modeling the pump turbine characteristics.
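The four evaluation indices used in this section follow directly from their standard formulas; a minimal sketch (the sample values are illustrative only):

```python
import numpy as np

def rmse(obs, sim):
    """Root mean square error."""
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def mae(obs, sim):
    """Mean absolute error."""
    return float(np.mean(np.abs(sim - obs)))

def mape(obs, sim):
    """Mean absolute percent error (undefined when an observed value is 0)."""
    return float(np.mean(np.abs((sim - obs) / obs)))

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean."""
    return float(1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2))

# illustrative usage
observed = np.array([1.0, 2.0, 3.0, 4.0])
simulated = np.array([1.1, 2.1, 3.1, 4.1])
```

Note that RMSE, MAE, and MAPE are error measures (lower is better) while NSE is an efficiency measure (higher is better), which is why the improvement percentages for NSE in the tables are small even for clearly better models.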
The comparisons of the residuals for the WH characteristics between the AE-PLS-ELM model and the Bagtree, SVR, BP, and ELM models are shown in Figure 6a-d, respectively. It can be seen from Figure 6 that the prediction accuracy of the AE-PLS-ELM model is significantly better than that of the Bagtree and SVR models at all test sample points. In addition, the residual of the AE-PLS-ELM model is generally smaller than that of the BP and ELM models at most of the test points.
The training and test results of the Bagtree, SVR, BP, ELM, and AE-PLS-ELM models for the WM characteristic are shown in Table 2. The comparisons of the residuals for the WM characteristics between the AE-PLS-ELM model and the Bagtree, SVR, BP, and ELM models are shown in Figure 7a-d, respectively. The results obtained from Table 2 and Figure 7 are consistent with those from Table 1 and Figure 6. The neural network models BP, ELM, and AE-PLS-ELM performed better than the Bagtree and SVR models. The AE-PLS-ELM model can overcome the instability and multicollinearity of the single ELM model and obtain higher modeling accuracy.
The regression analysis scatter diagram of the residual and actual values for the WH and WM characteristics is shown in Figure 8. It can be seen from Figure 8 that the scatter plot of the single ELM model is more divergent around the axis, while the points of the AE-PLS-ELM model are distributed more closely, which demonstrates the superiority of the AE-PLS-ELM model in modeling the complete characteristic curves of the pump turbine.

Additional Test Problem
To further demonstrate the effectiveness of the proposed AE-PLS-ELM model, a widely used nonlinear differential equation is studied as an additional test problem [38]. The nonlinear differential equation can be expressed by the equation below [9].
where u(k) is the control variable and is randomly generated in [−2,2] in the training period. Furthermore, 800 data pairs are generated for training. y(k + 1) is taken as the output variable. y(k), y(k − 1), y(k − 2), u(k), u(k − 1) are taken as the input variables. In the test period, u(k) is generated using the following equation.
The test results of the five different models are shown in Figure 9. As can be seen in Table 3, the AE-PLS-ELM outperforms the other four models in terms of the four indices in the test period. The performance of the SVR model is the worst, the Bagtree model performs the second worst, and the BP and ELM models perform better than the Bagtree model. It can also be observed from Table 3 that the SVR and ELM models encounter the overfitting problem during the test period: their training performances are much better than those of the other models, while their test performances are much worse.
Figure 9. Comparison of the test results of the five different models for the nonlinear differential equation.


Conclusions
PTGS plays an extremely important role in maintaining the safe and stable operation of the power system. However, it is a closed-loop control system with a complex structure, variable parameters, and strong nonlinearity. As a crucial part of a PSU, an accurate pump turbine model is the key to the accurate modeling and simulation of PTGS. This study first introduced an improved Suter transformation to process the complete characteristic curves of the pump turbine. The crossing and aggregating phenomena and the multi-value problem in the "S" characteristic region of the pump turbine were reduced through the improved Suter transformation. Furthermore, an AE-PLS-ELM model was proposed to model the pump turbine characteristics precisely. The AE technique was introduced into the single ELM model for feature extraction of the input data to improve its stability. In addition, the PLSR algorithm was employed to replace the Moore-Penrose generalized inverse in ELM to reduce the multicollinearity of the output weights. The results have shown that the proposed AE-PLS-ELM model has better fitting precision and generalization performance than traditional models such as Bagtree, SVR, BP, and ELM. Essentially, the proposed modeling framework is an effective technique for modeling pump turbine characteristics, and the proposed AE-PLS-ELM can be applied to other regression problems in future studies. However, the performances of some other data-driven methods with different structures, such as the multivariate adaptive regression spline (MARS), gene expression programming (GEP) [39], the general regression neural network (GRNN), genetic programming (GP), and the cascaded neural network (CCNN), have not been studied. More attention will be paid to the performances of different data-driven methods for the nonlinear modeling of pump turbine characteristics in future studies.
Author Contributions: C.Z. designed and performed the experiments and drew the figures. T.P. analyzed the results and wrote the original draft. J.Z. provided the data, reviewed the paper, and gave constructive advice. J.J. checked the whole paper and improved the writing. X.W. collected relevant material and gave some advice.
Funding: This research received no external funding.