Degradation Tendency Prediction for Pumped Storage Unit Based on Integrated Degradation Index Construction and Hybrid CNN-LSTM Model

Accurate degradation tendency prediction (DTP) is vital for the secure operation of a pumped storage unit (PSU). However, existing DTP techniques still face challenges, such as a lack of appropriate degradation indicators, insufficient accuracy, and poor capability to track data fluctuations. In this paper, a hybrid model is proposed for the DTP of a PSU, which combines integrated degradation index (IDI) construction with a convolutional neural network-long short-term memory (CNN-LSTM) network. Firstly, the health model of a PSU is constructed with Gaussian process regression (GPR) and the condition parameters of active power, working head, and guide vane opening. Subsequently, to comprehensively quantify the degradation level of the PSU, an IDI is developed using entropy weight (EW) theory. Finally, combining the local feature extraction of the CNN with the time series representation of LSTM, the CNN-LSTM model is constructed to realize DTP. To validate the effectiveness of the proposed model, monitoring data collected from a PSU in China are taken as a case study. The root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and correlation coefficient (R) obtained by the proposed model are 1.1588, 0.8994, 0.0918, and 0.9713, respectively, which can meet engineering application requirements. The experimental results show that the proposed model outperforms the comparison models.


Introduction
A pumped storage unit (PSU) operates under a combination of different conditions, which may cause equipment wear, degradation, and faults [1]. Degradation tendency prediction (DTP) is essentially a time-series prediction problem, namely predicting the future degradation propagation from historical and current monitoring data [2]. Accurate DTP can not only reveal abnormal operating conditions, but also improve the reliability and stability of a PSU [3]. Thus, it is crucial to research DTP under different conditions to reduce failure rate, maintenance cost, and downtime [4][5][6].
In general, DTP methods are mainly divided into physics-based and data-driven methods [7]. Physics-based methods face more restrictions because they depend heavily on physical rules and domain expertise. In contrast, data-driven methods can conduct DTP without much expertise by mining the historical monitoring data, and have attracted much attention in machinery prognostics [8]. They can be applied to the prognosis of complicated systems whose degradation processes are difficult to analyze through physics [...] the monitoring data (H, P, G) and Gaussian process regression (GPR) at first. It can fully reveal the relationship between the PSU's operating status and the monitoring data. In order to accurately quantify the degradation level of the PSU, the IDI is built with entropy weight (EW) and PDIs from multiple objects, such as the lower bracket and the lower guide of the PSU. Furthermore, to enhance the performance of the prediction model, the CNN-LSTM model is developed for DTP. The main contributions are highlighted as follows:
(1) Considering multi-dimensional monitoring data such as working head (H), active power (P), and guide vane opening (G), the health state model is established.
(2) In order to characterize a PSU's performance, the IDI is defined using EW theory and PDIs from multiple objects.
(3) To capture the nonlinear characteristics of the IDI series, a hybrid CNN-LSTM model is applied for DTP, where the CNN extracts local features and the LSTM maps the features into separable spaces to produce prediction results.
The rest of this paper is organized as follows. Section 2 reviews the related knowledge about GPR, EW, CNN and LSTM. The framework of the proposed DTP model is presented in Section 3. Then, Section 4 illustrates the engineering application and analysis with the proposed model. Finally, the conclusions are given in Section 5.

Gaussian Process Regression
With its flexibility in describing modeling uncertainty, GPR performs well in fitting and regression problems [28]. GPR can model time series with a Gaussian prior, which is determined by the kernel function $K(x_i, x_j)$ and the mean function [29,30]. The process of GPR is described as follows:
$$y_n = f(x_n) + \xi_n$$
where $x$ is the input and $y$ is the output, $f(x_n)$ is a hidden function, and $\xi_n$ is the noise. Given the data $D = [(x_*, y_*)]$, the joint distribution of the training set $y$ and $y_*$ can be obtained according to the Bayesian principle:
$$\begin{bmatrix} y \\ y_* \end{bmatrix} \sim N\left(0, \begin{bmatrix} K(X, X) + \sigma_n^2 I & K(X, x_*) \\ K(x_*, X) & k(x_*, x_*) \end{bmatrix}\right)$$
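As an illustration of how a prediction follows from this joint distribution, the sketch below computes the posterior mean and variance of a zero-mean Gaussian process with NumPy. The squared-exponential kernel, its hyperparameters, the noise variance, and the toy sine data are assumptions made for the example; they are not the kernel or parameters used for the PSU health model.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between the rows of a and b (assumed kernel)."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def gpr_predict(x_train, y_train, x_test, noise_var=1e-2):
    """Posterior mean and variance of a zero-mean GP at x_test."""
    # K(X, X) + sigma_n^2 I, the top-left block of the joint covariance
    k_train = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_cross = rbf_kernel(x_train, x_test)   # K(X, x*)
    k_test = rbf_kernel(x_test, x_test)     # k(x*, x*)
    chol = np.linalg.cholesky(k_train)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_cross.T @ alpha
    v = np.linalg.solve(chol, k_cross)
    var = np.diag(k_test) - np.sum(v**2, axis=0)
    return mean, var

# Toy data (hypothetical): noisy samples of a sine curve
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 5.0, size=(40, 1))
y_train = np.sin(x_train[:, 0]) + 0.05 * rng.standard_normal(40)
x_test = np.array([[1.0], [2.5], [4.0]])
mean, var = gpr_predict(x_train, y_train, x_test)
```

The Cholesky factorization is used instead of an explicit matrix inverse because it is the numerically safer way to solve against the noisy training covariance.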

Convolutional Neural Network
Through layer-by-layer convolution operation, CNN can extract the spatial features hidden inside the data [31]. The features can be further applied for classification or regression. Different topologies have a direct impact on the network's fitting ability and generalization performance. Generally, CNN has achieved great success in processing 2D data, which has three dimensions: width, height, and number of channels. Particularly, the 1D CNN can be regarded as a special case, when the height of the input data is regarded as 1 [32]. Thus, 1D CNN is applied for capturing the spatial features of degradation tendency. The convolution layer can be considered as a filter of multiple input data to extract features. The input is processed by convolutional operation before passing to the next layer [33].
where W is the filter weight, b denotes a bias parameter, and σ represents the underlying activation function.
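The convolution layer described above is commonly written as an activation applied to the filter response plus a bias. A minimal NumPy sketch of a 1D convolution layer is given below; the valid padding, unit stride, ReLU activation, and the tiny input are assumptions for illustration, since the text does not specify them.

```python
import numpy as np

def relu(z):
    """Rectified linear unit, used here as the assumed activation sigma."""
    return np.maximum(z, 0.0)

def conv1d(x, weights, bias, activation=relu):
    """Valid 1D convolution (cross-correlation form) with stride 1.

    x:       (length, in_channels)
    weights: (kernel_size, in_channels, out_channels), the filter weight W
    bias:    (out_channels,), the bias parameter b
    """
    kernel_size = weights.shape[0]
    out_len = x.shape[0] - kernel_size + 1
    out = np.empty((out_len, weights.shape[2]))
    for t in range(out_len):
        window = x[t:t + kernel_size]  # (kernel_size, in_channels)
        out[t] = np.tensordot(window, weights, axes=([0, 1], [0, 1])) + bias
    return activation(out)

# Hypothetical single-channel series and one difference-like filter
x = np.arange(6, dtype=float).reshape(6, 1)
w = np.array([-1.0, 0.0, 1.0]).reshape(3, 1, 1)
b = np.array([0.5])
features = conv1d(x, w, b)
```

Each output row is one local feature computed from a window of three consecutive samples, which is the sense in which the layer extracts local features from a series.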

Long Short-Term Memory Neural Network
With recurrent connections in hidden layers, RNN can consider contextual information in historical input, which makes it suitable for the analysis of sequential data [34]. However, due to gradient explosion and vanishing, RNN is unable to represent long-term dependency characteristics of sequence data [35]. In contrast, as one of the most popular variants of RNN, the LSTM neural network can avoid such drawbacks by using hidden memory. It extends RNN with three types of gates: the input gate, which determines whether the current input should be stored; the forget gate, which controls whether historical information is removed from the cell memory; and the output gate, which determines the information that flows into the node output [36].
With its special memory structure and gated design, LSTM has a better ability to learn long-term dependency. The structure of LSTM can be described by the following equations [36,37]: where $i_t$, $f_t$, $c_t$, and $o_t$ represent the input gate, forget gate, cell activation vectors, and output gate, respectively.
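For reference, one widely used formulation of a single LSTM time step (without peephole connections) can be sketched as follows. It may differ in detail from the variant in [36,37]; the parameter names `W`, `U`, `b`, the candidate vector `g_t`, and the random toy weights are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step: returns the new hidden state h_t and cell state c_t."""
    W, U, b = params["W"], params["U"], params["b"]
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])  # input gate
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])  # forget gate
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])  # output gate
    g_t = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])  # candidate memory
    c_t = f_t * c_prev + i_t * g_t                          # cell activation
    h_t = o_t * np.tanh(c_t)                                # node output
    return h_t, c_t

# Hypothetical sizes and randomly initialised weights
rng = np.random.default_rng(0)
input_dim, hidden_dim = 1, 4
params = {
    "W": {g: 0.1 * rng.standard_normal((hidden_dim, input_dim)) for g in "ifoc"},
    "U": {g: 0.1 * rng.standard_normal((hidden_dim, hidden_dim)) for g in "ifoc"},
    "b": {g: np.zeros(hidden_dim) for g in "ifoc"},
}
h_t = np.zeros(hidden_dim)
c_t = np.zeros(hidden_dim)
for value in [0.1, 0.2, 0.15, 0.3, 0.25]:
    h_t, c_t = lstm_step(np.array([value]), h_t, c_t, params)
```

The additive update of `c_t` is what lets information persist across many steps, in contrast to the repeated multiplication that causes vanishing gradients in a plain RNN.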

Degradation Tendency Prediction Model of PSU
To model the highly nonlinear dynamics and estimate the degradation tendency, a hybrid DTP model of PSU based on IDI construction and CNN-LSTM is proposed, which considers multi-dimensional monitoring data from multiple objects. The proposed model combines mathematical statistics with intelligent prediction technology. This section consists of four parts: (1) analyze the monitoring data and construct the health model; (2) generate the IDI series of degradation tendency using EW theory and PDIs; (3) construct the CNN-LSTM prediction model; and (4) illustrate the general procedure of the proposed model.

Analyze and Establish the Health State Model
With the accumulation of running time, PSU gradually experiences the process of health, degradation and failure. The gradual nature makes the degradation tendency predictable. Through in-depth analysis of historical data, it can be found that the most representative monitoring data are working head (H), active power (P) and guide vane opening (G) of PSU. By constructing the health state model with GPR, the mapping between monitoring data (H, P, G) and operation status data (F) can be obtained.

Analyze the Monitoring Data
With the continuous expansion of online monitoring systems, the relationship between monitoring data and operating state can be effectively mined to determine the health state of PSU. Through data visualization, the relationship between monitoring data (H, P, G) and operation status data (F) is analyzed. The sensitive monitoring data and measured operation status data are selected as input and output of the health state model, respectively.
Sensors 2020, 20, 4277

Establish Health State Model
Based on the above comprehensive analysis, it can be found that the monitoring data (H, P, G) are the three main factors affecting the operation of PSU. Therefore, a standard multi-dimensional health state model is constructed with GPR. The model can accurately describe the health state with the monitoring data. The recorded monitoring data (H, P, G) are fed into GPR as input, and the corresponding operation status data (F) is the output. The mapping relationship can be expressed as follows:
$$F(t) = f(H(t), P(t), G(t))$$
In the health state model establishment stage, the correlation (R) is utilized as the criterion of model fitting [37]:
$$R = \frac{\sum_{t=1}^{N} \left(r(t) - \bar{r}\right)\left(c(t) - \bar{c}\right)}{\sqrt{\sum_{t=1}^{N} \left(r(t) - \bar{r}\right)^2 \sum_{t=1}^{N} \left(c(t) - \bar{c}\right)^2}}$$
where $c(t)$ is the model fitting value, $r(t)$ is the actual value of PSU, $N$ is the length of the data, $t$ is the operating time, while $\bar{r}$ and $\bar{c}$ represent the average of the actual value and the fitting value, respectively.
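The fitting criterion R can be computed directly from the actual and fitted series. The sketch below assumes that R is the Pearson correlation coefficient between r(t) and c(t), which is how the definitions above are read here; the function name and the sample values are hypothetical.

```python
import numpy as np

def correlation_r(actual, fitted):
    """Pearson correlation between actual values r(t) and fitted values c(t)."""
    actual = np.asarray(actual, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    d_actual = actual - actual.mean()
    d_fitted = fitted - fitted.mean()
    denom = np.sqrt(np.sum(d_actual**2) * np.sum(d_fitted**2))
    return float(np.sum(d_actual * d_fitted) / denom)

# Hypothetical measured and fitted values of one operation status signal
r_values = [10.0, 12.0, 11.0, 15.0, 14.0]
c_values = [10.5, 11.5, 11.2, 14.6, 14.1]
r_score = correlation_r(r_values, c_values)
```

A value close to 1 indicates that the fitted series follows the measured series closely in shape, although it does not by itself rule out a constant offset between the two.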

Construct IDI for PSU
In order to effectively quantify the degradation level of PSU, a comprehensive PDI should be constructed. For each object with available data, its PDI can be constructed based on the health state model. However, the PDI of a single object cannot represent the degradation degree of the whole PSU. With the help of available PDI of multiple objects and EW theory, an IDI is developed to comprehensively describe the health state of PSU.

Construction of PDI
The monitoring data (H, P, G) are taken as input of the health state model. The standard value of health state V(t) can be obtained. Then, comparing V(t) with the measured operation status data F(t) of a PSU, the PDI of the object is developed as follows:

Construct IDI with PDIs and EW Theory
Since the PDIs from different objects have different effects on the health state of PSU, their information entropy is also different [38]. The weight identified by entropy can indicate the amount of useful information contained in the PDIs [39]. Therefore, the IDI is defined using EW theory and the PDIs of different objects, with all effective information retained, to characterize the health state of PSU. The specific process is described as follows. With the obtained PDIs of different objects, the input matrix of entropy is given in Equation (13):
$$X = \begin{bmatrix} X_{11} & \cdots & X_{1n} \\ \vdots & \ddots & \vdots \\ X_{m1} & \cdots & X_{mn} \end{bmatrix} \quad (13)$$
where $X_{ij}$ represents the index in the i-th data sequence of the j-th PDI, i = 1, 2, 3, ..., m, j = 1, 2, 3, ..., n.
Since the obtained PDIs have the same dimension, the process of normalizing is as follows: where $Y_{ij}$ is the normalized value corresponding to $X_{ij}$. Then, the entropy of the normalized PDIs can be calculated by [40]: where $p_i$ is the probability of the PDI series and $E_i$ is the value of the information entropy. According to the entropy results, the entropy weights $W_1, W_2, \ldots, W_k$ of each PDI can be calculated [41]: where $W_i$ represents the weight corresponding to the PDI, $W$ is the weight matrix, and $k$ is the number of PDI series. Finally, the IDI can be obtained as follows: Note that the value of IDI varies from 0 to 1, with "0" for health and "1" for failure. In essence, the construction of the IDI can be regarded as multi-dimensional data fusion, which can effectively and comprehensively describe the health state of PSU.
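The entropy weight procedure can be sketched as below. This follows the textbook form of the entropy weight method (min-max normalization, column-wise proportions, entropy scaled by ln m, weights proportional to 1 minus entropy) and fuses the normalized PDIs by a weighted sum; these specific formulas, the function name, and the sample matrix are assumptions, and the exact equations of the paper may differ.

```python
import numpy as np

def entropy_weight_idi(pdi):
    """Entropy weights and fused index from an (m samples, n PDI series) matrix.

    Assumes every column is non-constant, so min-max normalization is defined.
    """
    pdi = np.asarray(pdi, dtype=float)
    m = pdi.shape[0]
    col_min = pdi.min(axis=0)
    normalized = (pdi - col_min) / (pdi.max(axis=0) - col_min)
    proportions = normalized / normalized.sum(axis=0)
    # Treat 0 * ln(0) as 0 by replacing zero proportions with 1 inside the log
    logs = np.log(np.where(proportions > 0, proportions, 1.0))
    entropy = -(proportions * logs).sum(axis=0) / np.log(m)
    weights = (1.0 - entropy) / np.sum(1.0 - entropy)
    idi = normalized @ weights  # weighted fusion, stays within [0, 1]
    return weights, entropy, idi

# Hypothetical PDI values: 4 time steps, 3 monitored objects
pdi = np.array([
    [0.10, 0.30, 0.20],
    [0.20, 0.25, 0.40],
    [0.35, 0.50, 0.30],
    [0.60, 0.70, 0.90],
])
weights, entropy, idi = entropy_weight_idi(pdi)
```

Under this formulation a series with lower entropy receives a larger weight, because its values are less uniformly spread and therefore carry more distinguishing information.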

Predict the Degradation Tendency of PSU with CNN-LSTM
To improve prediction performance, a hybrid CNN-LSTM model is proposed for DTP of PSU. It takes the constructed IDI series and the future data of degradation tendency as the input and output, respectively.
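One common way to arrange such input and output pairs is a sliding window over the IDI series. The sketch below is an illustration under assumed settings: the window length, the one-step horizon, the function name `make_windows`, and the synthetic series are not taken from the paper.

```python
import numpy as np

def make_windows(series, window, horizon=1):
    """Split a 1D series into (samples, window, 1) inputs and future targets."""
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - window - horizon + 1
    inputs = np.stack([series[i:i + window] for i in range(n_samples)])
    targets = np.array(
        [series[i + window + horizon - 1] for i in range(n_samples)]
    )
    # Trailing axis is the channel dimension expected by 1D convolution layers
    return inputs[..., None], targets

# Hypothetical IDI series of 10 points, 4 past values predict the next one
idi_series = np.linspace(0.0, 1.0, 10)
x_windows, y_targets = make_windows(idi_series, window=4)
```

Each row of `x_windows` holds consecutive past IDI values, and the matching entry of `y_targets` is the value that the network is trained to predict.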

The Structure of CNN-LSTM
In the CNN-LSTM network, the CNN is applied to extract advanced spatial features from the IDI, and the LSTM investigates the relationships between historical inputs and the current health state. The structural schematic of CNN-LSTM is illustrated in Figure 1. Specifically, the input layer is constructed from IDI time series variables. Next, two one-dimensional convolutional layers are built for extracting spatial features. After that, an LSTM layer is added to extract the temporal nonlinearity inside the IDI. Finally, two fully connected layers are applied to produce the final prediction results of the IDI.
In order to ensure that the proposed network has better accuracy and generalizability, an Adam optimizer is applied to minimize the total loss [42]. The biases and weights are updated based on the gradient of the loss function. The maximum number of training epochs is 200 and the initial learning rate is 0.01. To avoid possible over-fitting, the dropout rate is 0.2 in the fully connected (FC) layer. The loss function in training is the mean square error (MSE), defined as follows:
$$MSE = \frac{1}{K} \sum_{k=1}^{K} \left(\hat{y}_k - y_k\right)^2$$
where $\hat{y}_k$ is the output, $y_k$ is the real value, and $K$ is the number of training data.
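To make the update rule concrete, the sketch below applies the standard Adam update to an MSE loss for 200 epochs with a learning rate of 0.01, as stated above. The linear toy model stands in for the network, and the decay rates `beta1`, `beta2` and the constant `eps` are the commonly used defaults, assumed here because the text does not list them.

```python
import numpy as np

def mse_loss(y_hat, y):
    """Mean square error between outputs y_hat and real values y."""
    return float(np.mean((y_hat - y) ** 2))

def adam_step(theta, grad, state, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; state keeps the step count and moment estimates."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad**2
    m_hat = state["m"] / (1 - beta1 ** state["t"])  # bias-corrected first moment
    v_hat = state["v"] / (1 - beta2 ** state["t"])  # bias-corrected second moment
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps)

# Toy stand-in for the network: fit y = slope * x + intercept
x = np.linspace(-1.0, 1.0, 50)
y = 2.0 * x + 1.0
theta = np.zeros(2)  # [slope, intercept]
state = {"t": 0, "m": np.zeros(2), "v": np.zeros(2)}
initial_loss = mse_loss(theta[0] * x + theta[1], y)
for epoch in range(200):
    residual = theta[0] * x + theta[1] - y
    grad = np.array([np.mean(2 * residual * x), np.mean(2 * residual)])
    theta = adam_step(theta, grad, state)
final_loss = mse_loss(theta[0] * x + theta[1], y)
```

Because Adam scales each step by the running gradient magnitude, every parameter moves by roughly the learning rate per epoch at the start, which is why the initial learning rate matters for how far training can travel in 200 epochs.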

Evaluation Criterion
To evaluate the effectiveness of the prediction model, three common indexes, namely root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE), are employed for evaluation [43].
where $y_i$ and $\hat{y}_i$ are the actual and predicted values, respectively.
Additionally, the improved percentage of the above indexes is applied for evaluating different models quantitatively. The improved percentage index P is defined as follows: where $v_A$ and $v_B$ are the index values of the comparative model and the proposed model, respectively.
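The evaluation indexes can be sketched as follows, using their usual definitions. Two points are assumptions: MAPE is returned as a fraction rather than multiplied by 100, and the improved percentage is taken as the relative reduction of an error index with respect to the comparative model; the function names and sample values are hypothetical.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    """Mean absolute error."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(y - y_hat)))

def mape(y, y_hat):
    """Mean absolute percentage error as a fraction; y must be non-zero."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs((y - y_hat) / y)))

def improved_percentage(v_a, v_b):
    """Relative reduction (in percent) from comparative value v_a to v_b."""
    return (v_a - v_b) / v_a * 100.0

# Hypothetical actual and predicted values
y_true = [2.0, 4.0, 8.0]
y_pred = [2.0, 4.0, 6.0]
scores = (rmse(y_true, y_pred), mae(y_true, y_pred), mape(y_true, y_pred))

# RMSE values quoted in the comparison with SVR (1.9565) and CNN-LSTM (1.1588)
p_rmse_vs_svr = improved_percentage(1.9565, 1.1588)
```

For error indexes such as RMSE, MAE and MAPE a positive P means the proposed model has the lower error; for a score where higher is better, such as R, the sign convention would need to be reversed.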

The Proposed Degradation Tendency Prediction Model and Framework
In this section, a hybrid model is presented that includes the health state model establishment, IDI construction with PDIs and EW theory, CNN-LSTM predicting model and predicting results evaluation. The main steps are displayed in Figure 2. The more specific processes are as described below.
Step A: Construct the health state model.
Step B: Generate PDI and obtain IDI. Real-time monitoring data of PSU are substituted into the health model to obtain the standard health state value V(t). The standard V(t) is compared with the real value of the monitoring data F(t), and the PDI is then defined by Equation (12). With the obtained PDIs of different objects and EW theory, the IDI is defined with all effective information retained, characterizing the health state of PSU.
Step C: Degradation tendency prediction with CNN-LSTM. Due to the nonlinearity of degradation tendency series, simple network prediction is generally not effective. In order to obtain more accurate prediction results of degradation tendency, we constructed the input and output structure and proposed a hybrid CNN-LSTM model. The hybrid model fuses the advantages of CNN in local feature extraction and of LSTM in representing time series.
Step D: Prediction results evaluation. Prediction effectiveness can be measured not only by the square sum of the prediction error, but also by the mean squared deviation of the error. In this study, three common evaluation criteria are employed to assess the prediction performance, namely RMSE, MAE, and MAPE. In addition, the error distributions and a Taylor diagram are employed to compare the performance of the proposed model more intuitively.

Engineering Application and Analysis
To validate the proposed DTP model, experiments were conducted with the dataset collected from a PSU located in Jingxing County, Hebei Province, China. The PSU is equipped with a single-stage mixed-flow reversible pump-turbine unit with a capacity of 250 MW. The maximum turbine/pump head is 346 m, and the minimum turbine/pump head is 291 m. The rated head of the turbine is 305 m, and the rated speed of the PSU is 333.3 r/min. The experiments were conducted in MATLAB R2016a on a computer with a 2.5 GHz CPU and 32 GB of RAM.
The TN8000 condition monitoring system was installed in the PSU [44]. It contains local sensors, data acquisition equipment, and a central web server. It can collect up to 4 key phase signals, 24 vibration/swing signals, 12 static signals and 255 process signals obtained by Modbus communication. The central web server stores and manages the data from the data acquisition equipment. The data monitored by the TN8000 system are illustrated in Table 1. The TN8000 system acquires vibration and swing data with Bently 330505 sensors and 3300 sensors, which follow the standard of ISO/IEC7498-IEEE/ANSI 802 [45]. The sensors are electric eddy current sensors of the 3300 series, whose output is 4-20 mA or 1-5 V. In addition, some data, such as active power and guide vane opening, are taken directly from the PSU through hard wiring, while data such as working head need to be read through the computer supervisory control system. Meanwhile, the complicated structure and frequent switching of operating conditions can easily lead to monitoring data with different time intervals. Hence, the monitoring data that meet the average time interval were chosen for analysis, which is in line with engineering practice. The corresponding H, P, and G are recorded to construct the data vectors F(H(t), P(t), G(t)) from 31 July 2008 to 25 December 2009. They were selected to build the health state model. Then, the PDI from 2 January 2010 to 15 December 2011 can be obtained.

Data Collection
In engineering applications, vibration and swing signals are usually applied for evaluating the status of equipment. To comprehensively quantify the degradation level, the vibration and swing data from different objects are selected for DTP, namely the vibration in the X-direction and Y-direction of the lower bracket and the swing in the X-direction and Y-direction of the lower guide. The corresponding monitoring data (H, P, G) are recorded as presented in Figure 3. The monitoring data from 31 July 2008 to 25 December 2009 were used to construct the health state model. With the data from 2 January 2010 to 15 December 2011, the IDI was then obtained based on EW theory and PDIs. As illustrated in Figure 4, due to the integration of various working conditions, including pumping, no-load, and generating operations, the vibration and swing data are strongly non-linear and non-stationary.

Monitoring Data Analysis
To further analyze the relationship between monitoring data (H, P, G) and operation status data (F), data visualization is utilized. Because of the reversible operation and frequent condition switching, the monitoring data cover different working conditions of the PSU. In addition, the obtained relationships differ among the four kinds of vibration and swing signals. For simplicity, the X-direction swing data of the lower guide were selected for detailed discussion. The relationship diagrams are shown in Figure 5.
Specifically, Figure 5a illustrates the relationship between H and the swing data. It can be seen that the distribution of H is mainly concentrated between 315 m and 345 m. Figure 5b shows the relationship between P and the swing data. There are three parts of swing data: the generating condition dataset, the no-load condition dataset and the pumping condition dataset. The data are mainly distributed at 0 MW and −250 MW under the no-load and pumping conditions, respectively, while the distributions in the generating condition are mainly concentrated at 250 MW, 200 MW and 150 MW. Figure 5c shows the relationship between G and the swing data. The data can be divided into two parts, the guide vane closing dataset and the guide vane opening dataset. The guide vane opening dataset is mainly distributed at 40-90% opening.

Health State Model Establishment
Based on the above analysis, it can be seen that there is a clear mapping relationship between the monitoring data (H, P, G) and the operation status data (F) of the PSU, and that the vibration and swing data are greatly affected by the monitoring data (H, P, G). Therefore, the health state model was constructed with GPR and the operation status data (F). A total of 4000 samples were selected from 31 July 2008 to 25 December 2009. Among them, three quarters of the samples were used for training the health state model, while the remainder was utilized for testing. The parameters of GPR are shown in Table 2. The fitting results of the swing and vibration data are shown in Figure 6. In the model establishment stage, the values of R for the four signals are 0.9714, 0.9716, 0.9634, and 0.9539, respectively. This means that the health state model can represent the original PSU at a high level of confidence.

Construction of PDI
The measured data from 2 January 2010 to 15 December 2011 were adopted to construct the PDI of the PSU. Given the operation status data F(t), the monitoring data (H, P, G) in the same state are fed into the established health model to obtain the standard value of operation status V(t). The PDI series can then be obtained through Equation (12). As shown in Figure 7, there are four PDI series, from the vibration data in the X-direction and Y-direction of the lower bracket and the swing data in the X-direction and Y-direction of the lower guide. The daily mean value of the PDIs tends to increase, which means the performance of the PSU gradually deteriorates over time. However, it is insufficient to randomly select any one of the obtained PDIs to represent the degradation tendency of the entire PSU.

Construct IDI with PDIs and EW Theory
To reduce the impact of irregular fluctuations in the PDIs and obtain a comprehensive degradation series of the PSU, an integrated strategy with EW theory is developed as described in Section 3.2. First, the weight is calculated by EW for the four PDI series. The entropy and weight values of the PDIs are shown in Table 3. The values of entropy are within the range of 0.95-1, which indicates that these PDI series can represent the degradation degree of the PSU to a certain extent. Then, as illustrated in Figure 8, the IDI series can be obtained with the PDIs and their weights based on Equation (19). The entropy value of the IDI series is 0.9502. This means the generated IDI series can retain the complete information in the PDIs and reduce the complexity of the PDI series. In summary, the IDI can be used as an effective indicator to evaluate the degradation level of the PSU.
With the proposed CNN-LSTM model, the prediction result is obtained as illustrated in Figure 9. The performance of the CNN-LSTM model is generally satisfactory. Except at local abrupt points, the predicted values and the actual values of the degradation time series fit well. Owing to its strong nonlinear learning and fitting ability, the proposed model can better predict the degradation tendency.

Results and Analysis
To verify the nonlinear characterization ability of CNN-LSTM, two types of comparison experiments are carried out. Specifically, the first type includes support vector regression (SVR)-based model and extreme learning machine (ELM)-based model. The second includes CNN-based model, LSTM-based model and gated dual attention unit neural network (GDAU) [46]. Furthermore, the analysis and discussion are conducted for a more comprehensive evaluation of the proposed model.

Comparison of SVR and ELM Prediction
SVR is an extended application of support vector machine from classification to regression. The parameters of SVR are searched by grid search (GS) method in this section. As a feedforward neural network with single hidden layer, ELM has been widely studied for its fast convergence speed. And the optimal number of hidden layer nodes is determined using GS method. In addition, the input data of SVR and ELM is consistent with the CNN-LSTM.
The prediction results of different models are presented in Figures 10 and 11 and Table 5. As can be seen in Table 5, the RMSE and MAE of SVR are 1.9565 and 1.5712, respectively, while those of CNN-LSTM are 1.1588 and 0.8994. Similarly, the corresponding results of ELM are 1.9115 and 1.6613. In addition, the R values of SVR and ELM are 0.9214 and 0.9261, which are lower than the 0.9713 of CNN-LSTM. On the whole, the proposed CNN-LSTM model achieved the minimum values on the three error indicators, which illustrates the effectiveness of the proposed model.
Figure 11. The predicted results of ELM model.

Comparison of CNN, LSTM and GDAU Prediction
Furthermore, we conducted experiments to compare the proposed model with other deep learning-based models. CNN, LSTM, and GDAU are adopted for prediction, and the results are evaluated with four criteria: RMSE, MAE, MAPE, and R. The comparison models are configured strictly according to the comparison principle: their hyper-parameter settings and experimental conditions are the same as those of CNN-LSTM. The structures of the CNN and LSTM models are shown in Figure 12. Figures 13-15 show that the predictions of the compared models approximately fit the actual values with little error. Moreover, the hybrid CNN-LSTM model performs best in RMSE, MAE, and MAPE. The RMSE values of CNN, LSTM, and GDAU are 1.4233, 1.6166, and 1.1797, respectively, against 1.1588 for CNN-LSTM. Similarly, the MAE of the CNN-LSTM model is slightly lower than those of the compared models. As seen in Table 5, the R value of the CNN-LSTM model is 0.9713, which is higher than the 0.9502 of CNN and the 0.9470 of LSTM. The results illustrate that the hybrid CNN-LSTM can effectively improve prediction accuracy by combining the strength of CNN in local feature extraction with the ability of LSTM to represent time series features.
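For reference, the four evaluation criteria can be computed as in the following minimal sketch. Two points are assumptions rather than statements from this section: R is taken to be the Pearson correlation coefficient between actual and predicted values, and MAPE is returned as a fraction rather than a percentage. The sample values are invented for illustration only.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction; actual values must be nonzero."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def pearson_r(actual, predicted):
    """Pearson correlation coefficient (assumed here to be the paper's R)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Invented sample series, for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

metrics = {
    "RMSE": rmse(actual, predicted),
    "MAE": mae(actual, predicted),
    "MAPE": mape(actual, predicted),
    "R": pearson_r(actual, predicted),
}
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Lower RMSE, MAE, and MAPE indicate a better fit, whereas a higher R indicates a better fit, which is why the comparison treats R separately from the three error measures.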

Analysis and Discussion
To compare the performance of the various models more intuitively, the results achieved by the comparative models are shown in Table 6 and Figures 16 and 17. The average performance improvement ratio is shown in Table 6, and the error distributions of all models are illustrated in Figure 16. Furthermore, a Taylor diagram is employed in Figure 17, in which the standard deviations, centered root-mean-square differences, and correlations are exhibited.
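The three quantities summarized by a Taylor diagram can be computed directly from an observed and a predicted series. The sketch below is illustrative only: the function name `taylor_statistics` and the sample data are hypothetical, and population (divide-by-n) standard deviations are assumed. It also checks the law-of-cosines relation that allows the three statistics to share one diagram: the squared centered RMS difference equals the sum of the two variances minus twice the product of the standard deviations and the correlation.

```python
import math


def taylor_statistics(observed, predicted):
    """Return the standard deviations, correlation, and centered RMS difference."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    dev_o = [o - mean_o for o in observed]
    dev_p = [p - mean_p for p in predicted]
    std_o = math.sqrt(sum(d * d for d in dev_o) / n)
    std_p = math.sqrt(sum(d * d for d in dev_p) / n)
    corr = sum(a * b for a, b in zip(dev_o, dev_p)) / (n * std_o * std_p)
    # Centered RMS difference: RMS difference after removing each series' mean.
    crmsd = math.sqrt(sum((b - a) ** 2 for a, b in zip(dev_o, dev_p)) / n)
    return {"std_obs": std_o, "std_pred": std_p, "corr": corr, "crmsd": crmsd}


# Invented sample series, for illustration only.
observed = [10.0, 12.0, 11.0, 14.0, 13.0]
predicted = [10.5, 11.5, 11.5, 13.0, 13.5]

stats = taylor_statistics(observed, predicted)
law_of_cosines = (stats["std_obs"] ** 2 + stats["std_pred"] ** 2
                  - 2 * stats["std_obs"] * stats["std_pred"] * stats["corr"])
print(stats)
print(abs(stats["crmsd"] ** 2 - law_of_cosines) < 1e-9)
```

In such a diagram, a model lies close to the observation point when its standard deviation matches that of the observations and its correlation is high, which makes its centered RMS difference small.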

As can be seen in Table 6, the proposed model performs better than SVR and ELM, with P_RMSE values of 40.77% and 39.38%, respectively. Similar improvements also exist in the MAE and MAPE indexes. In addition, the error distributions of SVR and ELM are relatively wide, which means that they cannot achieve uniform precision across the data points. From Figure 9 and Figures 13-15, the prediction results of the CNN, LSTM, GDAU, and CNN-LSTM models are basically consistent with the actual values, except at the local abrupt change points. Compared with the CNN model and the LSTM model in Table 6, the P_RMSE of the CNN-LSTM model is 18.58% and 28.32%, respectively. Comparing the box plots of the error distributions in Figure 16, the errors of the proposed model are mainly concentrated around 0 and span the narrowest range among all the compared models. This means that the proposed model can achieve uniform precision at almost all points. As shown in Figure 17, the CNN-LSTM model and the GDAU model have roughly the same correlation coefficient with the observation. However, the proposed model has the same standard deviation as the observation, whereas the GDAU model shows more variability, as indicated by its larger standard deviation. The GDAU model therefore also performs well, although the proposed model matches the observed variability more closely.
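The improvement ratios quoted above can be checked against the RMSE values reported in the comparison subsections. The sketch below assumes that P_RMSE is the relative reduction in RMSE of the proposed model against a baseline, expressed in percent; this definition is an assumption here, but it reproduces the four percentages given in the text. The function name `improvement_ratio` is illustrative.

```python
def improvement_ratio(baseline_error, proposed_error):
    """Relative error reduction of the proposed model over a baseline, in percent."""
    return (baseline_error - proposed_error) / baseline_error * 100.0


# RMSE values quoted in the text.
rmse_proposed = 1.1588  # CNN-LSTM
rmse_baselines = {"SVR": 1.9565, "ELM": 1.9115, "CNN": 1.4233, "LSTM": 1.6166}

p_rmse = {name: round(improvement_ratio(value, rmse_proposed), 2)
          for name, value in rmse_baselines.items()}

for name, ratio in p_rmse.items():
    print(f"P_RMSE vs {name}: {ratio:.2f}%")
# P_RMSE vs SVR: 40.77%
# P_RMSE vs ELM: 39.38%
# P_RMSE vs CNN: 18.58%
# P_RMSE vs LSTM: 28.32%
```

Under this definition, a larger P_RMSE means a larger error reduction, so the gain of the hybrid model is greatest against the shallow models (SVR and ELM) and smaller against the deep learning baselines.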
To sum up, the proposed CNN-LSTM model can accurately predict the degradation tendency of PSU. We have confirmed through experiments that the proposed CNN-LSTM is a competitive method for DTP.

Conclusions
In this paper, a hybrid model is proposed for DTP based on IDI construction and CNN-LSTM. Considering the condition parameters of active power (P), working head (H), and guide vane opening (G), the health state model is established with GPR. Then, the IDI that characterizes the degradation level is developed from multiple objects with EW theory. The CNN-LSTM model is then applied to DTP to achieve higher accuracy. To illustrate the effectiveness of the proposed model, the vibration dataset and the swing dataset of a PSU in China are chosen as a case study. The comparison models SVR, ELM, CNN, LSTM, and GDAU are run in parallel to highlight the advantages of the proposed model. Among all the models, the proposed model achieves the lowest RMSE, MAE, and MAPE, which demonstrates the effectiveness of the CNN-LSTM model for DTP.
However, we focused on the performance of the proposed model and paid comparatively less attention to its number of parameters. In addition, mixed models have been widely investigated to enhance prediction performance, since combining models allows the advantages of each to be exploited in different situations. Therefore, our subsequent research will proceed in two directions: (1) strategies that improve the efficiency of the model and reduce the number of parameters will be employed, and (2) the combination of multiple prediction models will be the focus of our future work.