A Novel Hybrid Decomposition—Ensemble Prediction Model for Dam Deformation

Accurate and reliable prediction of dam deformation (DD) is of great significance to the safe and stable operation of dams. In order to deal with the fluctuation characteristics in DD for more accurate prediction results, a new hybrid model based on a decomposition-ensemble model named VMD-SE-ER-PACF-ELM is proposed. First, the time series data are decomposed into subsequences with different frequencies and an error sequence (ER) by variational mode decomposition (VMD), and then the secondary decomposition method is introduced into the prediction of ER. In these two decomposition processes, the sample entropy (SE) method is innovatively utilized to determine the decomposition modulus. Then, the input variables of the subsequences are selected by partial autocorrelation analysis (PACF). Finally, the parameter-optimization-based extreme learning machine (ELM) models are used to predict the subsequences, and the outputs are reconstructed to obtain the final prediction results. The case analysis shows that the VMD-SE-ER-PACF-ELM model has strong prediction ability for DD. The model is then compared with other nonlinear and time series models, and its performance under different prediction periods is also analyzed. The results show that the proposed model is able to adequately describe the original DD. It performs well in both training and testing stages. It is a preferred data-driven model for DD prediction and can provide a priori knowledge for health monitoring of dams.


Introduction
Dams bring significant socio-economic benefits when they operate safely, but a dam accident can cause a major disaster [1][2][3]. In fact, most dam accidents do not arise suddenly; they develop gradually, from quantitative to qualitative change [4]. If suitable prediction models are established for dam monitoring data and the data are analyzed in a timely manner, potential problems in the structural behavior of dams can be identified and accidents avoided.
As the controlled indicator of dam safety monitoring, deformation monitoring data can objectively reflect the structural state and the safety condition of dams, which are one of the important bases for assessing the safety of dam projects [5]. During the actual service of a dam, the deformation monitoring data are usually complex nonstationary and nonlinear time series. Therefore, it is an important research topic to accurately predict dam deformation (DD) in the future by using historical subsequences have similar SE values, modal confusion occurs in the decomposition. Therefore, in this paper, we choose the maximum K value that allows a large difference between the SE values of any two subsequences as the final decomposition modulus of the VMD. In addition, there is an error sequence (ER) between the sum of the subsequences obtained by the VMD algorithm and the original sequence. In other words, the sum of the VMFs is not equal to the original sequence [47]. Since the ER contains the true fluctuation characteristics of the original sequence, considering only VMFs will not fully reflect the randomness, which will lead to distortion of the prediction results to some extent. In this paper, the ER of the DD after decomposition is extracted, and because the sequence is approximately noisy, it cannot be modeled directly using the machine learning model to obtain good prediction results. To solve this problem, the authors of paper [47] proposed the random number method for point prediction of ER, but the uncertainty of the results obtained by this method is significant. Therefore, we innovatively propose a VMD-based secondary decomposition for the ER in order to dig deeper into the temporal features embedded in it. The results show that the prediction obtained by considering the ER is closer to the observed values and has more practical engineering significance.
In summary, a hybrid model, namely VMD-SE-ER-PACF-ELM, is proposed for DD prediction, which makes full use of the advantages of the VMD, SE, PACF, and ELM neural network. First, the VMD-SE model is used to decompose a deformation sequence into K VMFs with good characteristics and an ER. For the ER, the same method is used to obtain a series of subsequences. Secondly, the partial autocorrelation function (PACF) method is used to determine the input variables of the subsequences. Then the ELM model corresponding to each subsequence is trained. Finally, the ELMs are applied to the corresponding subsequences, and the sum of the prediction results of all components is the final result of the DD prediction. The model is also compared with other prediction models and validated over different prediction periods.
The rest of this paper is organized as follows: Section 2 briefly introduces the methods mentioned above. In Section 3, the proposed model and performance evaluation indicators are introduced. Then, in Section 4, a case study and discussion of the results are presented, and Section 5 presents the conclusions of the study.

VMD Based Decomposition Method
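VMD is a non-recursive signal decomposition method proposed by Dragomiretskiy et al. in 2014 [32]. It can decompose any signal $f(t)$ into $K$ modal components $u_k$ around the center frequencies $\omega_k$. For the VMD algorithm, the signal decomposition process amounts to solving a constrained variational problem, which is modeled as follows

$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k = f \tag{1}$$

where $\{u_k\} = \{u_1, u_2, \cdots, u_K\}$ are the $K$ modal components, $\{\omega_k\} = \{\omega_1, \omega_2, \cdots, \omega_K\}$ are the center frequencies of the modal components, $f$ is the original signal, and $\delta(t)$ is the pulse function. In order to obtain the optimal solution of the constrained variational problem, the Lagrange multiplier $\lambda(t)$ and the quadratic penalty factor $\alpha$ are introduced to transform the constrained variational problem into an unconstrained one. The extended Lagrange function is expressed as

$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle \tag{2}$$

The optimal solution of Equation (2) is obtained by updating $u_k^{n+1}$, $\omega_k^{n+1}$ and $\lambda^{n+1}$ with the alternating direction method of multipliers. The iterative equations are as follows

$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}^{n}(\omega)/2}{1 + 2\alpha(\omega - \omega_k^{n})^2} \tag{3}$$

$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left| \hat{u}_k^{n+1}(\omega) \right|^2 d\omega}{\int_0^{\infty} \left| \hat{u}_k^{n+1}(\omega) \right|^2 d\omega} \tag{4}$$

$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau \left[ \hat{f}(\omega) - \sum_{k=1}^{K} \hat{u}_k^{n+1}(\omega) \right] \tag{5}$$

In Equations (3)-(5), $\hat{u}_k^{n+1}(\omega)$, $\hat{f}(\omega)$ and $\hat{\lambda}^{n+1}(\omega)$ represent the Fourier transforms of $u_k^{n+1}(t)$, $f(t)$ and $\lambda^{n+1}(t)$, respectively, and $\tau$ is the update step of the Lagrange multiplier. The stopping criterion of the iteration is

$$\sum_{k=1}^{K} \frac{\left\| \hat{u}_k^{n+1} - \hat{u}_k^{n} \right\|_2^2}{\left\| \hat{u}_k^{n} \right\|_2^2} < \varepsilon \tag{6}$$

where $\varepsilon$ is the convergence tolerance. The specific procedures of VMD are shown in Figure 1.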

SE Based Modulus Selection Method
When using the VMD algorithm, the decomposition modulus can be set in advance. By setting reasonable convergence conditions, the computational complexity of the model can be effectively reduced [41]. A new method for determining the modulus $K$ using the SE values of the VMFs is proposed. The concept of SE was introduced by Richman et al. in 2000 [48], and it is used to evaluate the complexity of time series. The higher the self-similarity of a time series, the smaller its SE value, and vice versa [49]. After the decomposition of the DD time series, a number of subsequences and their corresponding SE values can be obtained. If two or more subsequences have similar SE values, it is assumed that over-decomposition has occurred, which leads to modal confusion. If the SE values of the VMFs are quite different from each other, the maximum $K$ value that allows this state should be selected as the final decomposition modulus to avoid under-decomposition of the VMD. The specific process of SE is illustrated in Figure 2.

In Figure 2, Equations (7)-(9) are as follows

$$B_i^{m}(r) = \frac{1}{N-m}\,\mathrm{num}\left\{ d[X(i), X(j)] < r \right\}, \quad j \neq i \tag{7}$$

$$B^{m}(r) = \frac{1}{N-m+1} \sum_{i=1}^{N-m+1} B_i^{m}(r) \tag{8}$$

$$SE(m, r, N) = -\ln\left[ \frac{B^{m+1}(r)}{B^{m}(r)} \right] \tag{9}$$

where $N$ is the length of the time series, $m$ is an integer that represents the length of the comparison vector, $r$ is a real number indicating the measure of similarity, $X(i) = [x(i), x(i+1), \cdots, x(i+m-1)]$, $d[X, X^*]$ is defined as $d[X, X^*] = \max_a |x(a) - x^*(a)|$, $x(a)$ is the element of vector $X$, $d$ represents the distance between vectors $X(i)$ and $X(j)$, and the value of $j$ is in the range $[1, N-m+1]$.

As shown in Figure 2, the values of $m$ and $r$ need to be determined before calculating the SE. Typically, the embedding dimension $m$ is taken as 1 or 2. The selection of the similarity tolerance $r$ depends largely on the practical application scenario, usually $r = 0.1\,std \sim 0.25\,std$, where $std$ is the standard deviation of the original data. In this paper, $m = 2$ and $r = 0.2\,std$.
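The SE computation described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name `sample_entropy` and the `r_factor` argument are hypothetical, the defaults mirror the paper's choice of m = 2 and r = 0.2 * std, and the sketch follows one common convention (N − m templates for both vector lengths, matches counted when the distance is at most r), which may differ slightly from the normalization used in Figure 2.

```python
import numpy as np


def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy with tolerance r = r_factor * std(x) (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    r = r_factor * np.std(x)

    def match_count(dim):
        # Build n - m templates of length `dim` (same count for dim = m and m + 1).
        templates = np.array([x[i:i + dim] for i in range(n - m)])
        # Chebyshev (maximum absolute difference) distance between all template pairs.
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        # Count ordered pairs within tolerance, excluding self-matches on the diagonal.
        return int(np.sum(dist <= r)) - templates.shape[0]

    b = match_count(m)       # matches for vectors of length m
    a = match_count(m + 1)   # matches for vectors of length m + 1
    if a == 0 or b == 0:
        return float("inf")  # undefined ratio: treat as maximally irregular
    return float(-np.log(a / b))
```

A regular signal such as a sine wave should yield a smaller value than white noise of the same length, which is the property the modulus selection relies on.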

PACF Based Input Selection Method
Due to the different fluctuation characteristics of the DD at different periods, the correlations of the K VMFs obtained by the above method also vary. Therefore, before predicting each component, we need to analyze the correlation of each VMF, and then the optimal input variables for the ELMs can be selected. Here, the PACF method is used to evaluate the correlations and selection results of the components [50].
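Assume that $X_t$ is the output variable. If the partial autocorrelation at lag $a$ falls within the 95% confidence interval $\left[-\frac{1.96}{\sqrt{n}}, \frac{1.96}{\sqrt{n}}\right]$ for the first time, where $n$ is the length of the series, and there are no obvious outliers after it, then $(a-1)d$ is selected as the delay time of the corresponding time series. PACF is described below.

For DD time series, the covariance $\hat{\gamma}_a$ at lag $a$ is expressed as

$$\hat{\gamma}_a = \frac{1}{n} \sum_{t=1}^{n-a} (x_t - \bar{x})(x_{t+a} - \bar{x}), \quad a = 0, 1, \cdots, M \tag{10}$$

where $\bar{x}$ is the mean value of the time series, $M$ is the largest lag coefficient, and $a$ is the lag length of the autocorrelation function. The autocorrelation $\hat{\rho}_a$ can be estimated as follows

$$\hat{\rho}_a = \hat{\gamma}_a / \hat{\gamma}_0 \tag{11}$$

For PACF at lag $a$, $f_{aa}$ is presented as follows

$$f_{11} = \hat{\rho}_1, \qquad f_{a+1,a+1} = \frac{\hat{\rho}_{a+1} - \sum_{j=1}^{a} \hat{\rho}_{a+1-j}\, f_{aj}}{1 - \sum_{j=1}^{a} \hat{\rho}_j\, f_{aj}}, \qquad f_{a+1,j} = f_{aj} - f_{a+1,a+1}\, f_{a,a-j+1} \quad (j = 1, 2, \cdots, a) \tag{12}$$

where $1 \le a \le M$.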
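A minimal Python sketch of this selection rule is given below, assuming the standard Durbin-Levinson recursion for the partial autocorrelations. The function names `pacf_durbin_levinson` and `select_num_inputs` and the default `max_lag` are hypothetical, and the additional check for "no obvious outliers" after the first in-bound lag is left to visual inspection rather than implemented here.

```python
import numpy as np


def pacf_durbin_levinson(x, max_lag):
    """Partial autocorrelations f_aa for a = 0..max_lag (f_00 is set to 1)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()
    # Sample autocovariance and autocorrelation at each lag.
    gamma = np.array([np.dot(xc[:n - a], xc[a:]) / n for a in range(max_lag + 1)])
    rho = gamma / gamma[0]

    pacf = np.zeros(max_lag + 1)
    pacf[0] = 1.0
    phi_prev = np.zeros(0)  # coefficients f_{a-1,1..a-1} from the previous order
    for a in range(1, max_lag + 1):
        if a == 1:
            phi_aa = rho[1]
            phi = np.array([phi_aa])
        else:
            num = rho[a] - np.dot(phi_prev, rho[a - 1:0:-1])
            den = 1.0 - np.dot(phi_prev, rho[1:a])
            phi_aa = num / den
            phi = np.append(phi_prev - phi_aa * phi_prev[::-1], phi_aa)
        pacf[a] = phi_aa
        phi_prev = phi
    return pacf


def select_num_inputs(x, max_lag=20):
    """Number of lagged inputs: (a - 1), where a is the first lag inside the 95% bound."""
    pacf = pacf_durbin_levinson(x, max_lag)
    bound = 1.96 / np.sqrt(len(x))
    for a in range(1, max_lag + 1):
        if abs(pacf[a]) <= bound:
            return a - 1
    return max_lag  # no lag fell inside the bound within max_lag
```

Under this rule, a series whose PACF first drops inside the bound at lag 6 would receive five lagged inputs.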

ELM-Based Prediction Model
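The ELM model is chosen as the core model for DD prediction in this paper, and its structure is shown in Figure 3. It is a type of feedforward neural network [22]. Compared with traditional single hidden layer feedforward neural networks (SLFNs), ELM has some significant advantages, such as fast training speed, good generalization ability, and few adjustable parameters. Its specific principles are as follows.

Given $Q$ samples $(x_j, t_j)$, $j = 1, 2, \cdots, Q$, where $x_j = [x_{1j}, x_{2j}, \cdots, x_{nj}]^T$ is the input vector and $t_j = [t_{1j}, t_{2j}, \cdots, t_{mj}]^T$ is the target vector. Assume that the hidden layer has $l$ neurons with activation function $g(x)$; the output vector of the network is as follows

$$y_j = \sum_{i=1}^{l} \beta_i\, g(\omega_i \cdot x_j + b_i), \quad j = 1, 2, \cdots, Q \tag{13}$$

where $b_i$ denotes the threshold of the $i$th neural node of the hidden layer, $\omega_i = [\omega_{i1}, \omega_{i2}, \cdots, \omega_{in}]^T$ is the weight vector connecting the input layer to the $i$th hidden node, and $\beta_i = [\beta_{i1}, \beta_{i2}, \cdots, \beta_{im}]^T$ is the weight vector connecting the $i$th hidden node to the output layer. When the network outputs are required to match the targets, Equation (13) can be described as Equation (15)

$$H\beta = T' \tag{15}$$

where $T' = [t_1, t_2, \cdots, t_Q]^T$, $\beta = [\beta_1, \beta_2, \cdots, \beta_l]^T$, and $H$ is the hidden layer output matrix, which can be represented by Equation (16)

$$H(\omega_1, \cdots, \omega_l, b_1, \cdots, b_l, x_1, \cdots, x_Q) = \begin{bmatrix} g(\omega_1 \cdot x_1 + b_1) & \cdots & g(\omega_l \cdot x_1 + b_l) \\ \vdots & \ddots & \vdots \\ g(\omega_1 \cdot x_Q + b_1) & \cdots & g(\omega_l \cdot x_Q + b_l) \end{bmatrix}_{Q \times l} \tag{16}$$

According to the ELM theorem, if $l = Q$, for arbitrary $\omega$ and $b$, Equation (17) can be obtained

$$\sum_{j=1}^{Q} \left\| t_j - y_j \right\| = 0 \tag{17}$$

where $y_j = [y_{1j}, y_{2j}, \cdots, y_{mj}]^T$ $(j = 1, 2, \cdots, Q)$. When $Q$ is large, $l$ is usually chosen to be less than $Q$ in order to reduce the amount of computation, in which case the ELM training error can be made smaller than an arbitrary number $\varepsilon > 0$:

$$\sum_{j=1}^{Q} \left\| t_j - y_j \right\| < \varepsilon$$

The connection weight $\beta$ can be obtained as follows

$$\min_{\beta} \left\| H\beta - T' \right\|$$

Its solution can be expressed as Equation (20)

$$\hat{\beta} = H^{+} T' \tag{20}$$

where $H^{+}$ is the Moore-Penrose generalized inverse of the matrix $H$.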
In summary, the ELM model does not require iterative corrections to weights and thresholds, and it outperforms conventional SLFNs.
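The training procedure above reduces to one random initialization and one pseudoinverse, which can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the class name `ELM`, the uniform range [-1, 1] for the random weights and thresholds, and the `seed` argument are all hypothetical choices; the sigmoid activation follows the paper's stated choice of g(x).

```python
import numpy as np


class ELM:
    """Single-hidden-layer ELM with sigmoid activation (illustrative sketch)."""

    def __init__(self, n_hidden, seed=None):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Hidden layer output matrix H (one row per sample, one column per neuron).
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, T):
        X = np.asarray(X, dtype=float)
        T = np.asarray(T, dtype=float)
        # Input weights and thresholds are drawn once at random and never updated.
        self.W = self.rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        H = self._hidden(X)
        # Output weights: least-squares solution via the Moore-Penrose pseudoinverse.
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        return self._hidden(np.asarray(X, dtype=float)) @ self.beta
```

Because the input weights are random, repeated fits with different seeds give different predictions, which is why results are usually averaged over several runs.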

The Hybrid DD Prediction Model
In this section, we develop a new hybrid model for DD prediction based on VMD-SE-ER-PACF-ELM. The detailed steps of the proposed model are shown in Figure 4.
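The bookkeeping behind the decomposition stage (modes, error sequence, and secondary decomposition of the error sequence) can be sketched as below. The sketch is deliberately independent of any VMD library: `decompose` is a hypothetical callable standing in for a VMD routine that returns an array of shape (k, N), and the function names are illustrative. The text does not state how the small remainder left after the secondary decomposition is treated, so the sketch simply returns it.

```python
import numpy as np


def split_modes_and_error(series, decompose, k):
    """Decompose `series` into k modes and the error sequence ER = series - sum(modes)."""
    series = np.asarray(series, dtype=float)
    modes = np.asarray(decompose(series, k), dtype=float)  # expected shape: (k, N)
    er = series - modes.sum(axis=0)
    return modes, er


def all_components(series, decompose, k_main, k_er):
    """Primary decomposition of the series, then secondary decomposition of its ER."""
    modes, er = split_modes_and_error(series, decompose, k_main)
    er_modes, er_rest = split_modes_and_error(er, decompose, k_er)
    return modes, er_modes, er_rest
```

Each returned component would then receive its own PACF-selected inputs and its own ELM, and the component forecasts are summed to give the final prediction.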

Performance Evaluation Indicators
In this paper, three evaluation indicators and the Taylor diagram are presented to evaluate the performance of the proposed model.

1. Root mean square error (RMSE). RMSE is used to characterize the overall prediction precision.

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_{p(i)} - y_{(i)} \right)^2} \tag{21}$$

where $y_{p(i)}$ is the predicted value, $y_{(i)}$ is the measured value, $n$ is the length of the testing set, and $RMSE \in [0, +\infty)$. The smaller the RMSE, the higher the prediction accuracy of the model;

2. Mean absolute error (MAE). The MAE visually represents the loss value of the prediction results.

$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_{p(i)} - y_{(i)} \right| \tag{22}$$

The magnitude of the MAE value has the same relationship with prediction accuracy as the RMSE;

3. Determination coefficient ($R^2$). The $R^2$ can be used to describe the correlation between two or more variables. If the correlation between the predictions and the original sequence is poor, the model is unreliable even if the values of RMSE and MAE are small.

$$R^2 = \frac{\left[ \sum_{i=1}^{n} \left( y_{p(i)} - \bar{y}_p \right) \left( y_{(i)} - \bar{y} \right) \right]^2}{\sum_{i=1}^{n} \left( y_{p(i)} - \bar{y}_p \right)^2 \sum_{i=1}^{n} \left( y_{(i)} - \bar{y} \right)^2} \tag{23}$$

where $\bar{y}_p$ and $\bar{y}$ are the mean values of the predicted and measured series, $R^2 \in [0, 1]$, and the larger the value of $R^2$, the stronger the correlation;

4. Taylor diagram. A Taylor diagram provides a visual framework for comparing the prediction results of a model to a reference model. It can represent the relevant information of various prediction models in a centralized way, which fully and clearly reflects the prediction capabilities of different models. Taylor diagrams have been widely adopted in recent years as an effective method for assessing the predictive ability of different models.

Appl. Sci. 2020, 10, 5700
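The three numerical indicators can be computed with a few lines of NumPy. The sketch below is illustrative: the function names are hypothetical, and `r2_corr` assumes that the determination coefficient is taken as the squared Pearson correlation between predictions and measurements, in line with the text's description of it as a correlation measure.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_pred - y_true)))


def r2_corr(y_true, y_pred):
    """Squared Pearson correlation between measured and predicted values (assumed form)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    dt = y_true - y_true.mean()
    dp = y_pred - y_pred.mean()
    return float(np.sum(dp * dt) ** 2 / (np.sum(dp ** 2) * np.sum(dt ** 2)))
```

Note that a squared correlation of 1 only indicates a perfect linear relationship, so it should be read together with RMSE and MAE rather than on its own.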

Description of the Dam Project and Datasets
The case study is a roller compacted concrete (RCC) gravity dam located in southeast China. The dam crest elevation is 634.40 m and the maximum height of the dam is 72.40 m. The length and width of the dam crest are 206.00 and 7.50 m, respectively. The upstream face of the dam body is vertical, and the downstream face of the retaining dam section has a slope of 1:0.72. The overflow dam section is located in the middle of the dam, with a weir crest elevation of 621.00 m, three overflow surface holes, and three 12 × 12 m curved gates (Figure 5).

This paper mainly studies the horizontal displacement of the top of the RCC gravity dam. A gravity dam maintains its stability through its own weight and its cohesion with the foundation. Therefore, dam failures are often the results of overtopping and penetration cracks in a section of the dam, which can be directly identified by the horizontal displacement. The displacement at the top of the dam is the most obvious and most reflective of the deformation of the whole dam section, so the horizontal monitoring system of gravity dams is usually arranged at the top of the dam. Therefore, in this study, the horizontal displacement at the top of the dam is modeled and analyzed in order to verify the correctness of the proposed model.
The horizontal displacement of the dam is measured by the tension wire alignment system, with one measurement point arranged in each dam section ( Figure 5). The measurement point of the monitoring system is shown in Figure 6. For gravity dams, the middle dam section is generally the most important section of the dam and is often studied as a typical dam section, so the monitoring data related to horizontal displacements at the EX5 measurement point on the 4th dam section are selected for analysis (to the left bank is positive, to the right bank is negative).
For the horizontal displacement monitoring data at EX5, 739 daily observations from 6 June 2016 to 22 October 2018 were selected as the research object, and the missing values and outliers during this period were pre-processed. The monitoring data were divided into training set F (1st-709th measurements) and testing set T (710th-739th measurements). The processed data series is shown in Figure 7.

Data Decomposition
As can be seen in Figure 4, prior to VMD decomposition, the decomposition modulus should first be determined so that the characteristics of the original data can be fully extracted, which is essential for the application of VMD. We propose a new method for determining the optimal K value by the variation in the SE values of the subsequences. The SE values of VMF subsequences at different K values and their curves are shown in Table 1 and Figure 8, respectively.

When K = 5, the SE values of the VMFs show a monotonically increasing trend and the difference between the SE values of any two subsequences is relatively large. When the K value increases to 6-8, it can be seen from Figure 8 that the corresponding SE curves have similar values at both the VMF3 and VMF5 points, which will lead to modal confusion, implying that the VMD is over-decomposed. Meanwhile, the decomposition will be insufficient if the value of K decreases. Therefore, in this case, K = 5 is the best value for VMD decomposition, and the results are shown in Figure 9. The built-in parameters of the VMD algorithm are shown in Table 2.
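The selection rule can be made explicit with a small helper. This is a hedged sketch only: the paper judges "relatively large" differences from Table 1 and Figure 8 rather than from a numerical threshold, so the `min_gap` parameter and both function names are assumptions introduced for illustration, and the SE values in the usage example are made up, not taken from Table 1.

```python
import numpy as np


def min_pairwise_gap(se_values):
    """Smallest difference between any two SE values of one decomposition."""
    s = np.sort(np.asarray(se_values, dtype=float))
    return float(np.min(np.diff(s)))


def select_k(se_by_k, min_gap):
    """Largest K whose subsequences all have clearly distinct SE values.

    se_by_k maps each candidate K to the list of SE values of its K subsequences.
    Returns None when no candidate satisfies the gap requirement.
    """
    valid = [k for k, se in se_by_k.items() if min_pairwise_gap(se) > min_gap]
    return max(valid) if valid else None


# Illustrative (made-up) SE values for three candidate moduli.
example = {
    4: [0.05, 0.20, 0.40, 0.60],
    5: [0.05, 0.15, 0.30, 0.45, 0.60],
    6: [0.05, 0.15, 0.30, 0.31, 0.45, 0.60],  # two nearly equal values: over-decomposed
}
best_k = select_k(example, min_gap=0.05)
```

In practice the threshold would need to be chosen with the SE curves in view, since a fixed value cannot replace the visual check for modal confusion.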

Input Selection by PACF
After decomposing the DD sequence in the previous section to obtain the five VMFs, it is necessary to establish ELM prediction models for these five subsequences separately. Prior to this, input variables of ELMs should be determined with the PACF method according to Section 2.3. Figure 10 shows the corresponding PACF results for each VMF, and the best input variables for each VMF are shown in Table 3.

[Table 3. Columns: Series, Numbers of Input, Input Variables.]

Take VMF1 as an example to illustrate how to select the input variables through the PACF results in Figure 10. For VMF1, the PACF value falls within the 95% confidence interval when the lag is 6d, so the input variables corresponding to VMF1 are the five values between (t − 5)d and (t − 1)d, and the output variable is the value of td. In order to guide the operation of the dam more comprehensively, the proposed model is also analyzed for different prediction periods (td, (t + 3)d and (t + 6)d) in this study. Taking VMF1 as an example, the specific process of implementation is illustrated in Figure 11.
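The mapping from a subsequence to input and output samples can be sketched as follows. The function name `make_supervised` and the `horizon` argument are hypothetical; `horizon=0` corresponds to predicting the value at td, and `horizon=3` or `horizon=6` to the (t + 3)d and (t + 6)d prediction periods, assuming the same lagged inputs are reused for every period.

```python
import numpy as np


def make_supervised(series, n_lags, horizon=0):
    """Build (inputs, targets) pairs from one subsequence.

    Each input row holds the values at (t - n_lags)d .. (t - 1)d and the
    matching target is the value at (t + horizon)d.
    """
    s = np.asarray(series, dtype=float)
    X, y = [], []
    for t in range(n_lags, s.size - horizon):
        X.append(s[t - n_lags:t])
        y.append(s[t + horizon])
    return np.array(X), np.array(y)
```

For VMF1 with five inputs, `make_supervised(vmf1, n_lags=5)` would give one training sample per day from the sixth observation onward, where `vmf1` denotes the decomposed subsequence.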

Determination of the ELM Structure Frameworks
After determining the input variables of the VMFs, we need to build an ELM-based, data-driven prediction model for each subsequence. Due to the stochastic nature of the ELM models, the ELM prediction results in this paper are based on the average of ten calculations with the removal of four outliers. During the prediction process, a corresponding ELM model is established for each subsequence, so that the number of the ELMs and the decomposition modulus are consistent. The output of each ELM represents the predicted results of the corresponding VMF, and the final prediction result is obtained by summing the predictions of all ELMs.
Since the prediction accuracy of the ELMs directly affects the prediction accuracy of the model, the selection of appropriate ELM parameters is crucial. The ELM model has two adjustable parameters, $g(x)$ and $l$, mentioned in Section 2.4. The sigmoid function is chosen as $g(x)$ in this study. Meanwhile, the performance of an ELM model also depends on $l$, which can be obtained by Equation (24), as the ELM belongs to SLFNs.

$$l = \mathrm{round}\left(\sqrt{n + m}\right) + \mathrm{rand}(1 \sim 10) \tag{24}$$

In this study, $n$ and $m$ represent the number of input and output variables determined in Section 4.3, respectively. Since the original displacement sequence is decomposed into five VMFs, five ELM prediction models are constructed correspondingly. Equation (24) gives the method of determining the number of hidden layer neurons, and thus the structural framework of each ELM can be determined, expressed as '$n$-$l$-$m$'. Take VMF1 as an example to illustrate the determination process of the ELM structural framework: the initial ELM structural framework of VMF1 is '5-$l$-1'. According to Equation (24), the best $l$ of the model lies in the interval [3, 12] or its neighboring values, and ten runs are performed for each number of hidden layer neurons to calculate the performance indicators. The results show that when $l = 10$, the learning performance of the ELM is the best, so the structural framework of the ELM model corresponding to VMF1 is determined as '5-10-1'. The ELM structural frameworks of VMF2-VMF5 can be obtained by the same method, as shown in Table 4.
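The candidate range implied by Equation (24) can be enumerated directly. The sketch below is illustrative: the function names are hypothetical, and `pick_best` assumes a user-supplied `score` callable (for example, the average validation RMSE over several ELM runs) where a lower score is better.

```python
import math


def hidden_neuron_candidates(n_inputs, n_outputs):
    """All values of l reachable from l = round(sqrt(n + m)) + rand(1..10)."""
    base = round(math.sqrt(n_inputs + n_outputs))
    return list(range(base + 1, base + 11))


def pick_best(candidates, score):
    """Return the candidate with the lowest score (score is a user-supplied callable)."""
    return min(candidates, key=score)
```

For the '5-l-1' framework of VMF1, `hidden_neuron_candidates(5, 1)` enumerates the integers from 3 to 12, matching the interval quoted in the text.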

Construction of the VMD-SE-PACF-ELM Model
The ELM models with different structural frameworks are used to train and predict the VMF components separately, and the prediction results of all components are then superimposed to complete the construction of the VMD-SE-PACF-ELM model. During the prediction process, the result of each step is used to predict the next value until the 30th prediction result is obtained (Figure 11). The training and prediction results for each VMF are shown in Figures 12 and 13, respectively. The yellow areas in Figure 13 represent the absolute residuals of the prediction results.
It can be seen from Figures 12 and 13 that the VMF1, VMF2 and VMF3 components have good fitting capability and low prediction error. In comparison, the fitting and prediction results for VMF4 and VMF5 deviate somewhat from the actual values near the curve inflection points, but the overall results are consistent with the trend of the actual values. Therefore, it can be initially concluded that the VMD algorithm can effectively decompose the DD fluctuation information, thereby improving the prediction performance of the model. Here we use RMSE, MAE and $R^2$ to quantify the performance of the ELMs in the training and testing phases, as shown in Table 5. ELM_VMF1 has the smallest RMSE and MAE values and the largest $R^2$ value in the training and testing phases, followed by ELM_VMF2 and ELM_VMF3, and these three sub-models all show strong performance. ELM_VMF4 and ELM_VMF5 have relatively large errors during the training and testing phases, but their predictive accuracy is still at a high level, so they do not affect the overall performance of the model.
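The step-by-step prediction scheme, in which each predicted value is fed back as an input for the next step, can be sketched as below. The names `recursive_forecast` and `predict_one` are hypothetical: `predict_one` stands in for a trained one-step model (for example, an ELM for one VMF) that maps the most recent `n_lags` values to the next value.

```python
import numpy as np


def recursive_forecast(predict_one, history, n_lags, steps):
    """Iterated one-step forecasting: each prediction is appended to the input window."""
    window = list(np.asarray(history, dtype=float)[-n_lags:])
    out = []
    for _ in range(steps):
        nxt = float(predict_one(np.array(window)))
        out.append(nxt)
        window = window[1:] + [nxt]  # drop the oldest value, append the new prediction
    return np.array(out)
```

Running this with `steps=30` for every component and summing the resulting arrays would give the 30 reconstructed predictions; because errors are fed back into the inputs, they can accumulate over the horizon.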

To further illustrate the necessity of using the VMD decomposition algorithm in the prediction process of the ELM model, Figure 14 shows the prediction results and evaluation indicators before and after VMD optimization.
In Figure 14, 'VMD Optimized ELM' represents the VMD-SE-PACF-ELM model, and 'PACF-ELM' represents the ELM model whose input variables are determined by PACF. The left side of the figure is a bar graph of the performance evaluation indicators (RMSE, MAE and R²) of the two models, and the right side is the graph of the predicted results. The deformation prediction of the VMD-optimized ELM model has smaller RMSE and MAE as well as a larger R² than that of the single ELM model. Combined with the prediction curves, the following conclusions can be drawn: if the ELM model is used directly to predict a deformation time series with strong fluctuation characteristics, accurate prediction results cannot be obtained; in contrast, the prediction results of the VMD-optimized ELM model are closer to the real values of the DD, which means that the VMD algorithm can decompose the original deformation series into several subsequences with good deformation characteristics. Modeling and predicting each subsequence separately therefore greatly improves the prediction performance of the model.

Prediction Results Considering the ER
Although the VMD-SE-PACF-ELM model greatly improves the prediction performance, its prediction results fail to show the fluctuation characteristics of the DD well. Considering that the ER obtained after VMD decomposition contains some of the fluctuation characteristics of the original sequence, it is necessary to extract this sequence and to analyze the deformation features embedded in it. Due to the strong nonlinearity and nonstationarity of the ER, modeling and predicting it directly with the ELM model leads to erratic prediction results of little practical use. Therefore, we propose to perform a secondary decomposition of the ER by the VMD algorithm to obtain subsequences with relatively stable deformation characteristics, which are recorded as ER-VMFs. As for the original sequence, the decomposition modulus of the ER is first determined by the SE method. The analysis shows that the optimal decomposition modulus of the ER is 2, and the obtained subsequences are denoted as ER-VMF1 and ER-VMF2. PACF analysis is then performed for each ER-VMF to determine the input variables for the prediction model. The specific analysis process of the ER is shown in Figure 15.
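The two quantities involved in this step can be sketched as follows: the ER taken as the observed series minus the sum of the VMFs, and the sample entropy used to quantify the complexity of a sequence. This is a hedged illustration, not the authors' code. The embedding dimension m = 2 and the tolerance r = 0.2 times the standard deviation are common defaults assumed here, not values taken from the paper, and the test series are synthetic.

```python
import math
import random


def error_sequence(observed, components):
    """ER: observed values minus the sum of the decomposed components."""
    return [y - sum(parts) for y, parts in zip(observed, zip(*components))]


def sample_entropy(series, m=2, r_factor=0.2):
    """Sample entropy with Chebyshev distance and tolerance r_factor * std.

    m and r_factor are assumed defaults, not the paper's settings.
    """
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    r = r_factor * std

    def count_matches(length):
        # Use the same number of templates (n - m) for both lengths.
        templates = [series[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")
    return -math.log(a / b)


# Synthetic series for illustration: a regular wave and random noise.
rng = random.Random(0)
periodic = [math.sin(0.3 * i) for i in range(200)]
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]
se_periodic = sample_entropy(periodic)
se_noise = sample_entropy(noise)
print(se_periodic, se_noise)
```

A regular sequence yields a lower sample entropy than an irregular one, which is why the measure is useful for judging how complex the subsequences produced by a given decomposition modulus are.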

As can be seen in Figure 15, the VMD-decomposed ER is able to achieve better prediction results. The prediction results of the proposed model are obtained by adding the ER prediction to the prediction results of the VMD-SE-PACF-ELM model. The prediction results and evaluation indicators of the models before and after considering the ER are shown in Figure 16. The left side of the figure is a bar graph of the performance evaluation indicators (RMSE, MAE and R²), and the right side is a graph of the predicted results.
From Figure 16, the following conclusion can be drawn: the prediction results of the model without the ER reflect the overall trend of the time series, but do not capture the fluctuation characteristics of the original data well. Through the analysis of the ER, the prediction results of the hybrid model not only become more accurate, but also better reflect the fluctuation characteristics of the DD. Therefore, the analysis of the ER has practical engineering significance.
Figure 15. Analysis process of the ER.
[Figure 15 panels: comparison of the sum of VMFs with the observed values; the ER of the VMD decomposition; establishment of the ELM models for ER-VMF1 and ER-VMF2 to obtain prediction results.]
Figure 16. Comparison of predicted results before and after considering ER.

Comparison with Other Benchmark Models
To further verify the superiority of the VMD-SE-ER-PACF-ELM model, its performance is compared with that of several benchmark models in this section. In addition to the PACF-ELM and VMD-SE-PACF-ELM prediction models, three other DD prediction models, namely the EMD-PACF-ELM, hydrostatic-seasonal-time (HST)-ELM and Arima models, are established. The EMD-PACF-ELM model is used to verify that the VMD algorithm outperforms the EMD algorithm in DD prediction; the HST-ELM model is used to validate that the DD prediction model based on VMD decomposition performs better than the ELM model based on statistical optimization; and the Arima model is used to demonstrate that the proposed model outperforms the traditional time series prediction method in terms of prediction accuracy. The prediction results and performance evaluation indicators of each model are shown in Figure 17, where the bar graph of the performance evaluation indicators of each model is shown on the left and the curve of the prediction results is shown on the right.
As can be seen in Figure 17, both the evaluation indicators and the prediction curves show that Arima has the worst performance. In contrast to Arima, the prediction performance of the HST-ELM model is significantly better. Another important conclusion from Figure 17 is that the models combined with signal decomposition methods (EMD, VMD) consistently outperform the other models. Meanwhile, the VMD-SE-ER-PACF-ELM model outperforms the EMD-PACF-ELM model, which indicates that the VMD algorithm used in this study to pre-process the deformation sequence is better than the EMD algorithm. Bar and curve graphs provide a visual assessment of the prediction ability of the models and of the correspondence between the observations and the model predictions. However, performance evaluation indicators quantify the prediction performance of each model more accurately. Table 6 shows the performance evaluation indicators for the six models, including PACF-ELM and VMD-SE-PACF-ELM.
Table 6. Evaluation indicators of different prediction models.
Compared to the other models, the R² value of the proposed model is higher by 2.18%, 6.12%, 10.75%, 29.38% and 42.39%, respectively. Through calculation and comparison of the performance evaluation indicators of the various models, we find that the VMD-SE-ER-PACF-ELM model is superior to all the other models in DD prediction.

Furthermore, Figure 18 shows the Taylor diagram of the prediction performance of each model. Arima has the worst prediction performance, and HST-ELM outperforms this traditional time series prediction method. In addition, the prediction performance of the ELM models optimized with a decomposition algorithm is generally better than that of the models without it, which indicates that using a decomposition algorithm to pre-process the original sequence is an efficient way to improve the prediction of DD. Among the decomposition algorithms, VMD outperforms EMD, and considering the ER brings the prediction results closer to the true values. These findings are consistent with the conclusions drawn from Figure 17 and Table 6.
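A Taylor diagram summarizes each model by three statistics: the standard deviation of the predictions, their correlation with the observations, and the centered root mean square difference. The sketch below computes them and checks the geometric relation that allows all three to be shown in one plot. The function name and the sample values are illustrative assumptions, not data from the paper.

```python
import math


def taylor_statistics(observed, predicted):
    """Return (std_obs, std_pred, correlation, centered RMS difference).

    Population standard deviations are used; both series must vary,
    otherwise the correlation is undefined.
    """
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    dev_o = [o - mean_o for o in observed]
    dev_p = [p - mean_p for p in predicted]
    std_o = math.sqrt(sum(d * d for d in dev_o) / n)
    std_p = math.sqrt(sum(d * d for d in dev_p) / n)
    corr = sum(a * b for a, b in zip(dev_o, dev_p)) / (n * std_o * std_p)
    crmsd = math.sqrt(sum((b - a) ** 2 for a, b in zip(dev_o, dev_p)) / n)
    return std_o, std_p, corr, crmsd


# Illustrative values only.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
std_o, std_p, corr, crmsd = taylor_statistics(obs, pred)

# Law-of-cosines relation underlying the diagram:
# crmsd^2 = std_o^2 + std_p^2 - 2 * std_o * std_p * corr
identity_gap = crmsd ** 2 - (std_o ** 2 + std_p ** 2 - 2 * std_o * std_p * corr)
print(std_o, std_p, corr, crmsd)
```

A model plots closer to the reference point when its standard deviation matches that of the observations and its correlation is high, which corresponds to a small centered RMS difference.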

Performance of Different Prediction Periods Based on the Proposed Model
In summary, the proposed model has the smallest RMSE and MAE values and the largest R² value, and its prediction performance is satisfactory. However, in DD prediction, the length of the prediction period is, in addition to the prediction accuracy, also important for guiding the normal and stable operation of a dam. In this section, we mainly discuss the influence of different prediction periods (1-, 4- and 7-day ahead) on the prediction performance of the proposed model. The specific implementation process for different periods is shown in Figure 11. The prediction results and performance evaluation indicators for each period are shown in Figure 19, where the bar graph of the performance evaluation indicators for each period is shown on the left and the curves of the prediction results are shown on the right. It is clear from Figure 19 that the VMD-SE-ER-PACF-ELM model is most effective when the prediction period is one day, followed by four days, and least effective when the prediction period is seven days.
To further compare the impact of the prediction period on the results, Table 7 presents the performance evaluation indicators of the predicted results for the corresponding periods. The RMSE values are 0.2755 (1d), 0.4198 (4d) and 0.4547 (7d). The RMSE of the one-day prediction period is 34.37% and 39.41% lower than that of the other two periods, respectively. Meanwhile, the MAE values are 0.2087 (1d), 0.3140 (4d) and 0.3063 (7d). The MAE of the one-day prediction period is 33.54% and 31.86% lower than that of the other two periods, respectively. These results are consistent with the R² values (0.8912 for 1d, 0.8130 for 4d and 0.7047 for 7d). Compared to the other two periods, the R² value of the one-day prediction period is higher by 9.62% and 26.47%, respectively. Although the MAE of the seven-day period is marginally lower than that of the four-day period, the RMSE and R² degrade monotonically, so it can be concluded that the longer the prediction period, the worse the overall prediction performance of the model.
In addition, Figure 20 shows the Taylor diagram of the prediction performance for the different periods. The model has the best prediction performance when the period is one day. When the period is seven days, the prediction result is the farthest from the true value and the performance is the worst. This is consistent with the conclusions drawn from Figure 19 and Table 7. Therefore, when using the VMD-SE-ER-PACF-ELM model to predict the DD in actual projects, the prediction period should be kept as short as possible in order to obtain higher prediction performance.
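The percentages quoted for the different prediction periods follow directly from the indicator values reported in the text, as the short check below illustrates. The helper names are introduced here for illustration; the input numbers are the RMSE, MAE and R² values quoted above.

```python
def relative_reduction(reference, value):
    """Percentage by which `value` is lower than `reference`."""
    return (reference - value) / reference * 100.0


def relative_increase(reference, value):
    """Percentage by which `value` is higher than `reference`."""
    return (value - reference) / reference * 100.0


# Indicator values quoted in the text for the 1-, 4- and 7-day periods.
rmse = {"1d": 0.2755, "4d": 0.4198, "7d": 0.4547}
mae = {"1d": 0.2087, "4d": 0.3140, "7d": 0.3063}
r2 = {"1d": 0.8912, "4d": 0.8130, "7d": 0.7047}

for period in ("4d", "7d"):
    print(
        period,
        round(relative_reduction(rmse[period], rmse["1d"]), 2),
        round(relative_reduction(mae[period], mae["1d"]), 2),
        round(relative_increase(r2[period], r2["1d"]), 2),
    )
```

Note that the reductions in RMSE and MAE are expressed relative to the longer period, whereas the increase in R² is expressed relative to the R² of the longer period.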

Conclusions
In order to improve the prediction performance for nonstationary DD, a novel hybrid model based on the decomposition-ensemble framework is proposed, namely VMD-SE-ER-PACF-ELM. The details are as follows: (1) The VMD algorithm is used to decompose an original deformation sequence into a number of subsequences with good characteristics to improve the prediction performance. (2) The SE method is used to quantify the complexity of each subsequence in order to select the appropriate decomposition modulus. (3) The secondary VMD decomposition of the ER brings the predicted values closer to the actual deformation characteristics. (4) The PACF method is used to analyze the characteristics of each subsequence to extract the input variables. (5) The ELM models are used to predict the subsequences, and the combination of their outputs is the final prediction result. Meanwhile, the prediction performance of the VMD-SE-ER-PACF-ELM model is compared with that of the Arima, PACF-ELM, HST-ELM, EMD-PACF-ELM and VMD-SE-PACF-ELM models, with RMSE, MAE, R² and Taylor diagrams as performance evaluation indicators.
The results show that the proposed VMD-SE-ER-PACF-ELM model has the best prediction performance among all the compared models. It can effectively predict nonstationary and nonlinear time series and significantly improve the prediction accuracy. In addition, we have analyzed its performance under different prediction periods. The results show that the prediction accuracy decreases as the prediction period lengthens, which can provide a priori knowledge for the normal operation and early warning of dam projects.
Although the proposed method gives good prediction results, certain prediction errors remain. In this paper, the uncontrolled error caused by the randomness of the ELM model is the main source of these prediction errors, which needs to be addressed in our future research. In addition, the idea of spatial relationships between the measurement points [51] will be introduced into the prediction to analyze the overall deformation effect of dams.