1. Introduction
During the long-term operation and maintenance of concrete dams, structural performance gradually degrades under the combined effects of multiple internal and external factors. Deformation is a significant indicator of this degradation, so deformation monitoring is essential to ensuring structural integrity and operational safety, and accurate prediction of the deformation behavior of concrete dams is a key measure for maintaining safe operation [1,2]. The noise and nonlinear characteristics of the monitoring data have a significant impact on modeling accuracy. Although traditional statistical models are widely used in engineering because of their simple form and efficient calculation, they have limitations in dealing with problems such as multicollinearity, which motivates the use of more advanced machine learning techniques [3]. In recent years, with the rapid development of artificial intelligence, machine learning algorithms such as support vector machines (SVM) [4,5,6,7], artificial neural networks (ANN) [8], extreme learning machines (ELM) [9,10,11], recurrent neural networks (RNN) [12,13,14,15,16], and random forest (RF) [17,18] have been recognized for their data-driven modeling capabilities and their ability to handle the complex nonlinear systems involved in dam deformation prediction. These methods improve the accuracy and robustness of prediction models by capturing the deep nonlinear dependence between the dam influence factors and the deformation.
Su et al. [4] proposed a dam deformation prediction model that combines SVM with phase space reconstruction, wavelet analysis, and particle swarm optimization (PSO) to identify the significant nonlinear dynamic characteristics of dam deformation. Compared with traditional models, this model is better able to explain complex nonlinear relationships. Lin et al. [19] proposed a multi-step displacement prediction algorithm for concrete dams that combines complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), the K-harmonic means (KHM) algorithm, and ELM. The algorithm uses CEEMDAN to decompose the dam displacement sequence into different signals, uses KHM clustering to group the denoised data with similar features, and uses the sparrow search algorithm (SSA) to improve the KHM algorithm so that it avoids falling into local optima. An engineering example shows that the model has good prediction performance and strong robustness, which demonstrates the feasibility of applying it to multi-step prediction of dam displacement. Xu et al. [20] proposed a combined prediction model for concrete arch dam displacement that integrates clustering analysis with long short-term memory (LSTM), CEEMDAN, least squares support vector machine (LSSVM), and PSO and applies residual correction to the signal. By mining the effective information in the residual sequence, the combined model achieves better generalization and robustness than a traditional single model. Tang et al. [21] proposed a CEEMDAN-SSA-CNN-GRU dam deformation prediction model. The model uses CEEMDAN to decompose the noisy signal and uses SSA to further extract and reconstruct the high-frequency intrinsic mode functions (IMFs), which yields components with an enhanced noise reduction effect. However, the model does not comprehensively analyze the correlation of each IMF from multiple perspectives, and it has some deficiencies in dealing with nonlinear fluctuations caused by unstable loads. Cao et al. [22] proposed a VMD-SE-ER-PACF-ELM hybrid model based on the decomposition ensemble method to deal with the fluctuation characteristics of dam deformation and obtain more accurate prediction results. Although the model considers the correlation between IMF components, it shows some limitations in decomposing time series and in dealing with high-dimensional nonlinear correlation. Jiang et al. [23] proposed a displacement prediction model for a concrete arch dam based on isolation forest (IF) and kernel extreme learning machine (KELM). The model uses IF to eliminate outliers and uses the robust nonlinear fitting ability of KELM to construct the model. However, it mainly addresses the identification of outliers and the handling of significant nonlinear fluctuations. Zhou et al. [24] proposed a dam deformation prediction model based on the CEEMDAN-PSR-KELM framework. The model uses CEEMDAN to decompose the deformation sequence, reconstructs the phase space of each resulting sequence, and then establishes a KELM prediction model for each reconstructed sequence.
Based on the above considerations, this paper adopts the complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) technique, which adds Gaussian white noise to the original data to promote a complete decomposition of the signal and minimize the reconstruction error. After decomposition, the correlation among the modal components is analyzed in depth, and the relationship between these components and the historical data is then investigated. To reconstruct the decomposed modal components, sample entropy (SE) and partial autocorrelation function (PACF) analysis are used to evaluate the temporal correlation between each modal component and its own history. On this basis, the dam deformation prediction model is established with a kernel extreme learning machine optimized by the global search whale optimization algorithm (GSWOA-KELM), which has excellent nonlinear mapping ability. Actual monitoring data from the Xiaowan double-curvature arch dam are used to verify the effectiveness and accuracy of the proposed prediction method.
3. Construction of Kernel Extreme Learning Machine Model Based on Global Search Strategy to Optimize Whale Algorithm
3.1. The Global Search Whale Optimization Algorithm (GSWOA)
The traditional whale optimization algorithm (WOA) is known for its streamlined structure and small number of parameters. In multivariate function optimization tasks, it is competitive with earlier algorithms in terms of calculation speed and accuracy [34,35]. However, the global search ability of the WOA is limited, and the accuracy with which it finds the optimal solution is relatively low. Therefore, this paper adopts an improved WOA, the global search whale optimization algorithm (GSWOA), which integrates a global search strategy [36,37]. The improvements are as follows:
First, an inertia weight w that changes with the number of iterations is added to the whale position update process, which gives the position update equation of the improved algorithm. Here, w is a nonlinear inertia weight that takes values in the interval [0, 1]; t is the number of iterations; X is the whale position; X_best is the global optimal position; X_rand is a random position at which a whale may be located; b is a constant; l is a random number drawn from the interval [−1, 1]; and p is a random number drawn from [0, 1]. A and C are coefficient matrices, given, as in the standard WOA, by A = 2a·r1 − a and C = 2·r2, with a = 2 − 2t/t_max, where r1 and r2 are random numbers in [0, 1]; t_max is the maximum number of iterations; and a is the convergence factor.
Second, because the coefficient b is constant, the spiral motion of the whale's rotation search is too uniform. To alleviate this problem, a variable spiral position update mechanism is introduced, in which the parameter b increases with each iteration so that the spiral trajectory shrinks from a larger to a smaller form. The rotation search model is modified accordingly, with b expressed as a function of the iteration number t.
Third, during the whale position update, relying only on the continually updated optimal position leads to low search efficiency and to convergence on local optima. To improve the convergence speed of the algorithm, this paper introduces an optimal neighborhood perturbation search. In this mechanism, two random numbers in [0, 1] are used to generate X_new, a new position searched randomly in the neighborhood of the current optimal position.
For each newly generated position, a greedy selection criterion determines whether it is retained. Here, f(x) is the fitness value of position x. If the new position is better than the current optimal position, it replaces it; otherwise, the optimal position remains unchanged.
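The three modifications described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the schedules in `inertia_weight` and `spiral_coefficient`, the perturbation step size of 0.5, and all function and parameter names are hypothetical choices made to be consistent with the description (a nonlinear weight inside [0, 1], a spiral coefficient b that grows with t, and a greedy comparison against the current optimum).

```python
import math
import random


def inertia_weight(t, t_max):
    # Assumed schedule: grows nonlinearly from near 0 to 0.2, staying inside [0, 1].
    return 0.2 * math.cos(math.pi / 2 * (1 - t / t_max))


def spiral_coefficient(t, t_max):
    # Assumed schedule: b increases with the iteration count t.
    return math.exp(5 * math.cos(math.pi * (1 - t / t_max)))


def gswoa_minimize(f, lower, upper, n_whales=20, t_max=100, seed=0):
    """Minimize f over the box [lower, upper]; returns (best, best_fit, history)."""
    rng = random.Random(seed)

    def clip(x):
        return [min(max(v, lo), hi) for v, lo, hi in zip(x, lower, upper)]

    whales = [[rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
              for _ in range(n_whales)]
    best = list(min(whales, key=f))
    best_fit = f(best)
    history = [best_fit]
    for t in range(1, t_max + 1):
        a = 2 - 2 * t / t_max  # convergence factor
        w = inertia_weight(t, t_max)
        b = spiral_coefficient(t, t_max)
        for i, x in enumerate(whales):
            p, l = rng.random(), rng.uniform(-1, 1)
            A = 2 * a * rng.random() - a
            C = 2 * rng.random()
            if p < 0.5:
                # Encircle the best whale (|A| < 1) or explore around a random one.
                ref = best if abs(A) < 1 else whales[rng.randrange(n_whales)]
                new = [w * r - A * abs(C * r - xi) for r, xi in zip(ref, x)]
            else:
                # Spiral update with the iteration-dependent coefficient b.
                new = [w * g + abs(g - xi) * math.exp(b * l) * math.cos(2 * math.pi * l)
                       for g, xi in zip(best, x)]
            whales[i] = clip(new)
            fit = f(whales[i])
            if fit < best_fit:
                best, best_fit = list(whales[i]), fit
        # Optimal neighborhood perturbation followed by greedy selection.
        if rng.random() < 0.5:
            h = rng.random()
            candidate = clip([g + 0.5 * h * g for g in best])
            candidate_fit = f(candidate)
            if candidate_fit < best_fit:
                best, best_fit = candidate, candidate_fit
        history.append(best_fit)
    return best, best_fit, history
```

Calling `gswoa_minimize(lambda x: sum(v * v for v in x), [-5.0] * 3, [5.0] * 3)` minimizes a simple sphere function. Because of the greedy selection, the recorded best fitness in `history` never increases from one iteration to the next.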
3.2. Kernel Extreme Learning Machine (KELM) Algorithm
An extreme learning machine (ELM) is a single-hidden-layer feedforward neural network whose input weights and biases are assigned randomly, which makes its prediction performance variable. To address this problem, the kernel extreme learning machine (KELM) was proposed; it combines regularization and kernel methods to enhance the stability and generalization ability of the model [38].
ELM improves the generalization ability of the network by minimizing both the training error and the norm of the output weights. In the optimization process, a regularization coefficient C is introduced to balance the two terms, avoid overfitting, and improve the performance of the model. Without regularization, the output weight is β = H⁺T; with regularization, it becomes
β = Hᵀ(I/C + HHᵀ)⁻¹T
For the case where the hidden layer feature map h(x) is unknown, the kernel matrix of the kernel extreme learning machine can be defined as
Ω_ELM = HHᵀ, with Ω_ELM(i, j) = h(xi)·h(xj) = K(xi, xj)
The output function of KELM can be described as follows:
f(x) = [K(x, x1), …, K(x, xN)](I/C + Ω_ELM)⁻¹T
The kernel function used in this paper is the Gaussian kernel function, which is defined as follows:
K(xi, xj) = exp(−‖xi − xj‖² / (2σ²))
where f(x) is the output of the learning model; H is the output matrix of the hidden layer; H⁺ is the Moore–Penrose generalized inverse of H; I is the identity matrix of dimension N; xi and xj are input vectors; Ω_ELM is the kernel function matrix; HHᵀ is the random matrix of the ELM model that the kernel matrix replaces; σ is the kernel parameter; T is the output target vector; and K(xi, xj) is the kernel function.
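As a concrete illustration of the closed-form solution, the following pure-Python sketch fits a Gaussian-kernel KELM by solving (Ω + I/C)α = T and predicts with f(x) = Σ αi K(x, xi). It is a minimal sketch for small data sets, not the authors' code: the class name `KELM`, the helper `solve`, and the 2σ² parameterization of the kernel are assumptions made for illustration.

```python
import math


def gaussian_kernel(u, v, sigma):
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


class KELM:
    """Kernel extreme learning machine with a Gaussian kernel (illustrative)."""

    def __init__(self, C=100.0, sigma=1.0):
        self.C, self.sigma = C, sigma

    def fit(self, X, y):
        n = len(X)
        # Kernel matrix Omega with the regularization term I/C on the diagonal.
        omega = [[gaussian_kernel(X[i], X[j], self.sigma) for j in range(n)]
                 for i in range(n)]
        for i in range(n):
            omega[i][i] += 1.0 / self.C
        self.X_train = [list(x) for x in X]
        self.alpha = solve(omega, list(y))  # alpha = (Omega + I/C)^-1 T
        return self

    def predict(self, X):
        return [sum(a * gaussian_kernel(x, xt, self.sigma)
                    for a, xt in zip(self.alpha, self.X_train)) for x in X]
```

For example, `KELM(C=1e4, sigma=0.5).fit(X, y).predict(X_new)` fits the model once, with no iterative training, and evaluates it on new inputs. The cubic cost of the linear solve is the reason this sketch is only suitable for small training sets.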
3.3. The Specific Steps of GSWOA Optimizing KELM Model
The parameter selection of the KELM model mainly depends on the choice of kernel function type and regularization coefficient, and the choice of kernel function has a significant impact on the performance of the model. Therefore, this paper proposes a method to optimize KELM parameters using GSWOA. The optimization steps are as follows:
Step 1 Initialize the whale population: A set of random whales is generated, and each whale represents a set of potential parameters of the KELM model.
Step 2 Calculate fitness: For the parameters corresponding to each whale, the KELM model is evaluated on the training set or the validation set, for example by taking the cross-validation error as the fitness.
Step 3 Determine the optimal solution: Find the current optimal solution in the whale population, which will guide other whales to update their positions.
Step 4 Update position: According to the search mechanism in the whale optimization algorithm, combined with the position of the current optimal solution, the position of the whale is updated.
Step 5 Iterative search: Repeat the above steps, update the optimal solution after each iteration, adjust the search behavior according to the global search strategy, and iterate until the termination condition is satisfied.
Step 6 Parameter determination and final model training: After the iteration, the optimal solution (optimal whale position) is used as the parameter of the KELM model. The optimized parameters are used to retrain the KELM model to ensure that the model has fully learned the data features.
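Steps 2 and 3 can be illustrated with a small fitness function. The sketch below assumes that each whale position encodes log10(C) and log10(σ) and that the fitness is a cross-validated RMSE; the encoding, the fold scheme, and the names `decode`, `cv_rmse`, and `best_whale` are hypothetical choices for illustration and are not taken from the paper.

```python
import math


def kelm_fit_predict(X_tr, y_tr, X_te, C, sigma):
    """Closed-form Gaussian-kernel KELM for small data sets (pure Python)."""
    def k(u, v):
        return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / (2 * sigma ** 2))

    n = len(X_tr)
    # Augmented system [Omega + I/C | y], reduced by Gauss-Jordan elimination.
    M = [[k(X_tr[i], X_tr[j]) + (1.0 / C if i == j else 0.0) for j in range(n)]
         + [y_tr[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [vr - M[r][c] * vc for vr, vc in zip(M[r], M[c])]
    alpha = [row[n] for row in M]
    return [sum(a * k(x, xt) for a, xt in zip(alpha, X_tr)) for x in X_te]


def decode(position):
    """Map a whale position (log10 C, log10 sigma) to KELM parameters."""
    return 10.0 ** position[0], 10.0 ** position[1]


def cv_rmse(position, X, y, n_folds=4):
    """Fitness of one whale: cross-validated RMSE of the KELM it encodes."""
    C, sigma = decode(position)
    n, sq_err = len(X), 0.0
    for fold in range(n_folds):
        # Interleaved folds keep the sketch short; blocked folds may suit time series better.
        test = [i for i in range(n) if i % n_folds == fold]
        train = [i for i in range(n) if i % n_folds != fold]
        pred = kelm_fit_predict([X[i] for i in train], [y[i] for i in train],
                                [X[i] for i in test], C, sigma)
        sq_err += sum((p - y[i]) ** 2 for p, i in zip(pred, test))
    return math.sqrt(sq_err / n)


def best_whale(population, X, y):
    """Steps 2 and 3: score every whale and return (fitness, position) of the fittest."""
    scored = [(cv_rmse(pos, X, y), pos) for pos in population]
    return min(scored, key=lambda s: s[0])
```

Searching in log space is a common design choice when parameters such as C and σ may span several orders of magnitude. In the full procedure, `cv_rmse` would serve as the objective passed to the optimizer, and the KELM would be retrained on all training data with the decoded best position (Step 6).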
5. Case Analysis
This case study examines a major hydropower project located in the middle reaches of the Lancang River in Yunnan Province. The project comprises a concrete double-curvature arch dam, a plunge pool, a sub dam, a spillway tunnel, and an extensive underground water diversion and power generation system. The Xiaowan double-curvature arch dam is the focus of the study, with a dam height of 294.5 m, a normal water level of 1240 m, and an installed capacity of 4200 MW. The plant has a guaranteed output of 1778 MW and generates about 19 billion kWh of electricity per year. The top view of the dam is shown in Figure 3.
The validity and accuracy of the proposed dam deformation prediction model are analyzed using the monitoring data of measuring points A22-PL-02 and A22-PL-03 on the arch crown beam of the Xiaowan double-curvature arch dam, covering the period from December 2008 to December 2016. The deformation prediction ability of the model was examined using 2896 sets of data, of which 80% (2316 sets) were allocated for model training and the remaining 20% (580 sets) constituted the test set. The spatial distribution of the arch dam measuring points is shown in Figure 4a, and the related environmental factors are shown in Figure 4b. These figures help in understanding the spatial layout and environmental background of dam operation, which supports the deformation analysis.
5.1. Data Preprocessing: Constructing Model Feature Factors
Collecting prototype dam deformation monitoring data involves unavoidable technical problems, including equipment malfunction and data transmission failure, which lead to a small amount of missing data in the data set. This paper uses cubic Hermite interpolation to fill in the partially missing monitoring data. This method not only restores the missing data effectively but also preserves the local continuity and smoothness of the series, thus ensuring the integrity and accuracy of the data set and improving the reliability of subsequent analysis and model training. Based on an analysis of the factors that influence dam deformation, the dam displacement is composed of a hydraulic component δH, a temperature component δT, and an aging component δθ [39].
Here, H and H0 are the upstream water level at the given moment and the base elevation of the dam, respectively; t and t0 are the monitoring sequence numbers at the given moment and at the reference moment, respectively; θ and θ0 are t and t0 divided by 100, respectively; and a1i, b1i, b2i, c1, and c2 are the fitting coefficients. The water pressure factors are the terms in H and H0 weighted by a1i, the temperature factors are the terms weighted by b1i and b2i, and the aging factors are θ − θ0 and lnθ − lnθ0.
To bring the different features onto the same scale and prevent differences in magnitude from degrading the accuracy of the model, the data are standardized as a preprocessing step.
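The construction of the feature factors and their standardization can be sketched as follows. The exact hydraulic and temperature terms are not recoverable from the text, so this sketch assumes a common hydrostatic-seasonal-time form: powers of the water head H − H0 up to degree 4, and annual and semi-annual harmonics of t with a 365-day period. The function names and default orders are hypothetical; only the aging factors θ − θ0 and lnθ − lnθ0 are taken directly from the description above.

```python
import math


def hst_factors(H, t, H0, t0, n_powers=4, n_harmonics=2):
    """Return hydraulic, temperature, and aging factors for one observation."""
    head = H - H0  # assumed hydraulic variable: water level above the dam base
    hydraulic = [head ** i for i in range(1, n_powers + 1)]
    temperature = []
    for i in range(1, n_harmonics + 1):
        angle = 2 * math.pi * i * t / 365.0
        angle0 = 2 * math.pi * i * t0 / 365.0
        temperature += [math.sin(angle) - math.sin(angle0),
                        math.cos(angle) - math.cos(angle0)]
    theta, theta0 = t / 100.0, t0 / 100.0
    aging = [theta - theta0, math.log(theta) - math.log(theta0)]
    return hydraulic + temperature + aging


def standardize(rows):
    """Z-score each column; constant columns are mapped to zero."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(cols, means)]
    return [[(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
            for row in rows]
```

In practice, the column means and standard deviations would be computed on the training set only and then reused to transform the test set, so that no information from the test period leaks into training.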
5.2. Comparative Analysis of Decomposition and Reconstruction Techniques
To comprehensively evaluate the effectiveness of the EMD, EEMD, and CEEMDAN decomposition techniques in processing dam deformation monitoring data, a series of quantitative analyses was carried out. Specifically, the EMD and EEMD algorithms were applied to the same standardized data set so that their reconstruction errors could be compared. The comparison shows that the accuracy with which EEMD reconstructs the dam deformation signal is significantly lower than that of the traditional EMD method. This decline is largely attributed to the ensemble size and the white noise added in EEMD. In contrast, the CEEMDAN algorithm shows stronger reconstruction performance, with an error level comparable to that of EMD, which highlights its advantage in signal reconstruction consistency. The results confirm that CEEMDAN improves the accuracy and efficiency of decomposition through noise cancellation. The reconstruction results of the three decomposition methods are shown in Figure 5.
The comparative analysis of the three decomposition methods shows that CEEMDAN performs best. In practical engineering applications, this method has the following advantages: (1) The CEEMDAN method extracts features and trends from dam deformation data more accurately. A prediction model based on the reconstructed data therefore achieves high accuracy, which enhances the reliability and effectiveness of dam deformation prediction. (2) The safety management and maintenance of dams require reliable engineering decisions, including maintenance planning, monitoring, and early warning. A more accurate deformation prediction model based on CEEMDAN enhances the reliability of these decisions and improves the safety of the dam. (3) Dam monitoring and early warning systems rely on accurate data to identify potential anomalies and trigger timely interventions. The CEEMDAN decomposition provides more reliable and consistent data, thereby improving the performance of the monitoring and early warning system, reducing false positives and false negatives, and ensuring a timely response to potential risks. (4) By suppressing false positives and unnecessary maintenance tasks, the CEEMDAN method enables dam managers to formulate strategies and allocate resources more effectively, thereby reducing costs and improving operational efficiency. (5) As important infrastructure, a dam directly affects the stability and development of the surrounding area. Using CEEMDAN in deformation prediction helps to detect risks early and take preventive measures, thus supporting social and economic stability and development.
The practical significance of the CEEMDAN reconstruction error being lower than that of the other two methods is as follows: (1) prediction accuracy is improved, because a lower reconstruction error means that the signal rebuilt from the extracted IMFs differs less from the real signal; (2) the quality of signal analysis is improved, because CEEMDAN reveals the trend, periodic components, and outliers in the dam deformation signal more accurately, which helps in understanding the dynamic process; and (3) by reducing the reconstruction error, CEEMDAN can reduce, to a certain extent, the false alarms and missed detections caused by model prediction error.
5.3. Analysis of the Results of Sample Entropy and CEEMDAN
In dam safety applications, decomposing the dam monitoring data with the CEEMDAN algorithm is a key step that helps to reveal the important features and patterns hidden in the data. The CEEMDAN algorithm decomposes the original dam data into multiple IMF components, each of which represents a component of different frequency and amplitude. The key IMFs are identified by comparing the sample entropy of each IMF, which singles out the components with significant fluctuations. Figure 6 shows the modal components obtained by CEEMDAN-SE decomposition, which exhibit different characteristics and fluctuation modes. Figure 7 shows that the sample entropy values of IMF1~IMF4 are higher than that of the overall data, so they represent high-frequency components with significant fluctuation characteristics. These high-frequency components may be related to subtle changes in the dam structure or to the influence of environmental factors. IMF5~IMF7 show periodic oscillation, which may be related to periodic changes in dam stress or to the influence of the surrounding geological conditions. In contrast, IMF8~IMF10 are low-frequency components that reflect the time trend of dam deformation, which may be related to long-term structural changes, temperature, or other factors.
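For reference, sample entropy can be computed directly from its definition, as in the following pure-Python sketch. The embedding dimension m = 2 and the tolerance r = 0.2 times the standard deviation are conventional defaults assumed here; the values used in the paper are not stated in this section, and the function name is illustrative.

```python
import math


def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy of a 1-D sequence with tolerance r = r_factor * std(x)."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    r = r_factor * std

    def count_matches(length):
        # Use the same n - m templates for both lengths, excluding self-matches.
        templates = [x[i:i + length] for i in range(n - m)]
        matches = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                # Chebyshev distance between two templates.
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    matches += 1
        return matches

    B = count_matches(m)
    A = count_matches(m + 1)
    if A == 0 or B == 0:
        return float("inf")
    return -math.log(A / B)
```

A regular signal such as a sampled sine wave yields a low value, while an irregular, noise-like signal yields a high one, which is the property used above to separate high-frequency IMFs from trend-like ones. The double loop is quadratic in the series length, so long series may call for a vectorized implementation.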
5.4. The Final Model Input Variables Are Determined by PACF Analysis
To improve the accuracy and effectiveness of the prediction model, the CEEMDAN-SE method is used to decompose the original time series into 10 IMF components. These IMFs capture the fluctuation characteristics of the original data at different scales and serve as the signal sources for subsequent analysis.
In this paper, PACF analysis is applied to the 10 IMF components. PACF quantifies the direct relationship between time points in a series while eliminating the influence of indirect correlation, so it can be used to study the strength of the correlation between data points and to select the best input feature set. As shown in Figure 8, calculating the partial autocorrelation coefficient between each time series and its lagged sequences reveals the significant correlations and determines the optimal input variable length for each GSWOA-KELM model. Table 1 provides the detailed configuration of the optimal input variables for each IMF component, which maximizes the correlation between the input features and the target prediction output and thereby enhances the prediction ability of the model. In this way, the data obtained by CEEMDAN-SE decomposition are fully used, together with PACF analysis and the GSWOA-KELM model, to build a more accurate and reliable prediction model and to provide a more effective tool for dam safety management.
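The partial autocorrelation coefficients and the resulting input length can be computed with the Durbin–Levinson recursion, as sketched below. The selection rule in `significant_lags` (count the leading lags whose absolute PACF exceeds the approximate 95% bound 1.96/√N) is one reasonable reading of the procedure described here, and the function names are illustrative.

```python
import math


def autocorr(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[i] * dev[i + k] for i in range(n - k)) / c0
            for k in range(max_lag + 1)]


def pacf(x, max_lag):
    """Partial autocorrelation for lags 0..max_lag (Durbin-Levinson recursion)."""
    rho = autocorr(x, max_lag)
    phi_prev, out = [], [1.0]
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the AR(k) coefficients from the AR(k-1) coefficients.
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out


def significant_lags(x, max_lag=20):
    """Number of leading lags whose |PACF| exceeds the 95% bound 1.96/sqrt(N)."""
    bound = 1.96 / math.sqrt(len(x))
    values = pacf(x, max_lag)
    count = 0
    for k in range(1, max_lag + 1):
        if abs(values[k]) <= bound:
            break
        count = k
    return count
```

Applied to one IMF component, `significant_lags` returns the number of consecutive lagged values to use as model inputs for that component.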
Selecting appropriate input variables is key to time series analysis and predictive modeling, and the PACF results shown in Figure 8 provide a statistical basis for this selection. Taking IMF1 as an example, the maximum lag with significant correlation is determined from the lag at which the PACF value first crosses the 95% confidence bound. Specifically, if this crossing occurs at a lag of five days, the lagged values from the first to the fourth day have a significant linear correlation with the current observation. Based on this analysis, the four consecutive lagged values from (t − 4) d to (t − 1) d are selected as input variables, where d denotes days and t denotes the current time point. These variables are used to predict the target value of the current day (t d). In addition, to evaluate the predictive ability of the model for dam operation and management from a broader perspective, this study extends the prediction target from the current day (t d) to three days ahead ((t + 3) d) and six days ahead ((t + 6) d). Figure 9 illustrates the selection of input variables for the different prediction periods based on the PACF results. This framework provides systematic guidance for determining which historical data points are most predictive when constructing prediction models, and the resulting input selection strategy improves the prediction performance of the model in support of dam operation and management.
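The construction of input and target pairs for the three prediction periods can be sketched as follows, assuming one observation per day. The function name `make_supervised` and its arguments are illustrative.

```python
def make_supervised(series, n_lags, horizon=0):
    """Build samples with inputs x[t-n_lags .. t-1] and target x[t+horizon].

    horizon=0 predicts the current day, horizon=3 the value three days ahead,
    and horizon=6 the value six days ahead.
    """
    X, y = [], []
    for t in range(n_lags, len(series) - horizon):
        X.append(list(series[t - n_lags:t]))
        y.append(series[t + horizon])
    return X, y
```

For instance, `make_supervised(series, n_lags=4, horizon=3)` pairs the four values from (t − 4) d to (t − 1) d with the value at (t + 3) d. A longer horizon leaves fewer usable samples at the end of the series.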
5.5. Selection of Kernel Functions and Comparative Analysis of GSWOA-KELM Models
In this paper, GSWOA-KELM is used to model and predict the deformation at the two measuring points. The choice of kernel function plays an important role in the performance and behavior of the KELM model. The kernel function implicitly maps the input data to a high-dimensional space, in which the data become more nearly linearly separable or can be fitted more easily. Different kernel functions produce different mappings and therefore affect model performance differently. The linear kernel works in the original feature space without any nonlinear transformation, which makes it suitable for linearly separable data sets. In contrast, the radial basis function (RBF) kernel corresponds to a mapping into an infinite-dimensional space and measures the similarity between two points by a negative exponential of the squared distance between them. Although the RBF kernel usually performs well on nonlinear data, its parameters, especially the bandwidth, must be tuned carefully to avoid overfitting. Other kernels, such as the sigmoid kernel, may produce good results in specific scenarios but require careful selection and adjustment for the problem at hand. The selection of the kernel function must take into account the data characteristics, the problem complexity, and the performance requirements of the model. With a suitable kernel function and suitable parameters, the generalization ability, fitting ability, and adaptability of the model to new data can be enhanced.
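The three kernel types mentioned above can be written compactly as shown below. The parameter names (`sigma`, `gamma`, `coef0`) and their default values are assumptions for illustration; the section does not state the parameterization used in the experiments.

```python
import math


def linear_kernel(u, v):
    # Inner product in the original feature space (no nonlinear mapping).
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(u, v, sigma=1.0):
    # Similarity decays exponentially with squared distance; sigma is the bandwidth.
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def sigmoid_kernel(u, v, gamma=0.1, coef0=0.0):
    return math.tanh(gamma * linear_kernel(u, v) + coef0)


def gram_matrix(X, kernel):
    """Kernel matrix with entries K(xi, xj) for all pairs of samples in X."""
    return [[kernel(a, b) for b in X] for a in X]
```

A small `sigma` makes the RBF kernel matrix close to the identity, so each training point is fitted almost independently, which is the overfitting risk noted above. A large `sigma` makes all points look alike and leads to underfitting.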
Therefore, the prediction ability of the unoptimized KELM model is first evaluated on the original dam deformation data set. Then, on the same data set shown in Figure 10a, the prediction performance of KELM models with different kernel functions is compared and evaluated, and a radar chart of the corresponding evaluation indexes is given in Figure 10b.
After the kernel function type has been determined, the regularization parameter (C), the kernel parameter, and the number of ELM hidden layer nodes must be determined. GSWOA has been used in the field of function optimization because of its computational efficiency, fast convergence, and strong global search ability, so this paper selects it to optimize the KELM parameters. GSWOA is also compared with traditional algorithms to evaluate its suitability. As shown in Figure 11, the convergence of GSWOA accelerates significantly after the 10th iteration, a behavior that is not observed for the other algorithms over the same number of iterations. This indicates that GSWOA has superior convergence speed and computational efficiency compared with the other algorithms. The specific parameters of each algorithm are listed in Table 2.
5.6. Evaluate the Robustness and Computational Efficiency of the KELM Model
To evaluate the prediction performance of the KELM model used in this paper, the traditional BP, ELM, CNN, SVM, and GRU models are chosen as comparison models. These models predict the dam deformation using the same division of the data into training and test sets as the proposed model. To keep the comparison consistent, all models are verified and compared directly in their initial, unoptimized form. The prediction results of the models are compared through plots and evaluation indexes. Figure 12 shows the prediction results of each model. Comparing the predicted and measured values allows the goodness of fit and prediction accuracy of each model to be assessed intuitively. Comparing the model used in this paper with traditional neural network and machine learning models gives a comprehensive picture of its performance in dam deformation prediction, helps to select the most suitable prediction model for practical application, and provides a more reliable and effective prediction tool for dam safety management.
As shown in Figure 12, the KELM model, which serves as the final prediction framework of this study, shows excellent prediction performance compared with the traditional models; see Table 3 for details. Compared with the BP model, the RMSE, MSE, and MAE of the KELM model are reduced by 0.1707 mm, 0.0882 mm², and 0.1630 mm, respectively, while R² is increased by 2.79%. The advantages of the KELM model over the BP model are as follows: (1) the KELM model trains faster because it solves for the output weights directly, without an iterative back-propagation algorithm; (2) the KELM model has fewer hyperparameters to adjust, which simplifies implementation and tuning; and (3) because its output weights are obtained in closed form rather than by gradient descent, the KELM model avoids the local optima into which BP training can fall.
Compared with the ELM model, the RMSE, MSE, and MAE of the KELM model are reduced by 0.8509 mm, 1.0185 mm², and 0.8124 mm, respectively, and R² is increased by 31.87%. The advantages of the KELM model over the ELM model are as follows: (1) the KELM model usually requires less parameter tuning, which simplifies model adjustment; (2) because the kernel mapping replaces the randomly initialized weights between the input layer and the hidden layer, the KELM model gives stable prediction performance from run to run; and (3) the KELM model is usually well suited to online learning and can be updated quickly when new data arrive.
Compared with the CNN, SVM, and GRU models, the KELM model shows a significant improvement in all evaluation indexes. By predicting the dam deformation at different measuring points, this paper therefore confirms the general effectiveness of the proposed model and its robustness in predicting dam deformation even when part of the original data is missing.
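The four evaluation indexes used in these comparisons can be computed as in the following sketch. The function name and the dictionary layout are illustrative; the definitions are the standard ones for RMSE, MSE, MAE, and the coefficient of determination R².

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MSE, MAE, and R2 for paired observations and predictions."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot  # 1 - SS_res / SS_tot
    return {"RMSE": math.sqrt(mse), "MSE": mse, "MAE": mae, "R2": r2}
```

When the displacement is measured in mm, RMSE and MAE are in mm and MSE is in mm², which is why the reductions reported above carry different units. R² is dimensionless.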
To verify the computational efficiency of the proposed model, the average execution time over 20 independent runs of each model was recorded; the results are listed in Table 4. Table 4 shows that the CNN and GRU models require longer running times when applied to the same target sequence. CNN requires additional convolutional layers and filters to extract relevant features, which increases the complexity of the model, while GRU, although it performs well in retaining long-term dependencies in time series data, must process the sequence step by step through its recurrent units. Compared with the CNN and GRU models, the KELM model shows higher computational efficiency. This observation indicates the following: (1) the KELM model usually requires less memory because it does not need to store a large number of convolution kernel parameters, so it performs well in terms of parameter storage and computational cost; (2) the KELM model has fewer parameters and a relatively simple structure, which enhances the interpretability of the prediction process; and (3) the KELM model is flexible in dealing with unstructured and sequential data, making it suitable for different types of data.
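A timing protocol of this kind can be sketched as follows. The helper name `average_runtime` is hypothetical, and the callable passed to it stands for one complete training and prediction run of a model.

```python
import time


def average_runtime(fn, n_runs=20):
    """Average wall-clock time in seconds of n_runs independent calls to fn."""
    total = 0.0
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        total += time.perf_counter() - start
    return total / n_runs
```

Averaging over repeated runs reduces the influence of random initialization and of background load on the machine. Reporting the spread of the run times alongside the mean would make the comparison more informative.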
5.7. Deformation Prediction Results and Comparative Analysis
To verify the effectiveness of the proposed dam deformation monitoring model, it is compared with the CEEMDAN-WOA-KELM, GSWOA-KELM, CEEMDAN-KELM, and KELM models. These models represent different methods and strategies, and comparing them with the proposed model allows its predictive performance to be evaluated. First, each model is used to predict the dam deformation data, and the prediction results are compared. In this paper, the positive vertical lateral displacement is selected for the A22-PL-02 measuring point, and the positive vertical longitudinal displacement is selected for the A22-PL-03 measuring point, for the following reasons: (1) The structure and stress state of the dam differ significantly between locations. Some measuring points are susceptible to lateral forces, while others are mainly affected by vertical forces, so monitoring the displacement in different directions helps to fully understand the deformation behavior of the dam. (2) In dam deformation monitoring, key parts require attention to horizontal or vertical displacement in order to prevent structural instability or settlement, so the displacement direction used for prediction must be chosen according to the location and importance of the measuring point. (3) The characteristics of displacement data in different directions may affect the prediction performance of the model. Analyzing historical data and selecting the displacement in a specific direction for prediction can improve the accuracy and reliability of the model.
Figure 13 shows the prediction results of each model. Comparing the predicted values of the different models with the actual observations allows the goodness of fit and prediction accuracy to be evaluated. The residuals of each model are also analyzed, and Figure 14 shows their distribution. The final prediction results are shown in Table 5. The evaluation indicators reported there (RMSE, MSE, MAE, and R²) objectively quantify the prediction accuracy and goodness of fit of each model and help determine which model performs best in dam deformation analysis. Through this comparative analysis, the optimal dam deformation analysis model can be identified and its effectiveness in practical application verified.
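The four indicators can be computed as in the following minimal sketch. It assumes the standard definitions (in particular, R² = 1 − SS_res/SS_tot), which this section does not restate. The function name `regression_metrics` and the sample values are hypothetical and are not taken from the monitoring data.

```python
import math


def regression_metrics(observed, predicted):
    """Return RMSE, MSE, MAE and R^2 for two equal-length sequences.

    Assumes the standard textbook definitions; the paper's exact
    formulas are not restated in this section.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]

    mse = sum(e * e for e in errors) / n        # mean squared error (mm^2)
    rmse = math.sqrt(mse)                       # root mean squared error (mm)
    mae = sum(abs(e) for e in errors) / n       # mean absolute error (mm)

    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)         # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    # R^2 is undefined when the observations are constant.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"RMSE": rmse, "MSE": mse, "MAE": mae, "R2": r2}


if __name__ == "__main__":
    # Hypothetical displacements in mm, for illustration only.
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.5, 2.0, 2.5, 4.0]
    for name, value in regression_metrics(observed, predicted).items():
        print(f"{name}: {value:.4f}")
```

Because MSE is the square of RMSE, it carries units of mm² while RMSE and MAE are in mm, which is why the reductions reported for the three error indicators are not directly comparable in magnitude.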
It can be seen from Figure 13 that the CEEMDAN-SE-PACF-GSWOA-KELM model outperforms the CEEMDAN-WOA-KELM, GSWOA-KELM, CEEMDAN-KELM, and KELM models to varying degrees. Taking measuring point A22-PL-02 in Table 5 as an example, compared with the CEEMDAN-WOA-KELM model, the RMSE, MSE, and MAE are reduced by 0.5992 mm, 1.1303 mm², and 0.5523 mm, respectively, and R² is increased by 6.83%. This shows that the GSWOA algorithm is effective in optimizing the key parameters of KELM and that the prediction accuracy of the model is improved compared with the WOA. Specifically, (1) GSWOA introduces the variable spiral position update, which improves the diversity of the global search and the ability of the algorithm to find the optimal solution; (2) GSWOA enhances search stability, reduces the risk of falling into a local optimum, and improves the robustness of the algorithm; and (3) GSWOA adapts to different optimization problems and shows stronger generalization ability in complex scenarios.
Compared with the GSWOA-KELM model, the RMSE, MSE, and MAE of the CEEMDAN-SE-PACF-GSWOA-KELM model are reduced by 0.3340 mm, 0.5414 mm², and 0.3702 mm, respectively, while R² is increased by 4.79%. This demonstrates the following advantages of CEEMDAN-SE-PACF preprocessing: (1) it effectively extracts the principal components of the signal and filters out noise components, improving data quality and accuracy; (2) it identifies key signal features, which helps to reveal the inherent patterns in the data and improves prediction accuracy; and (3) it reduces the dimensionality of the signal, which lowers data complexity, improves analysis efficiency, and reduces the risk of overfitting.
Compared with the CEEMDAN-KELM model, the RMSE, MSE, and MAE of the CEEMDAN-SE-PACF-GSWOA-KELM model are reduced by 1.2763 mm, 3.3046 mm², and 1.1409 mm, respectively, and R² is increased by 14.37%. This highlights the combined benefits of algorithm optimization and data preprocessing.
The following can be seen from Figure 14: (1) The residuals of the CEEMDAN-SE-PACF-GSWOA-KELM model follow a normal distribution, while those of the other models show only varying degrees of bell-shaped symmetry, indicating that they follow a normal distribution only approximately. (2) For the CEEMDAN-SE-PACF-GSWOA-KELM model, the residual mean tends to zero, indicating the smallest bias, whereas the residual means of the other models deviate from zero, indicating potential systematic bias. (3) It is worth noting that the non-normal features in the residual distributions of the CEEMDAN-WOA-KELM, GSWOA-KELM, CEEMDAN-KELM, and KELM models indicate prediction bias or error in some scenarios.
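The visual reading of the residual distributions can be complemented by simple numerical diagnostics. The sketch below is an illustrative assumption rather than the procedure used in this paper: it computes the residual mean (a bias indicator), the skewness, and the excess kurtosis, and applies the Jarque-Bera statistic as one possible normality check. The function name `residual_diagnostics` and the sample values are hypothetical.

```python
import math


def residual_diagnostics(observed, predicted):
    """Summarize residuals: mean, skewness, excess kurtosis, Jarque-Bera.

    The Jarque-Bera test is an assumed choice for illustration; its
    p-value is asymptotic and can be unreliable for short series.
    """
    if len(observed) != len(predicted) or len(observed) < 3:
        raise ValueError("need two equal-length sequences with >= 3 points")

    residuals = [o - p for o, p in zip(observed, predicted)]
    n = len(residuals)
    mean = sum(residuals) / n

    # Central moments (population form, as used by the JB statistic).
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    if m2 == 0:
        raise ValueError("residuals are constant; moments are undefined")

    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0

    # JB is asymptotically chi-square with 2 degrees of freedom, whose
    # survival function has the closed form exp(-x / 2).
    jb = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)

    return {
        "mean": mean,
        "skewness": skewness,
        "excess_kurtosis": excess_kurtosis,
        "jarque_bera": jb,
        "p_value": p_value,
    }


if __name__ == "__main__":
    # Hypothetical displacements in mm, for illustration only.
    observed = [10.2, 10.8, 11.5, 12.1, 12.4, 12.9, 13.3, 13.6]
    predicted = [10.0, 11.0, 11.4, 12.3, 12.2, 13.0, 13.5, 13.4]
    for name, value in residual_diagnostics(observed, predicted).items():
        print(f"{name}: {value:.4f}")
```

A residual mean close to zero together with small skewness and excess kurtosis would be consistent with observations (1) and (2), while a clearly non-zero mean would point to the kind of systematic bias described for the comparison models.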