A Data-Driven Dam Deformation Forecasting and Interpretation Method Using the Measured Prototypical Temperature Data

Dam deformation is an intuitive and reliable monitoring indicator of dam structural response. As the service life of a project increases, the structural response and environmental data collected by the structural health monitoring (SHM) system grow geometrically. The traditional hydraulic-seasonal-time (HST) model performs poorly on massive monitoring data because of multidimensional collinearity among variables and inaccurate simulation of the temperature field. To address these problems, this study proposes a data-driven dam deformation monitoring model for massive monitoring data based on the light gradient boosting machine (LGB) and the Bayesian optimization (BO) algorithm. The proposed BO-LGB method mines the underlying relationship between temperature changes and dam deformation from measured temperature data instead of simple harmonic functions. LGB is used to model the relationship between high-dimensional environmental data and dam displacement changes, and the BO algorithm is used to select the optimal hyperparameters of LGB based on the massive monitoring data. A concrete dam in long-term service was used as the case study, and three typical dam displacement monitoring points were used for model training and validation. The experimental results indicate that the method properly accounts for collinearity among variables and achieves a good balance between modeling accuracy and efficiency when dealing with high-dimensional, large-scale dam monitoring data. Moreover, the proposed method can explain differences in the contributions of the input variables, so that the factors with a more significant influence can be selected for modeling.


Introduction
There are more than 98,880 dams in service in China, and about 40-50% of these dams were built in the 20th century [1]. These dams suffer from historical problems like poor design and construction standards, insufficient strength of dam materials, and poor construction technology [2]. With the increase in service life, the mechanical properties of materials inevitably decline, and then structural reliability also declines [3].
Dam structural health monitoring (SHM) is an effective monitoring technique for dam safety by imitating the self-sensing and self-diagnosis capability of humans [4][5][6]. Sensors are arranged in the dam body and its foundation to monitor various physical quantities related to dam structural response, such as deformation, settlement, crack opening, seepage, etc. [7,8]. As one of the most commonly used monitoring indicators, dam deformation is regarded as a direct response to dam structures [9][10][11][12][13][14]. The plumb line (PL) and inverted plumb line (IP) systems are often embedded in the dam body and its foundation to monitor the horizontal displacement for concrete dams. Other monitoring methods, such as tension lines and laser collimation, are also used for the observation of horizontal displacements. With the increase in the dam service period, dam prototype monitoring data continuously accumulate and superimpose, resulting in a huge amount of monitoring data [15]. Thus, it is desirable to develop advanced tools to mine useful information related to the dam displacement changes from these massive monitoring data.
The construction of dam behavior prediction, monitoring, and interpretation models is of great significance for improving the management of daily dam operation [5][16][17][18][19]. The most popular data-based method for dam safety monitoring is the hydraulic-seasonal-time (HST) model first proposed by Willm and Beaujoint [20]. The basic assumption of the HST model is that the dam structural response (e.g., dam deformation) can be represented as a linear combination of three effects: hydraulic, temperature, and time effects [14]. However, practical applications have shown that the conventional HST model suffers from some problems: the actual thermal effect cannot be accurately simulated through sinusoidal functions, and the multicollinearity between hydraulic and temperature effects is difficult to account for in the HST model [21,22]. To solve these problems, a series of variations of the HST model have been proposed over the past few decades. For example, Hu et al. [23] proposed a hydrostatic-thermal-crack-time model to interpret the displacements of concrete dams with a large-scale horizontal crack on the downstream face. Among these variants, the hydraulic-thermal-time (HTT) model has proven effective in accounting for the actual temperature field [24]. By adding the monitoring data of thermometers embedded in the dam body and foundation, the HTT model can simulate the thermal component of the dam structural response more accurately. However, because a large number of thermometers are embedded in the dam body and its foundation, it is difficult to select suitable thermometer data that show variations similar to those of the structural response [3]. Moreover, the input variables of the HTT model are usually high-dimensional, and conventional statistical methods cannot fully account for the collinearity between such factors [24].
In the past few decades, with the rapid development of artificial intelligence (AI), machine learning (ML) techniques have been applied in the field of dam safety monitoring [23][25][26][27]. A series of ML-based data-driven techniques have been introduced to build dam safety monitoring models [28][29][30]. For example, Ribeiro et al. [31] used four modeling methods, namely a recurrent neural network, long short-term memory (LSTM), the seasonal autoregressive integrated moving average (SARIMA) model, and SARIMA with exogenous variables (SARIMAX), to predict the long-term deformation of a concrete dam. Liu et al. [31] proposed a coupled model for long-term dam displacement prediction based on the long short-term memory network. Li et al. [25] developed a distributed time series evolution model for dam deformation prediction based on constituent elements. However, most existing studies focus on small amounts of monitoring data, whereas the prototype monitoring data continuously accumulate as the dam service life increases [32][33][34]. Thus, it is desirable to propose a scheme suitable for big-data mining and modeling. Moreover, improving model transparency and the interpretability of prediction results is also a trend in the development of dam monitoring models.
To overcome these problems, this study develops a data-driven dam deformation monitoring and interpretation model using the light gradient boosting machine (LGB) and Bayesian optimization (BO). Specifically, actual prototypical temperature data are introduced to represent the thermal variables instead of simple harmonic functions. LGB is then used to handle the high-dimensional long-term monitoring data and mine the underlying relationship governing dam deformation behavior, and the BO algorithm is used to determine the optimal hyperparameters of LGB from the massive environmental monitoring data. A concrete dam in long-term service was used as the case study, and three typical dam displacement monitoring points were used as the research objects. A series of state-of-the-art methods, including statistical and ML methods, were used as benchmarks. Model performance was evaluated from three aspects: short-term prediction accuracy, long-term prediction accuracy, and computational efficiency and time cost. Moreover, the capability of the model to interpret the contribution of the factors affecting dam displacement was evaluated to improve the transparency of the monitoring model. The rest of the paper is organized as follows: Section 2 briefly introduces the methodology of LGB, the BO algorithm, and the evaluation indicators, and then describes the flowchart of the proposed BO-LGB framework for dam deformation behavior prediction and interpretation. In Section 3, a concrete gravity arch dam in long-term service is used as the case study, and the actual thermometer data collected from the dam body and its foundation are used for base model training. Section 4 discusses the model training and parameter optimization process and then evaluates the model performance in short-term and long-term prediction against various state-of-the-art benchmark methods. Finally, the advantages and limitations of the proposed framework are discussed in Section 5.
Figure 1 shows the flowchart of the proposed dam deformation monitoring and interpretation framework. First, unlike the conventional HST model, actual prototype monitoring data of the dam temperature field were introduced for model training. To handle the high-dimensional monitoring data, LGB was adopted to mine the underlying relationship between environmental variables and dam deformation. Next, the BO tuning algorithm was used to determine the optimal hyperparameters of the proposed method. A concrete dam in long-term service was used as the case study, and three typical monitoring points were used to validate the model's effectiveness, with a series of state-of-the-art dam safety monitoring methods serving as benchmarks. Moreover, the importance of the different environmental factors was also evaluated.

Dam Deformation Statistical Monitoring Model
As dam deformation is the most intuitive and reliable monitoring index, dam deformation monitoring models have received extensive attention [9]. The horizontal displacement of a dam, $\delta$, can be decomposed into three components: a hydraulic component $\delta_H$, a thermal component $\delta_T$, and a time-varying component $\delta_\theta$:
$$\delta = \delta_H + \delta_T + \delta_\theta$$
The hydraulic component can be expressed as follows:
$$\delta_H = \sum_{i=1}^{n} a_i H^i$$
where $H$ represents the upstream water level in front of the dam, $a_i$ are regression coefficients, $n = 3$ for gravity dams, and $n = 4$ for arch dams. The thermal component is caused by the temperature changes of the dam body and its foundation. In the HST model, it is usually represented by a combination of simple harmonic periodic functions:
$$\delta_T = \sum_{i=1}^{N} \left[ b_{1i} \sin\left(\frac{2\pi i t}{365}\right) + b_{2i} \cos\left(\frac{2\pi i t}{365}\right) \right]$$
where $N$ is usually set to 2, $b_{1i}$ and $b_{2i}$ are regression coefficients, and $t$ denotes the cumulative number of days from the initial date to the measurement date. Engineering practice indicates that temperature is the main factor affecting the deformation of arch dams. However, it is difficult to accurately simulate the dam temperature field by relying purely on simple harmonic functions. A more efficient and reliable solution is to use the prototypical data of thermometers embedded at different elevations of the dam. In this case, the thermal component is expressed as follows [35]:
$$\delta_T = \sum_{i=1}^{L} b_i T_i$$
where $T_i$ represents the observed temperature of the $i$-th thermometer, $b_i$ is its regression coefficient, and $L$ denotes the number of thermometers embedded in the dam body and its foundation. The time-varying component reflects the creep of the dam concrete and is expressed as follows:
$$\delta_\theta = c_1 \theta + c_2 \ln \theta$$
where $\theta = t/100$, and $c_1$ and $c_2$ denote the regression coefficients of the time-varying factors.
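As a minimal sketch of how the HST and HTT regressors above can be assembled and fitted by ordinary least squares, the NumPy code below builds both design matrices. The function names, the reference water level `H0`, and the intercept column are assumptions made for illustration (the text lists only the three displacement components), and `t` must start at 1 or later because $\ln\theta$ is undefined at $t = 0$.

```python
import numpy as np

def hst_design(H, t, n=3, N=2, H0=0.0):
    """HST regressors: intercept, (H-H0)^1..n, N sine/cosine pairs, theta, ln(theta)."""
    h = np.asarray(H, dtype=float) - H0      # water level relative to an assumed reference level
    t = np.asarray(t, dtype=float)           # days since the initial date (must be > 0)
    theta = t / 100.0
    cols = [np.ones_like(h)]                 # intercept (assumed constant term)
    cols += [h ** i for i in range(1, n + 1)]
    for i in range(1, N + 1):
        arg = 2.0 * np.pi * i * t / 365.0
        cols += [np.sin(arg), np.cos(arg)]
    cols += [theta, np.log(theta)]
    return np.column_stack(cols)

def htt_design(H, T, t, n=3, H0=0.0):
    """HTT regressors: thermometer readings T (samples x L) replace the harmonic terms."""
    h = np.asarray(H, dtype=float) - H0
    theta = np.asarray(t, dtype=float) / 100.0
    cols = [np.ones_like(h)] + [h ** i for i in range(1, n + 1)]
    # column_stack keeps 1-D arrays as single columns and 2-D arrays as blocks of columns
    return np.column_stack(cols + [np.asarray(T, dtype=float), theta, np.log(theta)])

def fit_ols(X, y):
    """Least-squares regression coefficients for design matrix X and displacements y."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef
```

Calling `np.linalg.cond` on either design matrix gives a quick indication of the multicollinearity discussed in the Introduction. With raw water levels of several hundred meters, the powers of $H$ are nearly collinear, which is why the sketch subtracts a reference level before forming them.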

LGB
A series of tree-based methods, such as random forest [36] and XGBoost [37,38], have been used for dam behavior prediction. Compared with deep learning techniques, tree-based techniques have some significant advantages, such as higher interpretability of prediction results and strong processing capability for unbalanced data. However, with the advent of the big data era, both the feature dimension and the sample size of monitoring data show a significant increasing trend. For high-dimensional features and large numbers of instances, the efficiency and scalability of these methods are unsatisfactory: to estimate the information gain of all possible split points, they must scan all data instances for every feature, which is time-consuming for big-data prediction problems [39].
In this study, an improved tree-based technique, called LGB, was introduced to build dam monitoring models that are suitable for massive monitoring data.
LGB is an open-source gradient boosting framework based on decision trees that increases model efficiency and reduces the computational burden; it was first proposed by Ke et al. in 2017 [24]. It combines two novel techniques, gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB), to improve efficiency when dealing with large datasets and high-dimensional features. Given a dam monitoring dataset $D = \{(x_i, y_i)\}_{i=1}^{n}$, where $x_i$ is the vector of environmental variables and $y_i$ the measured displacement, the aim of LGB is to find an approximation $\hat{f}(x)$ of a target function $f^*(x)$ that minimizes the expected value of a specific loss function $L(y, f(x))$:
$$\hat{f} = \arg\min_{f} \mathbb{E}_{x,y}\left[L\left(y, f(x)\right)\right]$$
The details are described as follows.
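The final estimate is obtained by integrating a series of $T$ regression trees:
$$\hat{f}(x) = \sum_{t=1}^{T} f_t(x)$$
Each regression tree can be written as
$$f_t(x) = w_{q(x)}, \quad q: \mathbb{R}^{m} \rightarrow \{1, 2, \ldots, J\}, \quad w \in \mathbb{R}^{J}$$
where $m$ is the number of input variables, $J$ denotes the number of leaves, $q$ denotes the decision rules of the tree that map a sample to a leaf, and $w$ is the vector of leaf-node weights. At iteration $t$, with $F_{t-1}(x) = \sum_{k=1}^{t-1} f_k(x)$ denoting the ensemble built so far, the new tree is trained by minimizing
$$\Gamma_t = \sum_{i=1}^{n} L\left(y_i, F_{t-1}(x_i) + f_t(x_i)\right) \approx \sum_{i=1}^{n} \left[ L\left(y_i, F_{t-1}(x_i)\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right]$$
This objective function can be further simplified by removing the constant term $L\left(y_i, F_{t-1}(x_i)\right)$:
$$\tilde{\Gamma}_t = \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right]$$
where $g_i$ and $h_i$ are the first- and second-order gradient statistics of the loss function:
$$g_i = \frac{\partial L\left(y_i, F_{t-1}(x_i)\right)}{\partial F_{t-1}(x_i)}, \qquad h_i = \frac{\partial^2 L\left(y_i, F_{t-1}(x_i)\right)}{\partial F_{t-1}(x_i)^2}$$
Because the tree structure, the number of trees, and the sampling of features and instances are all controlled by hyperparameters, the selection of hyperparameters significantly influences the modeling performance and prediction accuracy of LGB. Thus, the hyperparameters to be tuned and their search ranges should be chosen carefully when using LGB for dam safety monitoring modeling.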
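The second-order objective above leads to closed-form leaf values and split gains, and GOSS changes which instances contribute to those sums. The NumPy sketch below illustrates both for a single leaf. It is written for this discussion and is not LightGBM's implementation; the L2 term `lam` is an assumption (the objective above corresponds to `lam = 0`), and all function names are hypothetical. Minimizing $\sum_{i \in I_j} (g_i w_j + \tfrac{1}{2} h_i w_j^2)$ over the leaf value gives $w_j^* = -\sum g_i / \sum h_i$, and GOSS, as described by Ke et al., keeps the instances with the largest gradients and up-weights a random sample of the remainder by $(1-a)/b$.

```python
import numpy as np

def goss_sample(g, a, b, rng):
    """GOSS: keep the top a-fraction of instances by |g|, randomly sample a
    b-fraction of the rest, and up-weight the sampled ones by (1 - a) / b."""
    n = len(g)
    order = np.argsort(-np.abs(g))              # largest |gradient| first
    top_k, rand_k = int(round(a * n)), int(round(b * n))
    top_idx = order[:top_k]
    rand_idx = rng.choice(order[top_k:], size=rand_k, replace=False)
    idx = np.concatenate([top_idx, rand_idx])
    weights = np.concatenate([np.ones(top_k), np.full(rand_k, (1.0 - a) / b)])
    return idx, weights

def leaf_weight(g, h, w=None, lam=0.0):
    """Optimal leaf value -G / (H + lam), with G and H the (weighted) gradient sums."""
    w = np.ones(len(g)) if w is None else w
    return -np.sum(w * g) / (np.sum(w * h) + lam)

def split_gain(g, h, left, w=None, lam=0.0):
    """Decrease of the second-order objective when a leaf is split by boolean mask `left`."""
    w = np.ones(len(g)) if w is None else w
    def score(mask):
        return np.sum(w[mask] * g[mask]) ** 2 / (np.sum(w[mask] * h[mask]) + lam)
    everything = np.ones(len(g), dtype=bool)
    return 0.5 * (score(left) + score(~left) - score(everything))

# Squared loss L = 0.5 * (y - F)^2 gives g = F - y and h = 1.
y = np.array([1.0, 2.0, 3.0, 4.0])
F = np.zeros(4)
g, h = F - y, np.ones(4)
print(leaf_weight(g, h))                                        # 2.5, the mean residual
print(split_gain(g, h, np.array([True, True, False, False])))   # 2.0
```

Passing the GOSS weights as `w` to `leaf_weight` and `split_gain` shows how the sampled subset approximates the full gradient sums at a fraction of the cost.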

Bayesian Optimization and Cross-Validation
Hyperparameter tuning is an important part of model training for most ML-based algorithms. Hyperparameters can be categorized into parameters that define the model structure itself and parameters required by the objective function and the optimization algorithm. Model structure parameters influence the results in both the training and prediction stages and are the main tuning targets here. Setting these values manually during training requires lengthy trial and error and consumes considerable time and labor. Thus, it is desirable to determine the hyperparameter values automatically when constructing dam prediction and monitoring models.
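Let $D_{1:t} = \{(x_1, y_1), (x_2, y_2), \ldots, (x_t, y_t)\}$ denote the $t$ evaluations observed so far, where $x_i$ is a hyperparameter configuration and $y_i$ the corresponding value of the objective function $f$ (e.g., the validation error of the model trained on the dam monitoring data). The posterior distribution of $f$ can then be represented as follows.
$$p(f \mid D_{1:t}) = \frac{p(D_{1:t} \mid f)\, p(f)}{p(D_{1:t})}$$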
where $p(f)$ is the prior probability distribution of $f$, $p(D_{1:t} \mid f)$ is the likelihood of the observations $y$ given $f$, and $p(D_{1:t})$ is the marginal likelihood. The probabilistic surrogate model and the acquisition function are the two main components of the BO algorithm. The surrogate model consists of a prior probability model and an observation model; updating it with each new evaluation yields a posterior distribution that incorporates the information in the observed data. The acquisition function is computed from this posterior and selects the next evaluation point that is most promising for minimizing the objective, so that the loss over the sequence of evaluation points is minimized.
Figure 2 shows an intuitive diagram of K-fold cross-validation. As shown in the figure, the original data are first divided into training and validation sets; the training set is used for base model training, and the validation set is used to test the model's prediction capability. The detailed steps of K-fold cross-validation are as follows.
Step 1: The training set is randomly divided into K disjoint subsets of approximately equal size;
Step 2: Subsets 1 to K-1 are used for training and the Kth subset is used for validation, and the prediction accuracy on the Kth subset is calculated;
Step 3: Subsets 2 to K are used for training and the first subset is used for validation to obtain the prediction accuracy on the first subset; this rotation is repeated until each of the K subsets has served once as the validation set;
Step 4: The average prediction accuracy of the K models is taken as the performance index of the model under K-fold cross-validation.
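The four steps above can be sketched in NumPy as follows. `kfold_indices` and `cross_val_score` are hypothetical helpers, and the model is passed in as a pair of `fit`/`predict` callables so that any of the regressors in this paper could be plugged in; RMSE is used as the per-fold score, which is an assumption since the text only speaks of prediction accuracy.

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Yield (train_idx, val_idx) pairs; each of the k disjoint folds is validated once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)    # Step 1: random disjoint subsets
    for i in range(k):                                # Steps 2-3: rotate the validation fold
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]

def cross_val_score(fit, predict, X, y, k=5, seed=0):
    """Step 4: average the per-fold RMSE over the k folds."""
    scores = []
    for train, val in kfold_indices(len(y), k, seed):
        model = fit(X[train], y[train])
        err = predict(model, X[val]) - y[val]
        scores.append(np.sqrt(np.mean(err ** 2)))
    return float(np.mean(scores))
```

Because Step 1 splits the data randomly, consecutive days of a displacement series can fall into different folds; for strongly autocorrelated monitoring series, contiguous (chronological) folds are a more conservative alternative.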
Water 2022, 14, x FOR PEER REVIEW
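The BO loop described in this section can be illustrated with a Gaussian-process surrogate and the expected-improvement (EI) acquisition function. The text does not name the surrogate or acquisition function actually used, so both are assumptions here, as are the kernel length scale, the number of random candidate points, and all function names. In the BO-LGB setting, `objective` would return the K-fold cross-validation error of LGB for a given hyperparameter vector; the sketch is written so that any such function can be passed in.

```python
import math
import numpy as np

def rbf_kernel(A, B, length=0.2):
    """Squared-exponential kernel with unit signal variance."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(d2, 0.0) / length**2)

def gp_posterior(X, y, Xs, length=0.2, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP surrogate at Xs."""
    K = rbf_kernel(X, X, length) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # K^{-1} y via the Cholesky factor
    Ks = rbf_kernel(X, Xs, length)
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.clip(1.0 - np.sum(v**2, axis=0), 1e-12, None)
    return mu, np.sqrt(var)

def expected_improvement(mu, sd, best, xi=0.01):
    """EI for minimisation under a Gaussian predictive distribution N(mu, sd^2)."""
    imp = best - mu - xi
    z = imp / sd
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return imp * cdf + sd * pdf

def bayes_opt(objective, bounds, n_init=5, n_iter=20, n_cand=500, seed=0):
    """Minimise `objective` over a box; returns the best point and value found."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    X = rng.uniform(lo, hi, size=(n_init, len(lo)))      # random initial design
    y = np.array([objective(x) for x in X])
    for _ in range(n_iter):
        Xn = (X - lo) / (hi - lo)                        # inputs scaled to [0, 1]
        ys = (y - y.mean()) / (y.std() + 1e-12)          # standardised observations
        cand = rng.uniform(0.0, 1.0, size=(n_cand, len(lo)))
        mu, sd = gp_posterior(Xn, ys, cand)              # update the surrogate posterior
        ei = expected_improvement(mu, sd, ys.min())      # acquisition on random candidates
        x_next = lo + cand[np.argmax(ei)] * (hi - lo)
        X = np.vstack([X, x_next])
        y = np.append(y, objective(x_next))
    best = int(np.argmin(y))
    return X[best], y[best]
```

Maximizing EI over random candidates keeps the sketch short; integer hyperparameters such as max_depth would be rounded inside `objective` before training.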

Project Description
An arch dam that has been in operation for many years was used as the case study. Figure 3 shows the top view of the arch dam used in this case study. Construction of the project started in 1968 and was completed in 1971. The dam was further heightened by 6.5 m in 1976 after a flood. From April 1999 to May 2000, the project was reinforced: the left and right abutments were grouted for leakage control, and corresponding management facilities were installed.
The drainage area controlled by the dam site is 165 km², and the annual average rainfall is 650 mm. The design flood level of the dam is 481.75 m, and the dam foundation elevation is 389.5 m. The total storage capacity of the reservoir is 16.6 million m³. The dam is a concrete gravity arch dam with a fixed center and variable radius. The dam crest arc length is 154.28 m, and the central angle at the crest is 80°. The outer radius is 110.5 m, and the dam crest elevation is 490.5 m. The maximum dam height is 202 m, and the dam crest thickness is 4.5 m.
To monitor the environmental loads related to dam structural behavior, a series of sensors were embedded in the dam body and its foundation to monitor physical quantities such as temperature, water level, and rainfall. Figure 4 shows the layout of some of the thermometer monitoring points in a typical dam section. As can be seen from the figure, thermometers are arranged at different elevations of the dam to monitor the temperature variation.
Deformation is the most intuitive monitoring indicator of structural behavior changes in arch dams. A plumb line system is utilized to monitor the horizontal displacement of the dam body and its foundation in different elevations. Figure 5 shows the typical monitoring point layout of PL and IP in this project. A total of two PL monitoring points and one IP monitoring point are utilized to measure the dam body deformation relative to the dam foundation.

Data Collection and Preprocessing
In this study, actual temperature monitoring data were used for data-driven model construction. To monitor the actual temperature field distribution and the temperature changes in different parts of the dam, a series of thermometers are embedded; the monitoring data from a total of 30 thermometers were used for model construction. Figure 6 shows the time series of the environmental variables (i.e., water level and temperature) from 2006 to 2018. Both water level and temperature data show regular annual cycles. However, the monitoring data of different thermometers differ significantly in amplitude, mainly because of differences in their burial positions and corresponding monitoring targets. Figure 7 shows the displacement time series of the three typical monitoring points, PL01, PL02, and IP01. The fluctuation of the displacement data measured by the PLs is significantly larger than that measured by the IP, mainly because the PLs monitor the displacement of the dam body, whereas the IP monitors the displacement of the dam foundation.

Experiment Environment Setting and Parameter Tuning
The software environment of the experiment and the configuration of the corresponding parameters are described as follows. The proposed BO-LGB and benchmark methods were coded in Python, and all experiments were run on a PC server with an Intel 7700K CPU, an Nvidia GTX 1070 GPU, and 16 GB of memory.
The hyperparameter settings significantly affect the performance of LGB. Table 1 lists the six main parameters of LGB, which largely determine its fitting capability, together with their optimization ranges. Several of these parameters are described as follows. n_estimator determines the number of trees (boosting iterations); a higher value can enhance the learning capability of the model, but an excessively large value may lead to overfitting. max_depth controls the maximum depth of each tree and can be used to limit overfitting. feature_fraction_rate determines the subsampling ratio of features, and bagging_fraction determines the subsampling ratio of training instances; used together, they can accelerate model computation and reduce overfitting.
To further evaluate the performance of the proposed method, a series of statistical and ML-based methods were used as benchmarks: the HST model, support vector machine (SVM), artificial neural network (ANN), random forest regression (RF), and Gaussian process regression (GP). Except for the HST model, the input variables of the benchmark methods followed the HTT model, i.e., hydraulic, measured thermal, and time-varying variables. The optimal hyperparameters of the benchmark methods were determined with the random search algorithm, and their training and validation data were the same as those of the proposed method.
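For comparison, the random search used to tune the benchmark models can be sketched as below. The keys follow the LGB parameter names discussed above, but the ranges are hypothetical placeholders because Table 1 is not reproduced in this text; `score_fn` stands for any validation error to be minimized, such as a cross-validation RMSE.

```python
import numpy as np

# Hypothetical search space: names follow the LGB parameters discussed above,
# ranges are placeholders (Table 1's actual ranges are not reproduced here).
SEARCH_SPACE = {
    "n_estimator": ("int", 50, 1000),
    "max_depth": ("int", 3, 12),
    "feature_fraction_rate": ("float", 0.5, 1.0),
    "bagging_fraction": ("float", 0.5, 1.0),
}

def sample_params(space, rng):
    """Draw one configuration uniformly from the search space."""
    params = {}
    for name, (kind, lo, hi) in space.items():
        if kind == "int":
            params[name] = int(rng.integers(lo, hi + 1))   # inclusive upper bound
        else:
            params[name] = float(rng.uniform(lo, hi))
    return params

def random_search(score_fn, space, n_trials=30, seed=0):
    """Evaluate n_trials random configurations and keep the one with the lowest score."""
    rng = np.random.default_rng(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = sample_params(space, rng)
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

Unlike BO, each trial here ignores the results of earlier trials, which is the practical difference behind the BO-LGB versus RD-LGB comparison.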
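In this study, three quantitative evaluation indicators, the coefficient of determination (R²), the mean absolute error (MAE), and the root mean squared error (RMSE), were used to assess the prediction performance of the proposed and benchmark models. They are defined as follows.
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$
where $\hat{y}_i$ is the predicted value of the $i$-th sample, $y_i$ is the corresponding measured value, $n$ is the total number of samples, and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$.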
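The three indicators can be computed directly from their definitions; the following NumPy function is a minimal sketch (the function name is hypothetical).

```python
import numpy as np

def regression_metrics(y, y_pred):
    """Return R^2, MAE and RMSE for measured values y and predictions y_pred."""
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y - y_pred
    ss_res = np.sum(resid ** 2)              # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)     # total sum of squares around the mean
    return {
        "R2": float(1.0 - ss_res / ss_tot),
        "MAE": float(np.mean(np.abs(resid))),
        "RMSE": float(np.sqrt(np.mean(resid ** 2))),
    }
```

For example, `regression_metrics([1, 2, 3, 4], [1, 2, 3, 5])` gives R² = 0.8, MAE = 0.25, and RMSE = 0.5.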

Model Training and Parameter Optimization
In this study, two PL monitoring points and one IP monitoring point were used to validate the prediction capability of the proposed model, and the BO tuning strategy was used to determine the optimal hyperparameters of LGB. Figure 8 visualizes the BO optimization results for the three monitoring points, and Table 2 lists the corresponding optimized parameters. The optimal parameters were obtained after 25, 84, and 27 iterations for monitoring points PL1, PL2, and IP1, respectively, and the corresponding correlation coefficients on the validation sets are 0.9446, 0.9853, and 0.8608.

Model Generalization Capability Evaluation
In this study, model performance was evaluated from three aspects: short-term prediction accuracy, long-term prediction accuracy, and prediction efficiency for large-scale monitoring data.

Short-Term Prediction Performance Evaluation
Short-term prediction of dam displacement is fundamental to dam safety management. To verify the predictive performance of the proposed model, a series of comparative models, including RD-LGB (LGB tuned by random search), HST, ANN, SVM, RF, and GP, were used as benchmarks. Table 3 compares the quantitative evaluation results of the proposed and comparative methods at the three monitoring points. The table shows that the proposed method achieves better short-term prediction performance in terms of all three indicators. Moreover, the prediction accuracy of the BO-LGB model is significantly higher than that of the RD-LGB model, which suggests that the BO algorithm finds better hyperparameters than random search within a limited number of iterations. Figure 9 visually compares the predictions of the proposed and comparative methods; the values predicted by the BO-LGB model are closer to the measured values.

Long-Term Prediction Performance Evaluation
As the dam service period increases, the monitoring data accumulated during the operation period continue to grow. Table 4 shows the quantitative evaluation results of the proposed and comparative methods in long-term dam displacement prediction, and Figure 10 visually displays the corresponding long-term predictions. The proposed BO-LGB shows significant advantages in the long-term prediction of the dam displacement sequence. Thus, it can be concluded that, benefiting from the combined use of the BO algorithm and five-fold cross-validation, the proposed model can fully mine the underlying information related to dam displacement changes from the limited monitoring data.

Model Interpretability Assessment
A significant feature of the HTT model is the high dimensionality of its input factors. Figure 11 shows the importance of the input variables for the three monitoring points. The water level factor is the most important factor affecting dam displacement changes at all three monitoring points, and temperature data also have a significant impact. However, thermometers at different positions influence the displacement at each monitoring point to different degrees. Thus, it is advisable to select the thermometers most strongly related to dam deformation according to the interpretation results of BO-LGB.
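One simple way to turn factor importances such as those in Figure 11 into a thermometer shortlist is to normalize them into contribution rates and keep the most important thermometers until a chosen share of the total thermometer importance is covered. The sketch below assumes factor names of the form H (water-level terms), T (thermometers), and t (time terms); the function names, the importance values, and the coverage threshold are hypothetical, and the selection rule is an illustration rather than the procedure used in the study.

```python
def contribution_rates(importances):
    """Normalise raw importances to contribution rates, sorted from largest to smallest."""
    total = sum(importances.values())
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return {name: value / total for name, value in ranked}

def select_thermometers(importances, coverage=0.9, prefix="T"):
    """Keep the most important thermometers until `coverage` of their total importance is reached."""
    temps = sorted(((k, v) for k, v in importances.items() if k.startswith(prefix)),
                   key=lambda kv: kv[1], reverse=True)
    total = sum(v for _, v in temps)
    chosen, acc = [], 0.0
    for name, value in temps:
        chosen.append(name)
        acc += value / total
        if acc >= coverage:
            break
    return chosen

# Hypothetical importances (e.g. total split gain per factor).
imp = {"H1": 50.0, "T5": 30.0, "T13": 15.0, "T2": 5.0}
print(contribution_rates(imp))
print(select_thermometers(imp, coverage=0.85))
```

The prefix test is case-sensitive, so time terms such as t1 are not mistaken for thermometers.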

Figure 11. Factor importance assessment results of three typical monitoring sites: (a-c) denote monitoring points PL1, PL2, and IP01 (horizontal axis: factor contribution rate).

Conclusions
In this study, a dam displacement prediction, monitoring, and interpretation model was proposed based on the LGB and BO algorithms. Unlike the conventional HST model, the proposed method directly takes prototypical dam environmental monitoring data as input variables.
LGB is combined with the BO algorithm to build a dam deformation monitoring and interpretation model using long-term prototypical monitoring data. The main contributions of this paper are summarized as follows.

1.
The proposed BO-LGB model shows strong capability when dealing with the longterm dam monitoring data both in modeling accuracy and efficiency; 2.
The proposed method achieves remarkable performance in a variety of dam displacement prediction scenarios (both in short-term prediction and long-term prediction); 3.
The proposed method can analyze the main factors affecting dam displacement changes based on prototypical monitoring data.
However, some limitations should also be noted. First, the research object of this study was a concrete dam; other dam types, such as earth-rock dams and concrete-face rockfill dams, should also be studied. Second, since dam displacement is an uncertain process affected by many factors, simulating this uncertainty is an important research topic. Third, there is a certain degree of correlation between different dam monitoring points. Future research will consider the correlation between multiple monitoring points and other types of dams.

Conflicts of Interest:
The authors declare no conflict of interest.