Safety Monitoring Method for the Uplift Pressure of Concrete Dams Based on Optimized Spatiotemporal Clustering and the Bayesian Panel Vector Autoregressive Model

Abstract: A safety monitoring method for the uplift pressure of concrete dams requires the spatiotemporal information contained in the monitoring data. In the present study, the ordering points to identify the clustering structure (OPTICS) method is used to spatially cluster the uplift pressure measuring points at different locations on the dam. Three distance indexes and two clustering evaluation indexes are used to optimize the clustering and select the optimal clustering result. The Bayesian panel vector autoregressive (BPVAR) model is then used to establish an uplift pressure safety monitoring model for each category of measuring points. Nonstationary sequences are made stationary by differencing, and forecasts are made with or without exogenous variables, depending on whether such variables are available. The results show that adding exogenous variables improves the forecasting accuracy of the model. In the engineering example, the uplift pressure measuring points on the dam are divided into seven categories, mainly according to their location and influencing factors. The multiple correlation coefficients of the BPVAR model on the training and test sets exceed 0.80, and its prediction error on the validation set is lower than those of the Back Propagation neural network, the XGBoost algorithm, and Support Vector Machines. This research provides a reference for the seepage monitoring of concrete dams.


Introduction
The long-term health and service performance of concrete dams are essential to the safety of water control and constitute an important public safety issue related to economic life and social stability [1][2][3]. With the continuing construction of concrete dams, the geological conditions of dam sites have become increasingly complex. To prevent engineering accidents caused by structural damage, an appropriate dam safety monitoring strategy must be adopted [4]. In the safety monitoring of concrete dams, the seepage safety of the dam foundation is a key issue. According to statistics [5], a large proportion of concrete dam failures are caused by dam foundation seepage problems. The Bouzey gravity dam in France [6], the Austin gravity dam in the United States, and the Malpasset dam in France all failed as a result of seepage in the dam foundation. Commonly used mathematical models for dam seepage monitoring include statistical models, hybrid models, fuzzy mathematical models, and time series models [7]. These are single-point monitoring models: when the number of measuring points is large, the probability of false alarms increases greatly [8], and the spatial distribution of the monitored quantities is not taken into account.
In recent years, to move dam safety monitoring models from "point" analysis to "area" analysis, researchers have proposed spatiotemporal distribution models, principal component analysis, multioutput machine learning models, and panel data models. By introducing the coordinates of observation points as influencing factors, Gu et al. [9] formulated a spatiotemporal distribution model for arch dam deformation. Building on single-point deformation monitoring theory, Wei et al. [10] established a space-time distribution model by introducing spatial coordinates and calculating the hydraulic component with the finite element method (FEM). Cheng et al. [11] separated environmental effects and noise interference from dam monitoring data by analyzing the covariance matrix of multidimensional monitoring data and, on this basis, proposed two multivariate dam safety monitoring models. Popescu et al. [12] proposed an unconventional technique based on blind source separation for monitoring buildings and dams. To overcome the shortcomings of earlier methods for verifying the validity of monitored physical quantities, Zhu et al. [13] used data collected by a dam monitoring automation system to develop a least squares Support Vector Machine method that combines phase space reconstruction with a Bayesian framework. Xu [14] studied Support Vector Machines and Relevance Vector Machines and, by optimizing their key parameters, constructed a dual-objective optimization model for predicting the displacement of super-high arch dams that incorporates the spatial correlation of deformation. Hu et al. [15] proposed a partitioned deformation prediction model for super-high arch dams based on principal component hierarchical clustering and a panel data model. Using clustering methods from spatiotemporal data mining, Hu et al. [16] extracted the similarity characteristics of deformation sequences and established a cluster analysis model of deformation measuring points for high concrete dams based on panel data analysis. Wang et al. [17] created and validated a mixed coefficient panel model of dam displacement at multiple monitoring points.
The spatiotemporal dam safety monitoring models above forecast the monitored quantity from environmental variables used as predictive factors. When environmental variables are unavailable, or when suitable environmental forecasting factors are difficult to select, a time series model can be used instead. Panel data models of time series capture the dynamic changes of data in both time and space and give good spatiotemporal forecasts. Combining a vector autoregressive (VAR) model with panel data yields the panel vector autoregressive (PVAR) model, which extends time series modelling from a single series to a spatial panel of series. The model accounts for the relationships among multiple variables simultaneously and has been widely applied [18][19][20]. An advantage of multivariate modelling is that pooling the data, rather than using only a single series, can yield more accurate forecasts [21]. The parameters of PVAR models are usually estimated by least squares, the method of moments, or maximum likelihood. Pesaran [22] noted that, because of cross-sectional heterogeneity, conventional estimation techniques are no longer suitable for panel data. Zellner [23] and Canova et al. [24] applied Bayesian estimation to the PVAR model: given prior information, the posterior distribution of the model is derived with the Gibbs sampling method, which yields parameter estimates and enables forecasts for multiple future periods [25]. Compared with traditional estimation methods, the Bayesian panel vector autoregressive (BPVAR) model has better mathematical properties and offers advantages in parameter estimation when the spatial and temporal information of panel data is taken into account.
This paper proposes a safety monitoring model for the uplift pressure of dam foundations that takes spatiotemporal information into account. First, the ordering points to identify the clustering structure (OPTICS) method [26] is used to perform spatial clustering analysis of the uplift pressure monitoring data: distance matrices are calculated with different distance indicators, and the optimal clustering result is selected according to clustering evaluation indicators. Second, stationarity tests and optimal lag order selection are carried out on the panel data of each category of measuring points, and the BPVAR model with exogenous variables is used to establish a safety monitoring model for each category. These steps address three problems: the monitoring data of a single piezometer cover only a limited temporal and spatial range, they reflect only the local seepage behaviour at the measuring point, and the spatiotemporal patterns of dam foundation uplift pressure that they describe are not unified or coordinated. Finally, an engineering example is used to verify the effectiveness of the proposed uplift pressure safety monitoring model.
Water 2024, 16, 1190

Time Series Similarity Measure
At present, commonly used measures of the difference between time series fall into two types: distance measures and similarity measures. A distance function generally needs to satisfy non-negativity, symmetry, and the triangle inequality; the distance from a sequence to itself is 0; and the distance should increase with the degree of difference between sequences. Common distance measures include the Euclidean, Manhattan, and Mahalanobis distances. In contrast, the value of a similarity measure decreases as the difference increases. The most commonly used similarity measures include the Pearson correlation coefficient and the Bhattacharyya distance. This paper uses the following three methods:
(1) Cosine similarity [27]
Cosine similarity measures the similarity between two vectors by the cosine of the angle between them. When the cosine similarity is 0, the vectors are orthogonal; when it is 1, they point in the same direction and are regarded as completely similar. It is calculated as follows:
$$\cos(\mathbf{x}, \mathbf{y}) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$$
where x and y are n-dimensional vectors, and x_i and y_i are the i-th components of x and y, respectively. When cosine similarity is expressed as the distance d_cos(x, y) = 1 − cos(x, y), a smaller value represents a higher similarity and a larger value represents a weaker similarity.
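The cosine measure above can be sketched as follows. This is a minimal illustration, assuming (as the "smaller value, higher similarity" wording suggests) that the similarity is turned into a distance as 1 − cos(x, y) before clustering; the function names are hypothetical.

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length series x and y."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for an all-zero series")
    return dot / (norm_x * norm_y)

def cosine_distance_matrix(series):
    """Pairwise distances d = 1 - cos(x, y); a smaller value means more similar."""
    n = len(series)
    return [[1.0 - cosine_similarity(series[i], series[j]) for j in range(n)]
            for i in range(n)]

print(round(cosine_similarity([1, 2, 3], [3, 2, 1]), 4))  # 0.7143
```

Because cosine similarity ignores vector length, two series that differ only by a constant scale factor are treated as identical (distance 0).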
(2) Bilateral slope distance [28]
Conventionally, the vertical distance between two points is calculated with the Manhattan or Euclidean distance alone, which ignores their shape characteristics. However, shape similarity is crucial for matching similar points, and relying solely on vertical distance may lead to incorrect matches. The slope of the line segment connecting two adjacent points is an important shape feature. With this in mind, Hossein and Abbas et al. [29] proposed the bilateral slope distance as an alternative to conventional distance measures. It combines the Euclidean distance with the slope distance of each segment and takes into account the slopes of the segments on both sides of a point. For a time series TS = [x_1, x_2, ..., x_L], the slope of the segment starting at x_l is expressed as
$$\sin\theta_l = \frac{x_{l+1} - x_l}{\sqrt{(x_{l+1} - x_l)^2 + (t_{l+1} - t_l)^2}}$$
where l ∈ [1, 2, ..., L − 1], and t_{l+1} and t_l are the time nodes corresponding to x_{l+1} and x_l, respectively. For two series of measuring points, TS^1 = [x^1_1, x^1_2, ..., x^1_n] and TS^2 = [x^2_1, x^2_2, ..., x^2_m], the bilateral slope distance between x^1_i (the i-th item of TS^1) and x^2_j (the j-th item of TS^2) is computed from the difference between their values together with the differences between their right slopes, sin θ^1_i and sin θ^2_j, and between their corresponding left slopes.
(3) Dynamic Time Warping (DTW) [30]
Two time series, Q = [q_1, q_2, ..., q_m] and C = [c_1, c_2, ..., c_n], are arranged into an m × n matrix in which element (i, j) is the distance between q_i and c_j. In this paper, the absolute distance is used:
$$d(i, j) = |q_i - c_j|$$
After the matrix is constructed, dynamic programming is used to find the warping path that minimizes the cumulative distance between Q and C.
The warping path W = {w_1, w_2, ..., w_K} is a sequence of grid points whose length K satisfies max(m, n) ≤ K ≤ m + n − 1. It defines the mapping between Q and C, and its k-th element is w_k = (i, j)_k. The warping path must satisfy the following conditions:
(1) Boundary condition: w_1 = (1, 1) and w_K = (m, n).
(2) Continuity: if w_k = (a, b) and w_{k−1} = (a′, b′), then a − a′ ≤ 1 and b − b′ ≤ 1.
(3) Monotonicity: if w_k = (a, b) and w_{k−1} = (a′, b′), then a − a′ ≥ 0 and b − b′ ≥ 0.
Given the distance measure and these path conditions, the DTW distance between Q and C is the cumulative distance along the optimal alignment path found by dynamic warping; it measures the similarity between the two sequences.
In the DTW algorithm, the cumulative distance along the warping path is calculated recursively:
$$\gamma(i, j) = d(i, j) + \min\{\gamma(i-1, j-1),\ \gamma(i-1, j),\ \gamma(i, j-1)\}$$
where γ(i, j) is the cumulative distance at row i, column j; d(i, j) is the distance between q_i and c_j; and the minimum is taken over the cumulative distances of the three adjacent cells. The DTW distance between Q and C is γ(m, n).
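The recursion for γ(i, j) can be sketched directly as a dynamic programming table. This minimal version uses the absolute distance stated in the text; the boundary condition is enforced by padding row 0 and column 0 with infinity (except γ(0, 0) = 0), and continuity and monotonicity follow from allowing moves only from the three adjacent cells. The function names are hypothetical.

```python
import math

def dtw_distance(q, c):
    """DTW distance between series q and c using the absolute distance |q_i - c_j|."""
    m, n = len(q), len(c)
    # gamma[i][j] = cumulative distance of the best path ending at (i, j);
    # row 0 and column 0 are padding so that every path starts at (1, 1).
    gamma = [[math.inf] * (n + 1) for _ in range(m + 1)]
    gamma[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = abs(q[i - 1] - c[j - 1])
            gamma[i][j] = d + min(gamma[i - 1][j - 1], gamma[i - 1][j], gamma[i][j - 1])
    return gamma[m][n]  # the path must end at (m, n)

def dtw_distance_matrix(series):
    """Symmetric matrix of pairwise DTW distances between measuring points."""
    k = len(series)
    dist = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(a + 1, k):
            dist[a][b] = dist[b][a] = dtw_distance(series[a], series[b])
    return dist

print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```

The example shows the point of warping: the repeated value 2 is absorbed by the alignment, so the two series have zero DTW distance even though their lengths differ. For long records such as 1553-point series, this O(mn) pure-Python loop is slow, and a vectorized or compiled implementation would be preferable in practice.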

OPTICS Clustering Algorithm
OPTICS is a density-based clustering method and an improved version of DBSCAN (density-based spatial clustering of applications with noise). Unlike DBSCAN, OPTICS does not directly produce clustering results. Instead, it produces an ordering of the points in the sample set together with a reachability distance for each point, and this ordering reflects the density-based clustering structure of the data. The algorithm starts from a core point in the sample set and expands to all sample points density-reachable from it to form a cluster. Its advantages are that it is less sensitive to the input parameters than DBSCAN and is well suited to large datasets.
The OPTICS algorithm requires two input parameters: the neighbourhood radius ε of a sample point and the minimum number of points MinPts within that radius. From these two parameters, the density around a sample point can be assessed, and adjacent sample points with similar densities are assigned to the same cluster. When the ε-neighbourhood of a sample point contains at least MinPts sample points, the point is called a core point, and the set of all core points is called the core set. Core points that have not yet been processed are placed in the seed set (seeds). A core point x satisfies the following condition:
$$|N_\varepsilon(x)| \ge \mathrm{MinPts}$$
where N_ε(x) is the set of sample points within distance ε of x, and |N_ε(x)| is the number of points in it.
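The core distance of sample point x is defined as follows:
$$\mathrm{cd}(x) = \begin{cases} \text{undefined}, & |N_\varepsilon(x)| < \mathrm{MinPts} \\ d\big(x, N^{\mathrm{MinPts}}_\varepsilon(x)\big), & |N_\varepsilon(x)| \ge \mathrm{MinPts} \end{cases}$$
where N^{MinPts}_ε(x) is the MinPts-th nearest neighbour of x in N_ε(x). The core distance of x is thus the minimum radius that makes x a core point; when x is not a core point, the core distance is undefined.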
The reachability distance of sample point y with respect to a core point x is defined as follows:
$$\mathrm{rd}(y, x) = \max\{\mathrm{cd}(x),\ d(x, y)\}$$
and it is undefined when x is not a core point. If the distance from y to x exceeds the core distance of x, the reachability distance of y is the actual distance d(x, y); otherwise, it equals the core distance of x.
As shown in Figure 1, assume that MinPts is set to 3. If the ε-neighbourhood of point P contains at least three points, P is a core point, and its core distance is the distance to its third-nearest point q_3, i.e., cd_P = d(P, q_3). Points q_1 and q_2 are closer to P than cd_P, so their reachability distance equals the core distance of P, e.g., rd(P, q_1) = cd_P. Points q_4 and q_5 are farther from P than cd_P, so their reachability distance is their actual distance from P, e.g., rd(P, q_4) = d(P, q_4).
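The core-distance and reachability-distance definitions above can be turned into a small OPTICS sketch that works on a precomputed distance matrix, such as the DTW matrix used in this paper. This is an illustration under stated assumptions, not the authors' implementation: following the Figure 1 example, a point is not counted as its own neighbour (some library implementations do count it), and `extract_clusters` is a hypothetical flat extraction that cuts the reachability ordering at a threshold; the paper does not state how its categories were extracted.

```python
import heapq
import math

def optics(dist, eps=math.inf, min_pts=3):
    """OPTICS ordering for a precomputed distance matrix.

    As in the Figure 1 example, a point is not counted as its own neighbour, so
    its core distance is the distance to its min_pts-th nearest other point.
    Returns (ordering, reachability, core_distance); undefined values are inf.
    """
    n = len(dist)
    processed = [False] * n
    reach = [math.inf] * n
    core = [math.inf] * n
    order = []
    for start in range(n):
        if processed[start]:
            continue
        seeds = [(math.inf, start)]  # seed set ordered by reachability distance
        while seeds:
            _, p = heapq.heappop(seeds)
            if processed[p]:
                continue  # outdated entry for an already processed point
            processed[p] = True
            order.append(p)
            nbrs = [q for q in range(n) if q != p and dist[p][q] <= eps]
            if len(nbrs) < min_pts:
                continue  # p is not a core point
            core[p] = sorted(dist[p][q] for q in nbrs)[min_pts - 1]
            for q in nbrs:
                if not processed[q]:
                    r = max(core[p], dist[p][q])  # reachability distance of q from p
                    if r < reach[q]:
                        reach[q] = r
                        heapq.heappush(seeds, (r, q))
    return order, reach, core

def extract_clusters(order, reach, core, threshold):
    """Hypothetical flat extraction: cut the reachability ordering at `threshold`."""
    labels = [-1] * len(order)  # -1 marks noise
    cluster = -1
    for p in order:
        if reach[p] > threshold:
            if core[p] <= threshold:  # a dense point that is far from the previous cluster
                cluster += 1
                labels[p] = cluster
        else:
            labels[p] = cluster
    return labels

pts = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in pts] for a in pts]
order, reach, core = optics(dist, min_pts=2)
print(extract_clusters(order, reach, core, 5.0))  # [0, 0, 0, 1, 1, 1]
```

The priority queue always expands the unprocessed point with the smallest reachability distance, which is what makes dense groups appear as contiguous runs of low reachability in the ordering.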

Clustering Index Evaluation
Clustering evaluation indices can be roughly divided into two categories: external indices, which evaluate clustering results by comparing them with a known reference model, and internal indices, which assess the clustering results directly. In this paper, two internal indices are used to evaluate the clustering results.
(1) Silhouette coefficient [31]
The silhouette coefficient combines the similarity between a sample and its own cluster with its dissimilarity from the nearest other cluster. The formula is as follows:
$$s = \frac{b - a}{\max(a, b)}$$
where a is the average distance between the sample and the other samples in its own cluster, and b is the average distance between the sample and the samples of the nearest other cluster. The value of s lies between −1 and 1, and the closer it is to 1, the better the clustering; the silhouette coefficient of a clustering result is the mean of s over all samples.
(2) Calinski-Harabasz index [32]
The Calinski-Harabasz index is essentially the ratio of between-cluster dispersion to within-cluster dispersion. Because its calculation resembles that of a variance, it is also called the variance ratio criterion. The formula is as follows:
$$\mathrm{CH} = \frac{\mathrm{BCSS}/(k-1)}{\mathrm{WCSS}/(n-k)}$$
where k is the number of clusters; n is the total number of data points; BCSS (between-cluster sum of squares) is the sum of the squared Euclidean distances between each cluster centroid and the overall data centroid, weighted by the number of points in each cluster; and WCSS (within-cluster sum of squares) is the sum of the squared Euclidean distances between the data points and their respective cluster centroids. A higher value usually indicates a better clustering result.
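Both indices can be computed as defined above. In this sketch, the silhouette coefficient takes a precomputed distance matrix (so it can use DTW distances directly), while the Calinski-Harabasz index needs centroids and therefore works on feature vectors with Euclidean distances. Treating each monitoring series as a feature vector for the CH index is an assumption, since the text does not say which representation was used. Singleton clusters contribute s = 0, a common convention. The function names are hypothetical.

```python
def silhouette_score(dist, labels):
    """Mean silhouette coefficient from a precomputed distance matrix."""
    n = len(labels)
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    sizes = {c: labels.count(c) for c in clusters}
    total = 0.0
    for i in range(n):
        own = labels[i]
        if sizes[own] == 1:
            continue  # singleton cluster: s(i) = 0 by convention
        a = sum(dist[i][j] for j in range(n) if j != i and labels[j] == own) / (sizes[own] - 1)
        b = min(sum(dist[i][j] for j in range(n) if labels[j] == c) / sizes[c]
                for c in clusters if c != own)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n

def calinski_harabasz(points, labels):
    """Variance ratio criterion: (BCSS / (k - 1)) / (WCSS / (n - k))."""
    n, dim = len(points), len(points[0])
    clusters = sorted(set(labels))
    k = len(clusters)
    if not 1 < k < n:
        raise ValueError("need 1 < k < n")
    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    bcss = wcss = 0.0
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        # between-cluster part, weighted by cluster size
        bcss += len(members) * sum((centroid[d] - overall[d]) ** 2 for d in range(dim))
        # within-cluster part
        wcss += sum(sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in members)
    return (bcss / (k - 1)) / (wcss / (n - k))

pts = [0, 1, 10, 11]
labels = [0, 0, 1, 1]
dist = [[abs(x - y) for y in pts] for x in pts]
print(calinski_harabasz([[p] for p in pts], labels))  # 200.0
```

Selecting among candidate clusterings then amounts to preferring the one whose silhouette coefficient is closest to 1 and whose CH value is largest, as done with the three distance matrices in this paper.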

Unit Root Test for Panel Data
The unit root test is a commonly used hypothesis test for the stationarity of time series data: a series with a unit root is nonstationary, and a series without one is stationary. To test whether the panel monitoring data of the piezometers contain a unit root, the following panel autoregressive model is used [33]:
$$y_{i,t} = \rho_i\, y_{i,t-1} + z'_{i,t}\gamma_i + \varepsilon_{i,t} \tag{16}$$
where i = 1, 2, ..., N indexes the measuring points; t = 1, 2, ..., T indexes time; ρ_i is the autoregressive coefficient (a unit root corresponds to ρ_i = 1); z'_{i,t}γ_i represents the individual effect; and ε_{i,t} is the error term.
Because the error term in Equation (16) may be autocorrelated, Levin et al. [34] proposed the Levin-Lin-Chu (LLC) test to determine whether the panel monitoring data of the piezometers contain a unit root. The LLC test is based on the following regression:
$$\Delta y_{i,t} = \delta\, y_{i,t-1} + \sum_{j=1}^{p_i} \theta_{ij}\, \Delta y_{i,t-j} + z'_{i,t}\gamma_i + \varepsilon_{i,t} \tag{17}$$
where δ is the autoregressive coefficient, θ_ij are the coefficients of the lagged differences, and p_i is the lag order of the model. The LLC test requires δ to be the same for all individuals, a condition that is difficult to satisfy in practice; this is its main shortcoming. To overcome it, Im et al. [35] proposed the Im-Pesaran-Shin (IPS) unit root test, which can also be formulated as a Lagrange multiplier test [36] and allows the coefficient to differ across individuals:
$$\Delta y_{i,t} = \delta_i\, y_{i,t-1} + \sum_{j=1}^{p_i} \theta_{ij}\, \Delta y_{i,t-j} + z'_{i,t}\gamma_i + \varepsilon_{i,t} \tag{18}$$
where δ_i is the autoregressive coefficient of measuring point i. Fisher-type tests combine the p values of unit root tests performed separately on each individual series. Of the four methods proposed by Choi [37] for combining the individual p values into a Fisher statistic, this paper uses the inverse chi-square transformation:
$$P = -2\sum_{i=1}^{N} \ln p_i \tag{19}$$
where p_i is the p value of the unit root test for measuring point i, computed from its T_i observations (T_i is the time dimension of measuring point i). Because ln p_i ≤ 0, the negative sign makes P non-negative, and the larger P is, the stronger the evidence for rejecting the null hypothesis of a panel unit root. Under the null hypothesis, with independent individual tests, P asymptotically follows a χ² distribution with 2N degrees of freedom.
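The inverse chi-square combination can be sketched as follows, assuming the individual p values (e.g., from ADF tests on each measuring point) have already been computed and are independent. Because the χ² distribution here has an even number of degrees of freedom (2N), its upper tail has the closed form exp(−x/2) Σ_{k=0}^{N−1} (x/2)^k / k!, so no statistics library is needed. The function name is hypothetical.

```python
import math

def fisher_inverse_chi2(p_values):
    """Return (P, combined p value) for P = -2 * sum(ln p_i) ~ chi2(2N) under H0."""
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p values must lie in (0, 1]")
    n = len(p_values)
    stat = -2.0 * sum(math.log(p) for p in p_values)
    # Upper tail of chi-square with 2n degrees of freedom:
    # P(X > x) = exp(-x/2) * sum_{k=0}^{n-1} (x/2)^k / k!
    half = stat / 2.0
    term, acc = 1.0, 1.0
    for k in range(1, n):
        term *= half / k
        acc += term
    return stat, math.exp(-half) * acc

stat, p_panel = fisher_inverse_chi2([0.01, 0.02, 0.03])
print(round(stat, 2), p_panel < 0.01)  # 24.05 True
```

With a single series the combined p value reduces to the original one, which is a quick check of the tail formula.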
To analyze and forecast a nonstationary time series, the series must first be made stationary. Commonly used methods include the following:
(1) Difference method
The difference method applies first-order or higher-order differencing to a nonstationary time series to obtain a stationary one. The first-order difference is the difference between two adjacent terms:
$$\Delta x_t = x_t - x_{t-1} \tag{20}$$
Differencing can be repeated until the series is stationary.
(2) Seasonal difference method
If the time series is seasonal, it can be processed with the seasonal difference method. The seasonal difference is the difference between observations one season apart:
$$\Delta_f x_t = x_t - x_{t-f} \tag{21}$$
where f is the length of the season. Seasonal differencing can be repeated until the series is stationary.
(3) Moving average method [38]
The moving average (sliding average) method computes the mean of the time series within a moving window to smooth out noise. Common variants include the simple moving average and the weighted moving average.
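The three transformations can be sketched in a few lines. This is an illustration only; the function names are hypothetical, and in the weighted moving average the weights are assumed to be ordered from the oldest to the newest value in the window.

```python
def difference(x, order=1):
    """Apply the first-order difference x_t - x_{t-1} `order` times."""
    for _ in range(order):
        x = [x[t] - x[t - 1] for t in range(1, len(x))]
    return x

def seasonal_difference(x, f):
    """Seasonal difference x_t - x_{t-f} for a season of length f."""
    return [x[t] - x[t - f] for t in range(f, len(x))]

def simple_moving_average(x, window):
    """Mean of each full window of `window` consecutive values."""
    return [sum(x[t - window + 1:t + 1]) / window for t in range(window - 1, len(x))]

def weighted_moving_average(x, weights):
    """Weighted mean of each full window (weights ordered oldest to newest)."""
    w, s = len(weights), sum(weights)
    return [sum(wk * xv for wk, xv in zip(weights, x[t - w + 1:t + 1])) / s
            for t in range(w - 1, len(x))]

print(difference([1, 4, 9, 16]))     # [3, 5, 7]
print(difference([1, 4, 9, 16], 2))  # [2, 2]
```

A quadratic trend such as 1, 4, 9, 16 becomes constant after two differences, which illustrates why higher-order differencing may be needed before the series is stationary.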

Test of Lag Order on Panel Data
Lag order selection is important in panel data analysis: too high a lag order makes the model overly complex, which can lead to overfitting and model distortion, whereas too low a lag order may cause information loss or residual autocorrelation. Information criteria are therefore commonly used to determine the optimal lag order of panel data while avoiding overfitting. The most widely used are the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the Hannan-Quinn information criterion (HQIC).
The AIC [39] is a standard for assessing the goodness of fit of a model and is expressed as follows:
$$\mathrm{AIC} = 2k - 2\ln L \tag{22}$$
where k is the number of parameters in the model and L is the likelihood function value of the model.
Both the BIC and the AIC are statistical metrics for model selection and comparison. They differ mainly in how heavily they penalize model complexity: the AIC imposes a lighter penalty and the BIC a heavier one, so the BIC tends to select simpler models and thus helps avoid overfitting. The BIC is defined as follows [40]:
$$\mathrm{BIC} = k\ln b - 2\ln L \tag{23}$$
where b is the sample size. Like the AIC, the HQIC balances goodness of fit against model complexity. It is defined as follows [41]:
$$\mathrm{HQIC} = 2k\ln(\ln b) - 2\ln L \tag{24}$$
Its penalty term, 2k ln(ln b), is heavier than that of the AIC when b > e^e ≈ 15 and always lighter than that of the BIC, so the HQIC lies between the two criteria and may be more suitable for model selection in some cases.
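Lag order selection with the three criteria can be sketched as below, assuming the maximized log-likelihood ln L and the parameter count k of a model fitted at each candidate lag order are already available. The log-likelihoods and parameter counts in the usage example are hypothetical numbers chosen only to show that the BIC can favour a smaller lag than the AIC.

```python
import math

def aic(log_lik, k):
    return 2 * k - 2 * log_lik

def bic(log_lik, k, b):
    return k * math.log(b) - 2 * log_lik

def hqic(log_lik, k, b):
    return 2 * k * math.log(math.log(b)) - 2 * log_lik

def select_lag(fits, b, criterion="aic"):
    """Return the lag order with the smallest criterion value, plus all scores.

    fits maps each candidate lag order p to (log_lik, k) of a model fitted with that lag.
    """
    scores = {}
    for p, (log_lik, k) in fits.items():
        if criterion == "aic":
            scores[p] = aic(log_lik, k)
        elif criterion == "bic":
            scores[p] = bic(log_lik, k, b)
        elif criterion == "hqic":
            scores[p] = hqic(log_lik, k, b)
        else:
            raise ValueError(f"unknown criterion: {criterion}")
    return min(scores, key=scores.get), scores

# Hypothetical fits for lag orders 1-3 with b = 200 observations.
fits = {1: (-520.0, 6), 2: (-505.0, 12), 3: (-503.0, 18)}
for name in ("aic", "bic", "hqic"):
    print(name, select_lag(fits, 200, name)[0])  # aic 2, bic 1, hqic 2
```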

Bayesian Estimation of PVAR
The general form of the panel vector autoregressive model is as follows:
$$y_{i,t} = \sum_{j=1}^{N}\sum_{e=1}^{p} A^{e}_{ij,t}\, y_{j,t-e} + C_{i,t}\, x_t + \varepsilon_{i,t} \tag{25}$$
where y_{i,t} is an n × 1 vector of the n endogenous variables of measuring point i at time t; N is the number of measuring points and p is the lag order; A^e_{ij,t} is an n × n coefficient matrix describing the response of measuring point i to the e-th lag of measuring point j at time t; x_t is an m × 1 vector of exogenous variables; C_{i,t} is an n × m coefficient matrix relating the endogenous variables to the exogenous variables; and ε_{i,t} is the n × 1 residual vector of measuring point i. In the present study, the panel data of the uplift pressure measuring points were input into the model as endogenous variables, and the water level, precipitation, temperature, and ageing were input as exogenous variables. With these exogenous variables, the model takes the following form:
$$y_{i,t} = \sum_{j=1}^{N}\sum_{e=1}^{p} A^{e}_{ij,t}\, y_{j,t-e} + U_{i,t}\, x_t + \varepsilon_{i,t} \tag{26}$$
where U_{i,t} is the coefficient matrix relating the endogenous variables to the exogenous variables, and x_t is the m × 1 exogenous vector (here m = 19) consisting of the water level factors {H_u1, H_u2, H_u3, H_u4, H_u5, H_u6, H_d}, the rainfall factors {P_1, P_2, P_3, P_4, P_5, P_6}, the temperature factors {sin(2πl/365), cos(2πl/365), sin(4πl/365), cos(4πl/365)}, and the ageing factors {σ, ln σ}. H_u1 is the reservoir water level on the observation day, and H_u2, H_u3, H_u4, H_u5, and H_u6 are the average reservoir water levels 1 day, 2 days, 3-4 days, 5-15 days, and 16-30 days before the observation day, respectively; H_d is the downstream water level on the same date. Likewise, P_1 is the precipitation on the observation day, and P_2, P_3, P_4, P_5, and P_6 are the average precipitation 1 day, 2 days, 3-4 days, 5-15 days, and 16-30 days before the observation day, respectively. l is the number of days, and σ is the number of days since the start of impoundment (or of the engineering measure) divided by 100, so that σ increases by 1.0 every 100 days [1].
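The exogenous vector x_t described above can be assembled from daily records as sketched below. This is an illustration under assumptions: the daily series `upstream`, `downstream`, and `rain` are hypothetical inputs indexed by day, the day index t stands in for l in the temperature terms, `day0` is a hypothetical day index for the start of impoundment (it must precede t so that ln σ is defined), and the look-back windows are read as 1 day, 2 days, 3-4 days, 5-15 days, and 16-30 days before the observation day.

```python
import math

# Look-back windows (days before the observation day), read from the text:
# 1 day, 2 days, 3-4 days, 5-15 days and 16-30 days before.
WINDOWS = [(1, 1), (2, 2), (3, 4), (5, 15), (16, 30)]

def window_mean(series, t, start, end):
    """Mean of the values from `end` days to `start` days before day t (inclusive)."""
    values = series[t - end:t - start + 1]
    return sum(values) / len(values)

def exogenous_factors(t, upstream, downstream, rain, day0=0):
    """Return the 19 factors [H_u1..H_u6, H_d, P_1..P_6, 4 temperature, sigma, ln sigma]."""
    if t < 30:
        raise ValueError("at least 30 days of history are required")
    if t <= day0:
        raise ValueError("t must be after day0 so that ln(sigma) is defined")
    water = [upstream[t]] + [window_mean(upstream, t, a, b) for a, b in WINDOWS]
    water.append(downstream[t])  # H_d
    precip = [rain[t]] + [window_mean(rain, t, a, b) for a, b in WINDOWS]
    temp = [math.sin(2 * math.pi * t / 365), math.cos(2 * math.pi * t / 365),
            math.sin(4 * math.pi * t / 365), math.cos(4 * math.pi * t / 365)]
    sigma = (t - day0) / 100.0  # increases by 1.0 every 100 days
    return water + precip + temp + [sigma, math.log(sigma)]

upstream = [float(d) for d in range(40)]  # hypothetical rising reservoir level
factors = exogenous_factors(35, upstream, [20.0] * 40, [0.0] * 40)
print(len(factors), factors[1], factors[3])  # 19 34.0 31.5
```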
Because the general form is too complex to estimate in practice, Zellner et al. [42] proposed an alternative approach that uses a hierarchical prior identification scheme, essentially following the method outlined by Jarocinski [43]. In the approach of Zellner et al. [42], the only estimated parameter is β; the other fundamental parameters, including the residual covariance matrices Σ_i and the hyperparameters b and Σ_b of the autoregressive coefficients, are assumed to be known. The posterior distribution of the model is as follows:
$$\pi(\beta, b, \Sigma_b, \Sigma \mid y) \propto \pi(y \mid \beta, \Sigma)\, \pi(\beta \mid b, \Sigma_b)\, \pi(b)\, \pi(\Sigma_b)\, \pi(\Sigma) \tag{27}$$
where π(β, b, Σ_b, Σ | y) is the full posterior distribution, π(y | β, Σ) is the likelihood function, π(β | b, Σ_b) is the conditional prior distribution, π(b) and π(Σ_b) are the two hyperpriors, and π(Σ) is the prior of the residual covariance matrices. The likelihood π(y | β, Σ) is given by Equation (28). The prior distribution of Σ_i is the classical diffuse prior:
$$\pi(\Sigma_i) \propto |\Sigma_i|^{-(n+1)/2} \tag{29}$$
The model is estimated with the Gibbs sampler [44], which requires the conditional posterior distributions of the parameters β_i, b, Σ_b, and Σ_i. The conditional distribution of β_i, π(β_i | β_{−i}, y, b, Σ_b, Σ), where β_{−i} denotes the collection of all β coefficients except β_i, is obtained by treating any term not involving β_i as a proportionality constant.
The conditional distribution of b is likewise obtained by treating any term not involving b as a proportionality constant; it involves β_m, the arithmetic mean of the vectors β_i. The conditional distributions of Σ_b and Σ_i are derived in the same way, treating any term not involving Σ_b or Σ_i, respectively, as a proportionality constant.
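To show how a Gibbs sampler cycles through such conditional distributions, the following is a deliberately simplified scalar analogue, not the BPVAR estimator itself. Each measuring point follows a univariate AR(1) model y_{i,t} = β_i y_{i,t−1} + ε_{i,t}; the coefficients share the hierarchical prior β_i ~ N(b, v_b); b has a flat prior; v_b has an inverse-gamma prior with hypothetical hyperparameters a0 and c0; and each noise variance s_i² has the diffuse prior p(s_i²) ∝ 1/s_i², which is the one-dimensional case of the classical diffuse prior |Σ_i|^{−(n+1)/2}. All function names and settings are hypothetical.

```python
import random

def simulate_panel(betas, T, seed=1):
    """Simulate AR(1) series y_t = beta_i * y_{t-1} + e_t with unit-variance noise."""
    rng = random.Random(seed)
    panel = []
    for beta in betas:
        y = [0.0]
        for _ in range(T):
            y.append(beta * y[-1] + rng.gauss(0.0, 1.0))
        panel.append(y)
    return panel

def gibbs_hierarchical_ar1(panel, n_iter=1500, burn_in=500, a0=0.001, c0=0.001, seed=0):
    """Posterior means of b and beta_i for the toy hierarchical AR(1) panel."""
    rng = random.Random(seed)
    n = len(panel)
    xs = [y[:-1] for y in panel]  # lagged values
    ys = [y[1:] for y in panel]   # current values
    sxx = [sum(v * v for v in x) for x in xs]
    sxy = [sum(u * v for u, v in zip(x, y)) for x, y in zip(xs, ys)]
    beta, s2 = [0.0] * n, [1.0] * n
    b, v_b = 0.0, 1.0
    sum_b, sum_beta, kept = 0.0, [0.0] * n, 0
    for it in range(n_iter):
        # beta_i | b, v_b, s2_i, y ~ Normal (conjugate update)
        for i in range(n):
            prec = sxx[i] / s2[i] + 1.0 / v_b
            mean = (sxy[i] / s2[i] + b / v_b) / prec
            beta[i] = rng.gauss(mean, (1.0 / prec) ** 0.5)
        # b | beta, v_b ~ Normal(mean of beta_i, v_b / n) under a flat prior
        b = rng.gauss(sum(beta) / n, (v_b / n) ** 0.5)
        # v_b | beta, b ~ Inverse-Gamma(a0 + n/2, c0 + SS/2)
        ss = sum((bi - b) ** 2 for bi in beta)
        v_b = 1.0 / rng.gammavariate(a0 + n / 2, 1.0 / (c0 + 0.5 * ss))
        # s2_i | beta_i, y ~ Inverse-Gamma(T/2, RSS/2) under p(s2) ∝ 1/s2
        for i in range(n):
            rss = sum((yt - beta[i] * xt) ** 2 for xt, yt in zip(xs[i], ys[i]))
            s2[i] = 1.0 / rng.gammavariate(len(ys[i]) / 2, 1.0 / (0.5 * rss))
        if it >= burn_in:
            kept += 1
            sum_b += b
            for i in range(n):
                sum_beta[i] += beta[i]
    return {"b": sum_b / kept, "beta": [s / kept for s in sum_beta]}

panel = simulate_panel([0.5, 0.55, 0.6, 0.65, 0.7], T=300)
result = gibbs_hierarchical_ar1(panel)
print(round(result["b"], 2))
```

The structure mirrors the text: each parameter block is drawn from its conditional distribution given the current values of all the others, and averaging the draws kept after burn-in approximates the posterior means.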

Building Method of the Concrete Dam Uplift Pressure Safety Monitoring Model
A flowchart of the uplift pressure safety monitoring method for concrete dam foundations proposed in this paper, based on the OPTICS clustering method and the BPVAR model, is shown in Figure 2. The main steps are as follows:
(1) Input the uplift pressure monitoring data sample set D of the measuring points, the neighbourhood radius ε, and the minimum number of points MinPts within the neighbourhood radius.
(2) Calculate the distance matrices based on the DTW distance, cosine similarity, and bilateral slope distance.
(3) Cluster the measuring points with the OPTICS algorithm based on the matrices calculated in (2).
(4) Evaluate the spatial clustering results obtained with the different distance matrices using the silhouette coefficient and the variance ratio criterion, and select the result whose silhouette coefficient is closest to 1 and whose Calinski-Harabasz index is largest as the optimal clustering result. The uplift pressure measuring points with similar heights are used to create panel data.
(5) Test the stationarity of the panel data of each category of uplift pressure measuring points with the LLC, IPS, and ADF-Fisher methods.
(6) If a series is nonstationary, convert it to a stationary series with the difference method.
(7) Calculate the AIC, BIC, and HQIC according to Equations (22)-(24), and take the lag order that minimizes the information criteria as the optimal lag order of the model.
(8) Determine whether monitoring data of exogenous variables are available. If so, enter the exogenous variables (water level, precipitation, temperature, and ageing) and establish the model according to Equation (26); otherwise, establish the model according to Equation (25).
(9) Infer the posterior distribution of the model parameters with the Gibbs sampling method, and obtain the fitted uplift pressure values from the posterior distribution of the model parameters. The model uses one-step-ahead forecasting. Without exogenous variables, the forecast is calculated according to Equation (25) and consists mainly of two parts: the endogenous variables and the residual vector. With exogenous variables, the forecast is calculated according to Equation (26) and consists of three parts: the endogenous variables, the exogenous variables, and the residual vector. In both cases, the number of lagged terms of the endogenous variables is given by the optimal lag order. The prediction interval of the BPVAR model corresponds to a 95% confidence interval.
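The one-step-ahead forecast and 95% interval of step (9) can be illustrated with posterior draws. This is a minimal sketch under simplifying assumptions: it treats a single measuring point with scalar coefficients (the cross-point lag terms of the BPVAR are omitted), and `draws` is a hypothetical list of posterior samples. Each draw produces one simulated forecast made up of the lagged endogenous terms, the optional exogenous terms, and a residual drawn from N(0, sigma²); the mean and the 2.5% and 97.5% empirical quantiles of the simulated forecasts give the point forecast and the interval.

```python
import math
import random

def quantile(sorted_vals, q):
    """Linearly interpolated empirical quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def one_step_forecast(draws, history, x_next=(), level=0.95, seed=0):
    """Posterior predictive mean and interval for y_{t+1} at one measuring point.

    draws: hypothetical posterior samples, each a dict with "a" (lag coefficients
    [a_1, ..., a_p]), optional "u" (exogenous coefficients) and "sigma" (residual sd).
    history: past observations, most recent last. x_next: exogenous factors, if any.
    """
    rng = random.Random(seed)
    sims = []
    for d in draws:
        a, u = d["a"], d.get("u", [])
        if len(u) != len(x_next):
            raise ValueError("exogenous coefficients and factors differ in length")
        mean = sum(a_e * history[-e] for e, a_e in enumerate(a, start=1))  # endogenous part
        mean += sum(u_k * x_k for u_k, x_k in zip(u, x_next))              # exogenous part
        sims.append(mean + rng.gauss(0.0, d["sigma"]))                     # residual part
    sims.sort()
    tail = (1.0 - level) / 2.0
    return sum(sims) / len(sims), quantile(sims, tail), quantile(sims, 1.0 - tail)

draws = [{"a": [0.5], "u": [2.0], "sigma": 0.0}] * 10
print(one_step_forecast(draws, [8.0, 10.0], x_next=[1.0]))  # (7.0, 7.0, 7.0)
```

With zero residual variance the interval collapses to the point forecast; with real posterior draws, the spread of the simulated values reflects both parameter and residual uncertainty.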

Engineering Examples
Project Overview
The water retention system of the hydropower station consists of a roller-compacted concrete gravity dam with a maximum height of 113.0 m, a crest length of 308.5 m, and a crest elevation of 179.0 m. The main purpose of the project is power generation. The uplift pressure holes in the dam foundation are distributed in two areas: the first is in the vertical foundation corridor, and the second is in the horizontal corridor. Measuring points UP1~UP16 are located in the first area and UP17~UP25 in the second area, for a total of 25 measuring points, as shown in Figure 3. Measuring points UP8, UP10, UP12, UP15, and UP16 lost a large amount of data and are therefore not included. The values measured at all points include both manual and automated readings; the automated monitoring series runs from November 2002 to November 2008, with a monitoring frequency of once a day.
The dam is located in Yongding County, Fujian Province, in the middle of the Mianhuatan canyon section of the main stream of the Tingjiang River. The dam site lies in a narrow, V-shaped valley with basically symmetrical terrain, and the mountains on both banks are massive. The bedrock is early Yanshanian biotite granite with a medium-fine-grained, massive structure, and the slightly weathered rock is dense and hard. The rock mass also contains granite porphyry veins, diorite lamprophyre veins, and multiple sets of faults. The rock mass of the valley bank slopes is deeply weathered: in addition to completely, strongly, weakly, and slightly weathered zones, it exhibits spheroidal and interlayer weathering. Many boulders remain in the weathered rock, and the permeability of the rock mass is low. The engineering geological conditions for dam construction are good.

Spatial Cluster Analysis
Among the above piezometric tubes, the monitoring series of UP1~UP7, UP9, UP11, UP13, UP14, and UP17~UP24 cover more than one year, and the data from these measuring points are reliable. Therefore, the OPTICS clustering method was selected to spatially analyze these 20 measuring points. The cluster analysis covers the interval from 1 January 2004 to 31 December 2008, with a total of 1553 data points for each piezometric tube. Figure 4 shows the correlation analysis of the 20 measuring points, and Figure 5 shows the minimum cumulative distance diagram constructed for them. The uplift pressure monitoring data of the 20 piezometers were clustered with the OPTICS density clustering method using distance matrices calculated from three distance indicators (cosine similarity, bilateral slope distance, and DTW), and the resulting clusters are visualized in Figure 6. Table 1 lists the evaluation indicators of the three clustering results. The silhouette coefficient ranges from −1 to 1, and the DTW-based clustering gives the silhouette coefficient closest to 1. The variance ratio criterion ranges from 0 to ∞, and the DTW-based clustering gives its maximum value. Overall, therefore, the clustering based on the DTW distance performed best. Based on these evaluation indicators, the present study adopted the DTW-based OPTICS clustering result, which divides the uplift pressure measuring points of the dam foundation into seven categories. The piezometer water level for each category of measuring point is shown in Figure 7, and Table 2 lists the classification of the seven categories of measuring points.
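The DTW distance matrix and the OPTICS ordering described above can be sketched in Python as follows. This is a minimal, hypothetical illustration rather than the authors' code: it assumes absolute-difference DTW without a warping window, OPTICS with an unbounded neighbourhood radius, a DBSCAN-style cut of the reachability plot at a chosen threshold (the paper does not state how clusters were extracted), and a silhouette coefficient computed directly from the precomputed distances. The helper names (`dtw_distance`, `dtw_matrix`, `optics_order`, `extract_clusters`, `silhouette`) and the toy series are illustrative only.

```python
import numpy as np


def dtw_distance(a, b):
    """Classic dynamic time warping distance with an absolute-difference cost."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated-cost matrix
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m]


def dtw_matrix(series):
    """Symmetric matrix of pairwise DTW distances between monitoring series."""
    k = len(series)
    D = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            D[i, j] = D[j, i] = dtw_distance(series[i], series[j])
    return D


def optics_order(D, min_pts):
    """OPTICS with an unbounded radius on a precomputed distance matrix.
    Returns the processing order, reachability distances, and core distances."""
    n = D.shape[0]
    core = np.sort(D, axis=1)[:, min_pts - 1]  # the point itself counts as a neighbour
    reach = np.full(n, np.inf)
    processed = np.zeros(n, dtype=bool)
    order = []
    for _ in range(n):
        candidates = np.flatnonzero(~processed)
        p = candidates[np.argmin(reach[candidates])]  # most reachable unprocessed point
        processed[p] = True
        order.append(p)
        rest = np.flatnonzero(~processed)
        # reachability of o from p is max(core distance of p, d(p, o))
        reach[rest] = np.minimum(reach[rest], np.maximum(core[p], D[p, rest]))
    return np.array(order), reach, core


def extract_clusters(order, reach, core, eps):
    """DBSCAN-style cut of the reachability plot at threshold eps; -1 marks noise."""
    labels = np.full(len(order), -1)
    cluster = -1
    for p in order:
        if reach[p] > eps:
            if core[p] <= eps:  # p starts a new cluster; otherwise it stays noise
                cluster += 1
                labels[p] = cluster
        else:
            labels[p] = cluster
    return labels


def silhouette(D, labels):
    """Mean silhouette coefficient from precomputed distances (noise excluded)."""
    labels = np.asarray(labels)
    idx = np.flatnonzero(labels >= 0)
    clusters = np.unique(labels[idx])
    if len(clusters) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")
    scores = []
    for i in idx:
        same = idx[(labels[idx] == labels[i]) & (idx != i)]
        if len(same) == 0:
            scores.append(0.0)  # singleton cluster: s(i) = 0 by convention
            continue
        a = D[i, same].mean()  # mean distance within the own cluster
        b = min(D[i, idx[labels[idx] == c]].mean() for c in clusters if c != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return float(np.mean(scores))


# Toy example: two pairs of similarly shaped series (illustrative data only)
series = [[0, 1, 2, 3], [0, 1, 2, 3, 3], [10, 10, 10, 10], [10, 10, 11, 10]]
D = dtw_matrix(series)
order, reach, core = optics_order(D, min_pts=2)
labels = extract_clusters(order, reach, core, eps=5.0)
print(labels)  # [0 0 1 1]
print(silhouette(D, labels))
```

The pure-Python DTW loop is quadratic in series length, so for 1553-sample records a compiled DTW routine or a warping window would be more practical. scikit-learn's `OPTICS` also accepts such a matrix with `metric="precomputed"`, although its default cluster extraction (`"xi"`) differs from the threshold cut used here.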

Category I (UP1, UP2, UP3): located in front of the grouting curtain in the same dam section (dam Section 6).
Category II (UP4, UP5, UP13, UP14): adjacent to and behind the grouting curtain.
Category III (UP6, UP7): located in the same dam section (dam Section 5), near the right bank.
Category IV (UP9, UP11): located in the middle section of the dam, close to the riverbed.
Category V (UP17, UP18): located in the same lateral corridor (dam Section 5).
Category VI (UP19, UP20, UP21, UP23): located in the lateral corridor, near the upstream water level.
Category VII (UP22, UP24): located in the lateral corridor near the downstream side and strongly affected by the downstream water level.

BPVAR Model Construction

4.3.1. Stationary Test of Panel Data
For nonstationary panel data, the model estimation results may be biased. Therefore, a unit root test should be performed on the panel data before the model is constructed. Taking the fourth category of measuring points as an example, the data show a significant growth trend and thus form a nonstationary time series. In the present study, the variables are log-differenced, as shown in Figure 8, to convert the data into a stationary time series, and a stationarity test is then performed. The LLC, IPS, and ADF-Fisher tests are employed to examine the unit root of the uplift pressure panel data, and the results are listed in Table 3. As Table 3 shows, all p values are less than 0.1, so the null hypothesis of nonstationary panel data is rejected and the panel data are stationary.

The optimal lag order of the model was determined using the AIC, BIC, and HQIC criteria, as detailed in Table 4. The optimal lag order is 4 for the first, fifth, and sixth categories of monitoring points; 3 for the second and fourth categories; and 2 for the third and seventh categories.
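The log-difference transform and the information-criterion lag selection described above can be sketched as follows. This is a simplified, hypothetical illustration: it fits an ordinary (non-Bayesian) VAR with intercept by least squares to one multivariate series rather than a panel, uses one common textbook form of AIC, BIC, and HQIC based on the log-determinant of the residual covariance, and does not reproduce the LLC, IPS, or ADF-Fisher unit root tests. The function names and the simulated data are illustrative assumptions.

```python
import numpy as np


def log_difference(x):
    """First difference of the natural logarithm (requires strictly positive data)."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("log-differencing requires strictly positive values")
    return np.diff(np.log(x), axis=0)


def var_information_criteria(y, max_lag):
    """Fit VAR(p), p = 1..max_lag, by OLS on a common sample and return
    {p: (AIC, BIC, HQIC)} computed as ln|Sigma| plus a penalty term."""
    y = np.asarray(y, dtype=float)
    n_obs, k = y.shape
    T = n_obs - max_lag                    # identical effective sample for every p
    Y = y[max_lag:]
    criteria = {}
    for p in range(1, max_lag + 1):
        lags = [y[max_lag - l:n_obs - l] for l in range(1, p + 1)]
        X = np.hstack([np.ones((T, 1))] + lags)   # intercept + lags 1..p
        B, *_ = np.linalg.lstsq(X, Y, rcond=None)
        resid = Y - X @ B
        sigma = resid.T @ resid / T        # ML estimate of the residual covariance
        _, logdet = np.linalg.slogdet(sigma)
        n_coef = p * k * k                 # lag coefficients that grow with p
        criteria[p] = (logdet + 2.0 * n_coef / T,
                       logdet + np.log(T) * n_coef / T,
                       logdet + 2.0 * np.log(np.log(T)) * n_coef / T)
    return criteria


def select_lag(criteria):
    """Lag order that minimises each criterion."""
    names = ("AIC", "BIC", "HQIC")
    return {name: min(criteria, key=lambda p: criteria[p][i]) for i, name in enumerate(names)}


# Simulated stationary two-variable VAR(2) (illustrative data only)
rng = np.random.default_rng(0)
A1 = np.array([[0.5, 0.1], [0.0, 0.4]])
A2 = np.array([[-0.3, 0.0], [0.1, -0.2]])
y = np.zeros((500, 2))
for t in range(2, 500):
    y[t] = A1 @ y[t - 1] + A2 @ y[t - 2] + rng.normal(scale=0.1, size=2)

levels = np.exp(np.cumsum(y, axis=0))  # positive, trending "level" series
stationary = log_difference(levels)    # recovers y[1:]
crit = var_information_criteria(stationary, max_lag=4)
print(select_lag(crit))
```

Fitting every candidate order on the same effective sample keeps the criteria comparable across lag orders; the three criteria can disagree because BIC penalises extra lags more heavily than AIC.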
The data of the seventh category of measuring points are used as an example, and the fitting results for measuring points UP22 and UP24 are shown in Figure 9. The multiple correlation coefficients between the fitted and measured values are 0.98 and 0.94, respectively, indicating a good fit. The prediction results of the UP22 and UP24 models with and without exogenous variables are shown in Figure 10. The test set contains 10 samples for each measuring point. For the model with exogenous variables, the prediction error of each test sample, calculated as the difference between the predicted and measured values, fluctuated at approximately 0.1 m, indicating that adding exogenous variables improved the prediction accuracy of the BPVAR model.

To verify the prediction accuracy of the BPVAR model, the BPVAR model, a BP (Back Propagation) neural network, the XGBoost algorithm, and a Support Vector Machine (SVM) are used to predict the uplift pressure at measuring points UP22 and UP24, as shown in Figure 11. The predictions of the BPVAR model agree more closely with the measured values than those of the other three models. Furthermore, the predicted values fall within the 95% confidence interval, suggesting that the BPVAR model has a clear advantage in prediction accuracy.

To further assess the predictive accuracy of the BPVAR model, the mean absolute error (MAE), mean absolute percentage error (MAPE), mean square error (MSE), and root mean square error (RMSE) are calculated for the BP, SVM, XGBoost, and BPVAR models. Using the data of the seventh category of measuring points, the prediction error indexes of each model are presented in Table 5, and a radar chart constructed from these indexes is shown in Figure 12. The radar chart shows that the MAE, MAPE, MSE, and RMSE of the BPVAR model are all lower than those of the BP, SVM, and XGBoost models. This highlights the superior predictive accuracy of the BPVAR model and provides a useful reference for uplift pressure prediction and analysis.
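The four error indexes, the fitted-versus-measured correlation, and a 95% interval from sampled forecasts can be computed as in the minimal Python sketch below. It assumes that the prediction error is the predicted minus the measured value (as defined above), that MAPE is reported in percent, that the multiple correlation coefficient can be taken as the Pearson correlation between fitted and measured values, and that the interval is an equal-tailed percentile interval of posterior predictive draws such as those produced by Gibbs sampling. The function names and numbers are hypothetical.

```python
import numpy as np


def error_indexes(measured, predicted):
    """MAE, MAPE (%), MSE, and RMSE of predictions against measurements."""
    y = np.asarray(measured, dtype=float)
    y_hat = np.asarray(predicted, dtype=float)
    err = y_hat - y                    # predicted minus measured value
    mse = np.mean(err ** 2)
    return {
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(np.mean(np.abs(err / y)) * 100.0),  # measured values must be nonzero
        "MSE": float(mse),
        "RMSE": float(np.sqrt(mse)),
    }


def correlation(measured, fitted):
    """Pearson correlation between measured and fitted values."""
    return float(np.corrcoef(measured, fitted)[0, 1])


def percentile_interval(draws, level=0.95):
    """Equal-tailed interval from draws shaped (n_draws, horizon)."""
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(draws, [alpha, 1.0 - alpha], axis=0)
    return lower, upper


def coverage(measured, lower, upper):
    """Share of measured values lying inside the interval."""
    y = np.asarray(measured, dtype=float)
    return float(np.mean((y >= lower) & (y <= upper)))


# Illustrative numbers only (not data from the paper)
measured = [1.0, 2.0, 4.0]
predicted = [1.1, 1.9, 4.2]
print(error_indexes(measured, predicted))  # MAE ≈ 0.133, MAPE ≈ 6.67, MSE ≈ 0.02, RMSE ≈ 0.141
print(correlation(measured, predicted))
```

MSE and RMSE weight large errors more heavily than MAE, while MAPE is scale-free but undefined when a measured value is zero, which is why several indexes are usually reported together.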

Conclusions
The OPTICS algorithm was used to cluster the uplift pressure measuring points, with the distance matrix calculated from three different distance indexes. Clustering optimization was achieved with two clustering evaluation indexes, and the dam foundation measuring points were divided into categories. A BPVAR safety monitoring model was then established for each category of measuring points and verified with actual engineering data. The main conclusions are summarized below.

Although the BPVAR model in this study shows good interval prediction ability, the values of its hyperparameters, namely the overall tightness of the model and the lag attenuation parameter, are still chosen subjectively. Future research will therefore focus on more effective methods for determining these hyperparameters.

Water 2024, 23

Figure 1. Schematic of the core distance and reachability distance.

Figure 2. Flowchart of the construction of the OPTICS- and BPVAR-based concrete dam uplift pressure safety monitoring models.

Figure 3. Arrangement of uplift pressure observation points in the lateral corridor.

Figure 5. Diagram of the cumulative distance.

Figure 6. OPTICS clustering results for the uplift pressure measuring points on the dam foundation: (a) based on the cosine similarity; (b) based on the bilateral slope distance; (c) based on the DTW distance.

Figure 7. Changes in the piezometer water level for the seven categories of measuring points: (a) Category I; (b) Category II; (c) Category III; (d) Category IV; (e) Category V; (f) Category VI; (g) Category VII.

4.3.3. Model Adaptation and Forecasting Analysis

Panel data are constructed from the monitoring points of the same category, and the BPVAR model is used for fitting and prediction. The numbers of burn-in and retained (effective) iterations of Gibbs sampling are set to 2000 and 1000, respectively. The panel data of each of the seven categories of monitoring points are partitioned into three segments: a learning set, a test set, and a verification set. The learning set spans 1 January 2004 to 10 December 2008 and contains 1533 sets of data for model fitting and hyperparameter tuning. The test set contains 10 sets of data from 11 December 2008 to 21 December 2008 for evaluating model performance and making any necessary adjustments. The verification set contains 10 sets of data from 21 December 2008 to 31 December 2008 for verifying the robustness and prediction error of the model. After several rounds of tuning, the overall tightness of the model was set to 0.5, the lag attenuation parameter to 1, and the constant term to 0.
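The chronological data split and the role of the two hyperparameters described above can be illustrated with the sketch below. The section does not state the exact prior used by the BPVAR model, so the sketch assumes a Litterman (Minnesota)-type prior, in which the prior standard deviation of a lag coefficient equals the overall tightness divided by the lag raised to the lag attenuation (decay) parameter, with an extra cross-variable weight for coefficients on other variables. `cross_weight` and all function names are hypothetical, and the Gibbs sampler itself is not shown.

```python
import numpy as np


def chronological_split(data, n_test=10, n_valid=10):
    """Split time-ordered samples into learning, test, and verification parts
    without shuffling, so no later observation is used to fit earlier ones."""
    n = len(data)
    if n <= n_test + n_valid:
        raise ValueError("not enough samples for the requested split")
    end_learn = n - n_test - n_valid
    return data[:end_learn], data[end_learn:end_learn + n_test], data[end_learn + n_test:]


def minnesota_prior_std(sigma, n_lags, overall_tightness=0.5, lag_decay=1.0, cross_weight=0.5):
    """Prior standard deviation S[l-1, i, j] of the coefficient on variable j at
    lag l in the equation of variable i, under an assumed Minnesota-type prior.
    sigma holds residual scale estimates of each variable (e.g. from AR fits)."""
    sigma = np.asarray(sigma, dtype=float)
    k = len(sigma)
    S = np.empty((n_lags, k, k))
    for lag in range(1, n_lags + 1):
        base = overall_tightness / lag ** lag_decay  # tighter shrinkage at longer lags
        for i in range(k):
            for j in range(k):
                if i == j:
                    S[lag - 1, i, j] = base  # own lags
                else:
                    S[lag - 1, i, j] = base * cross_weight * sigma[i] / sigma[j]
    return S


# 1553 daily records -> 1533 learning, 10 test, 10 verification samples
learn, test, verify = chronological_split(np.arange(1553))
print(len(learn), len(test), len(verify))  # 1533 10 10

S = minnesota_prior_std(sigma=[1.0, 2.0], n_lags=2)
print(S[0, 0, 0], S[1, 0, 0])  # 0.5 0.25
```

Under this assumed form, a smaller overall tightness pulls all lag coefficients more strongly toward the prior mean, and a lag attenuation parameter of 1 halves the prior scale from lag 1 to lag 2.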

Figure 9. Fitting results for the seventh type of measuring point. (a) Fitting results of UP22 measuring points; (b) fitting results of UP24 measuring points.

Figure 10. Prediction results for the seventh type of measuring point. (a) UP22 measuring point prediction results (without exogenous variables); (b) UP24 measuring point prediction results (without exogenous variables); (c) UP22 measuring point prediction results (exogenous variables added); (d) UP24 measuring point prediction results (exogenous variables added).

Figure 11. Model validation results for the seventh type of measuring point. (a) UP22; (b) U

Figure 12. Radar diagram of prediction errors for the seventh type of measuring point. (a) UP22 measuring point prediction error radar chart; (b) UP24 measuring point prediction error radar chart.

(1) Based on the clustering evaluation indexes, the DTW-based result was the best of the OPTICS clustering results obtained with the three distance indicators and was consistent with the variation pattern of the measured uplift pressure. The engineering application showed that the uplift pressure measuring points of the dam foundation of a water conservancy project can be divided into seven categories, and the measuring points within each category show similar uplift pressure variation patterns.
(2) After exogenous variables were added to the BPVAR model, the multiple correlation coefficients between the fitted and measured values of the training set and test set data exceeded 0.80, indicating good model performance, and the predicted uplift pressure fell within the 95% confidence interval, indicating that the BPVAR model performed well in interval prediction. The MAE, MAPE, MSE, and RMSE of the BPVAR model predictions were smaller than those of the BP, SVM, and XGBoost models.
Table 2 is the classification table of seven types of measuring points.

Table 2. Classification of the seven types of measuring points.

Table 3. p value of the unit root test for panel data.

Table 4. Test of the optimal lag order for panel data.

Table 5. Evaluation of model prediction accuracy.