1. Introduction
In the field of dam safety monitoring, numerous scholars have conducted extensive analysis and research on arch dam deformation monitoring data, leading to a wealth of valuable findings. However, most existing methods primarily focus on the influence of load on the deformability of arch dams, aiming to establish deformability representation models. These models are crucial for accurately analyzing and monitoring arch dam deformability, enabling operational management personnel to make informed judgments about dam safety.
Current methods for representing arch dam deformability typically model and analyze the deformation of monitoring points based on factors such as water pressure, temperature, and aging. These approaches include statistical models, deterministic models, and mixed models, all of which are widely used in engineering practice. For instance, Li et al. [1] established a new approach for predicting effect quantities by increasing sample information, based on discrete and regression analysis of two-dimensional normal information diffusion. Chen et al. [2] developed a multi-objective prediction method that has been successfully applied to deformation prediction and anomaly detection in arch dams. Similarly, Kang et al. [3] proposed a displacement change analysis model for concrete gravity dams based on Gaussian process regression, effectively analyzing dam deformation trends. Building on previous research, Li and Xu [4,5,6], among others, proposed a partial least squares regression statistical model for dam deformation analysis, aimed at overcoming the limitations of the least squares regression method. Yang, Deng, and Wang [7,8,9] replaced stepwise linear regression with partial regression models, achieving better deformation representation for dam monitoring points. In recent years, significant progress has also been made in deterministic modeling. For example, Li et al. [10] proposed a method for determining the temperature field of arch dams by correlating dam temperature boundaries with air and water temperatures, subsequently establishing a deterministic displacement model for arch dams. Shen et al. [11] used a deterministic model based on a viscoelastic constitutive framework to monitor deformation during the construction of the Three Gorges Dam, estimating the viscoelastic deformation of the dam body and foundation and thereby providing a theoretical basis for ensuring safety during construction. Regarding mixed models, Li et al. [12] used the finite element method to calculate the theoretical water pressure displacement for two dam sections under various conditions, characterizing dam deformation at different water levels and exploring the complementary use of theoretical calculations and statistical analysis.
In recent years, the rapid advancement of data mining and artificial intelligence technologies has led to the introduction of contemporary mathematical theories and swarm intelligence algorithms into the construction of dam deformation analysis models [13,14,15,16]. This integration has yielded several valuable research outcomes. Su et al. [17] proposed an optimization method for support vector machine parameters and input vectors, which enhanced the efficiency of establishing dam safety monitoring models and dynamically described the mapping relationship between dam structural behavior and its influencing factors. Gabriella et al. [18] developed three multi-objective support vector regression models, which were then used to construct deformability state representation models for dams. Wei et al. [19] introduced a concrete dam deformation prediction model based on Chicken Swarm Optimization (CSO) and the Relevance Vector Machine (RVM), which effectively addressed the complex nonlinear relationships between dam deformation and environmental factors, improving prediction accuracy. Li et al. [20] developed a dam deformation prediction model that uses an improved particle swarm optimization algorithm to select the optimal parameters for an Extreme Learning Machine (ELM-IPSO), aiming to overcome the slow convergence and overfitting issues of traditional neural network models. Zhou et al. [21] proposed a dam deformation representation and prediction model based on Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN), Phase Space Reconstruction (PSR), and Kernel Extreme Learning Machine (KELM). This model improved prediction accuracy by addressing the nonlinear and non-stationary characteristics of dam deformation monitoring sequences. Guo et al. [22] established a concrete dam deformation prediction model from prototype deformation monitoring data using the open-source deep learning framework TensorFlow, thereby providing a highly accurate means of predicting dam behavior.
The aforementioned research primarily focuses on the impact of load on the deformation behavior of arch dams, leading to the development of deformation state representation and early warning models. However, arch dams are highly nonlinear structures, and their deformation behavior is not only influenced by loads but also by complex correlations between deformation at different measurement points within the dam. Therefore, future research should further investigate the deformation representation model of each measurement point from the perspective of internal correlation and information fusion within the dam structure. This approach would better capture the interrelated effects across different parts of the dam and provide a new technical means for the deformation safety monitoring of arch dams.
2. Analysis of Deformation Correlation of Each Measurement Point of Arch Dam
In practical engineering (Figure 1), the deformations observed at measurement points A, B, C, and D are primarily influenced by common factors such as hydraulic pressure, temperature, and aging. However, the deformation patterns at points C and D, located near the dam foundation and bank slope, differ significantly from those at points A and B, which are situated near the dam crest. This variation is closely linked to the combined effects of structural constraints, material properties, and environmental conditions specific to different regions of the dam, resulting in distinct deformation behaviors. When the deformation analysis model fails to account for these complex, unmonitored, and unquantifiable factors, parameter heterogeneity arises. Although traditional deformation models can capture the primary influencing factors through independent variables, they often overlook the unique deformation characteristics at different measurement points caused by these complex influences. Given the intricate nature of arch dam structures, the deformations at various measurement points are interrelated. Therefore, by analyzing the internal correlations within the arch dam structure and integrating prototype monitoring data, the deformation behavior of the dam can be assessed using information fusion techniques. This approach helps to mitigate the impact of specific, non-monitored environmental factors on the deformations at different measurement points.
In the following, the deformations of any two measurement points of the arch dam are analyzed using linear and nonlinear correlation coefficients, and measurement points with similar deformation characteristics are grouped by cluster analysis.
2.1. Method to Identify Measurement Points with Strong Linear Correlation
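To explore the relationship between the deformations of two measurement points, the Pearson correlation coefficient is used as the similarity index for cluster analysis. Measurement points within the same cluster exhibit strong linear correlation, while those in different clusters exhibit weak linear correlation. The expression for the Pearson correlation coefficient is [23]:

$$r_{ij} = \frac{\sum_{t=1}^{T}\left(\delta_{it}-\bar{\delta}_{i}\right)\left(\delta_{jt}-\bar{\delta}_{j}\right)}{\sqrt{\sum_{t=1}^{T}\left(\delta_{it}-\bar{\delta}_{i}\right)^{2}}\,\sqrt{\sum_{t=1}^{T}\left(\delta_{jt}-\bar{\delta}_{j}\right)^{2}}} \tag{1}$$

where $T$ is the number of time slices; $\delta_{it}$ and $\delta_{jt}$ ($t = 1, 2, \dots, T$) are the deformation time series of measurement points $i$ and $j$, respectively; $\bar{\delta}_{i}$ is the mean of $\delta_{it}$; and $\bar{\delta}_{j}$ is the mean of $\delta_{jt}$.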
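The Pearson coefficient described above can be computed directly from two deformation series. The following is a minimal sketch in plain Python; the function names and the three short series are hypothetical and serve only to illustrate how a correlation matrix between measurement points might be assembled.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


def correlation_matrix(series):
    """Symmetric matrix of pairwise Pearson coefficients."""
    n = len(series)
    return [[1.0 if i == j else pearson(series[i], series[j])
             for j in range(n)] for i in range(n)]


# Hypothetical deformation series for three measurement points.
points = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.1, 3.9, 6.2, 8.0, 9.8],
    [5.0, 3.0, 4.0, 1.0, 2.0],
]
R = correlation_matrix(points)
```

In this toy data set the first two series move together and the third moves against them, so the first two points would be candidates for the same cluster.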
The basic principle of deformation cluster analysis based on the Ward method [24] for arch dam measurement points is as follows:

Assume that the $N$ deformation measurement points of the arch dam are divided into $K$ clusters, denoted as $G_1, G_2, \dots, G_K$. Let $n_k$ represent the number of deformation measurement points in the $k$-th cluster, $x_{ik}$ represent the similarity index value of measurement point $i$ ($i = 1, 2, \dots, n_k$) in the $k$-th cluster (in this paper, the deformation correlation coefficient), and $\bar{x}_k$ be the center of the index of $G_k$. Then $S_k$, the sum of squares of deviations of the measurement points in $G_k$, can be expressed as

$$S_k = \sum_{i=1}^{n_k}\left(x_{ik}-\bar{x}_k\right)^{\mathrm{T}}\left(x_{ik}-\bar{x}_k\right) \tag{2}$$

The total sum of squares of the deviations of the $K$ clusters of the arch dam is

$$S = \sum_{k=1}^{K} S_k = \sum_{k=1}^{K}\sum_{i=1}^{n_k}\left(x_{ik}-\bar{x}_k\right)^{\mathrm{T}}\left(x_{ik}-\bar{x}_k\right) \tag{3}$$

where $\bar{x}_k$ is the average value of the correlation coefficient of the points in $G_k$.

When $K$ is fixed, the optimal classification is the one that minimizes $S$. In practical engineering, however, the number of clusters is not known in advance. This paper therefore uses a threshold method to determine the number of deformation clusters in an arch dam.
Assuming that the clustering process performs $m$ merges in total, $\eta_l$ (the ratio of the similarity distance between the clusters at the $l$-th clustering to that at the last clustering) is calculated as

$$\eta_l = \frac{S_l}{S_m} \tag{4}$$

where $S_l$ is the global sum of squares of total deviations in the $l$-th partition, and $S_m$ is the global sum of squares of total deviations in the last partition.

$\eta_l$ ($l = 1, 2, \dots, m$) can be calculated by Equation (4). When the difference between $\eta_{l-1}$ and $\eta_l$ is small while the difference between $\eta_l$ and $\eta_{l+1}$ is large, the distance between the corresponding clusters is used as the threshold of the arch dam deformation clustering, and the number of clusters is determined accordingly.
As described above, for an arch dam with $N$ measurement points and $T$ monitoring moments, the partition clustering process based on the deformation correlation coefficient and the Ward clustering method is as follows:
Step 1: Standardize the deformation values and calculate the correlation coefficient between each pair of measurement points according to Equation (1) to obtain the initial correlation matrix.
Step 2: Initialize each measurement point as its own cluster, so that the number of clusters equals the number of measurement points, and then perform Step 3 and Step 4.
Step 3: Using the current correlation coefficient matrix, merge the two most correlated classes into a new class according to the classification criterion.
Step 4: Calculate the correlation distance between the new class and the other classes to obtain a new distance matrix. Repeat Step 3 and Step 4 until all measurement points are aggregated into one class.
Step 5: Determine the clustering threshold, and from it the number of classes and the measurement points in each cluster.
Step 6: Investigate the spatial proximity of the clustered measurement points. Where the measurement points of a cluster occupy discontinuous regions, continue clustering according to their spatial coordinates until the final clustering result is determined.
After the cluster analysis above, the set of measurement points whose deformation at time $t$ is linearly correlated with that of a given measurement point is obtained from the cluster that contains that point.
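The clustering steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: each measurement point is described by its row of the correlation matrix, merges follow the Ward criterion (smallest increase in the within-cluster sum of squares), and the cut is placed just before the single most expensive merge, which is a simple stand-in for the threshold rule. The function names and the four-point correlation matrix are hypothetical, and the spatial check of Step 6 is not covered.

```python
def centroid(rows):
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def ward_cost(rows_a, rows_b):
    """Increase in the within-cluster sum of squares if two clusters merge."""
    ca, cb = centroid(rows_a), centroid(rows_b)
    dist2 = sum((u - v) ** 2 for u, v in zip(ca, cb))
    na, nb = len(rows_a), len(rows_b)
    return na * nb / (na + nb) * dist2


def ward_history(features):
    """Merge clusters greedily; record each partition and its total deviation."""
    clusters = [[i] for i in range(len(features))]
    total = 0.0
    history = [(sorted(clusters), total)]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                cost = ward_cost([features[i] for i in clusters[a]],
                                 [features[i] for i in clusters[b]])
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
        total += cost
        history.append((sorted(clusters), total))
    return history


def cut_before_largest_jump(history):
    """Return the partition just before the most expensive merge (assumed rule)."""
    jumps = [history[m][1] - history[m - 1][1] for m in range(1, len(history))]
    worst = max(range(len(jumps)), key=lambda m: jumps[m])
    return history[worst][0]


# Hypothetical correlation matrix: points 0 and 1 move together, as do 2 and 3.
R = [
    [1.00, 0.95, 0.20, 0.10],
    [0.95, 1.00, 0.15, 0.05],
    [0.20, 0.15, 1.00, 0.90],
    [0.10, 0.05, 0.90, 1.00],
]
partition = cut_before_largest_jump(ward_history(R))
```

For this toy matrix the two cheap merges pair up the strongly correlated points, and the final merge is far more expensive, so the cut leaves two clusters.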
2.2. Method to Identify Measurement Points with Strong Nonlinear Correlation
The cluster analysis above uses the correlation coefficient between the deformation sequences of measurement points as the similarity index and yields the set of measurement points whose deformation is significantly linearly related to that of the point under analysis. In other words, the deformations of measurement points in the same cluster are significantly linearly correlated. For measurement points in different clusters, the linear correlation between deformations is weak, but a nonlinear correlation may still exist. The nonlinear correlation between the deformations of two measurement points in different clusters is therefore examined below to identify the points that have a significant nonlinear correlation with the point under analysis.
The maximal information coefficient (MIC) is a correlation measure that evaluates the functional and statistical relationships between variables without making any assumptions about the data distribution. It was first proposed by Reshef et al. [25] to measure the degree of correlation between variables. Assume that the deformation sequences over $T$ time steps at two measurement points $i$ and $j$ of the arch dam are $X = \{x_t\}$ and $Y = \{y_t\}$ ($t = 1, 2, \dots, T$), respectively. Their mutual information is defined as

$$I(X;Y) = \iint p(x,y)\,\log_2\frac{p(x,y)}{p(x)\,p(y)}\,\mathrm{d}x\,\mathrm{d}y \tag{5}$$

where $p(x,y)$ is the joint probability density of $X$ and $Y$, and $p(x)$ and $p(y)$ are the marginal probability densities of $X$ and $Y$, respectively.

If the two sequences constitute an ordered data set $D = \{(x_t, y_t),\ t = 1, 2, \dots, T\}$, and the division $G$ is defined as an $a \times b$ grid in which the value ranges of the variables $X$ and $Y$ are divided into $a$ and $b$ segments, respectively, then the probability distribution $D|_G$ is obtained from the way the values in $D$ fall into the cells of $G$, where $a$ and $b$ are both positive integers. If the number of grid divisions is fixed, different mutual information values are obtained by changing the positions of the grid divisions. The maximum mutual information for $D$ is then

$$I^{*}(D,a,b) = \max_{G} I\left(D|_G\right) \tag{6}$$

To facilitate comparison between different grid dimensions, the normalized maximum values, which lie in the interval [0, 1] and are obtained under different divisions, are assembled into a characteristic matrix $M(D)$, whose entries are

$$M(D)_{a,b} = \frac{I^{*}(D,a,b)}{\log_2 \min\{a,b\}} \tag{7}$$

Then the maximal information coefficient is

$$\mathrm{MIC}(D) = \max_{ab < B(T)} M(D)_{a,b} \tag{8}$$

where $T$ is the length of the deformation monitoring sequence, that is, the size of the data set; $B(T) = T^{\alpha}$, together with the condition $ab < B(T)$, limits the size of the grid used to divide the area; and the recommended value for practical application is $\alpha = 0.6$ according to the literature [25], namely $B(T) = T^{0.6}$.
Therefore, the MIC between the deformation sequences of two measurement points of the arch dam is essentially a normalized maximum mutual information, and its value range is [0, 1]. The MIC can thus be used to reveal the correlation between the deformation sequences of two measurement points. To measure nonlinearity, Reshef et al. [25] used $\mathrm{MIC} - R^2$, where $R^2$ is the coefficient of determination of a simple linear regression model between the two variables. When the MIC is large and $\mathrm{MIC} - R^2$ is also large, there is a strong nonlinear correlation between the two variables. This paper uses a hypothesis test to determine the threshold value of the MIC: the null hypothesis is that the random variables describing the deformation of the two measurement points are independent, and when the null hypothesis is rejected at a given significance level, a strong nonlinear correlation is taken to exist between the deformation monitoring sequences of the two points.
Compared with other traditional nonlinear statistical correlation coefficients, the MIC has two notable advantages: generality and equitability. Generality means that, given enough deformation data for two measurement points, the MIC can detect different types of relationship between them, including functional relationships such as parabolas and periodic functions, non-functional relationships, and superpositions of multiple functional relationships. Equitability means that the MIC values of different types of relationship with the same noise level do not differ significantly from one another. In summary, the calculation process for determining the nonlinear correlation between the deformations of two measurement points of the arch dam is as follows:
- (1) Arrange $X$ (the measured deformation sequence of the measurement point to be analyzed) and $Y$ (the measured deformation sequence of another measurement point) in ascending order. The sorted $X$ defines the abscissa, which is divided into $a$ segments, and the sorted $Y$ defines the ordinate, which is divided into $b$ segments. Together they form an $a \times b$ grid $G$ into which the points of the data set $D$ fall; some cells may be empty.
- (2) Find the probability distribution over all cells of the grid $G$ from the division points on the two axes. Then find the mutual information, the maximum mutual information, and the corresponding entry of the characteristic matrix for each pair $(a, b)$ that satisfies the grid-size constraint.
- (3) Since different grids give different values of the characteristic matrix, the globally optimal grid is found through an exhaustive search over the characteristic matrix, which determines the maximal information coefficient characterizing the nonlinear correlation between the deformations of the two measurement points of the arch dam.
By applying the MIC analysis to pairs of measurement points in different clusters, the set of measurement points whose deformation at time $t$ is nonlinearly related to that of a given measurement point can be obtained.
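The grid search described above can be approximated with a short plain-Python sketch. It is a simplification under stated assumptions: only equal-frequency partitions are tried for each grid size, rather than an exhaustive search over division positions, so the result is a lower-bound approximation of the MIC. The function names and the parabolic test series are hypothetical; the exponent 0.6 follows the value quoted from [25].

```python
from math import log2


def equal_frequency_bins(values, n_bins):
    """Assign each value to one of n_bins bins holding about equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def mutual_information(bx, by):
    """Mutual information (in bits) of two discretized series."""
    n = len(bx)
    joint, px, py = {}, {}, {}
    for u, v in zip(bx, by):
        joint[(u, v)] = joint.get((u, v), 0) + 1
        px[u] = px.get(u, 0) + 1
        py[v] = py.get(v, 0) + 1
    # p(u,v) * log2(p(u,v) / (p(u) p(v))), written with raw counts.
    return sum(c / n * log2(c * n / (px[u] * py[v]))
               for (u, v), c in joint.items())


def approx_mic(x, y, alpha=0.6):
    """Largest normalized mutual information over grids with a * b < T**alpha."""
    budget = len(x) ** alpha
    best = 0.0
    a = 2
    while a * 2 < budget:
        b = 2
        while a * b < budget:
            mi = mutual_information(equal_frequency_bins(x, a),
                                    equal_frequency_bins(y, b))
            best = max(best, mi / log2(min(a, b)))
            b += 1
        a += 1
    return best


# Hypothetical series: y depends on x through a symmetric parabola,
# so the linear correlation is near zero while the dependence is strong.
x = list(range(60))
y = [(t - 29.5) ** 2 for t in x]
score = approx_mic(x, y)
```

A production analysis would normally rely on a dedicated MIC implementation that optimizes the division positions; the sketch only shows how the grid-size constraint and the normalization fit together.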
3. A Characterization Model for Deformation State of Arch Dams Based on Correlation Analysis
Section 2 of this paper explores methods for analyzing the correlation between deformations at various measurement points of the arch dam and identifies other measurement points that influence the deformation of the point under analysis. This section further investigates the deformation characterization model of each measurement point, focusing on the correlations between different parts of the dam. It also provides a foundation for developing a deformation monitoring model for each measurement point. The following section will discuss the construction of the deformation state representation model for individual measurement points of the arch dam.
Let the number of deformation measurement points of the arch dam be $N$ and the number of monitoring moments be $T$. The deformation monitoring value of the $i$-th measurement point at time $t$ is $\delta_{it}$, and the data in matrix form are

$$\boldsymbol{\delta} = \begin{bmatrix} \delta_{11} & \delta_{12} & \cdots & \delta_{1T} \\ \delta_{21} & \delta_{22} & \cdots & \delta_{2T} \\ \vdots & \vdots & \ddots & \vdots \\ \delta_{N1} & \delta_{N2} & \cdots & \delta_{NT} \end{bmatrix} \tag{9}$$

Equation (9) is a two-dimensional data set composed of the deformation monitoring time series of multiple measurement points of the arch dam. Because the arch dam acts as a single structure, the monitoring values of any two measurement points are correlated over time. Therefore, the deformation of any measurement point of the arch dam can be characterized by the deformation of other measurement points, namely

$$\delta_{it} = \alpha_i + \boldsymbol{\beta}_i \mathbf{X}_{it} + \varepsilon_{it} \tag{10}$$

where $i = 1, 2, \dots, N$ and $t = 1, 2, \dots, T$; $\delta_{it}$ is the deformation monitoring value of the $i$-th measurement point at moment $t$; $\mathbf{X}_{it}$ is the column matrix formed by the deformations of the other measurement points that have a significant relationship with point $i$; $\boldsymbol{\beta}_i$ is the row matrix of the parameters to be estimated; $\alpha_i$ is a scalar constant representing the deformation-specific effect produced by different parts of the arch dam under their unique influencing factors; and $\varepsilon_{it}$ represents the random error.
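Equation (10) can be expressed as a function

$$\delta_{it} = f\left(\delta_{jt}\right) + g\left(\delta_{kt}\right) + \alpha_i + \varepsilon_{it} \tag{11}$$

where $\delta_{it}$, $\delta_{jt}$, and $\delta_{kt}$ are the deformations of measurement points $i$, $j$, and $k$, respectively, at moment $t$; $f(\cdot)$ is the deformation value metric function of the measurement point $j$ that is significantly linearly related to the deformation of measurement point $i$; $g(\cdot)$ is the deformation value metric function of the measurement point $k$ that is significantly nonlinearly related to the deformation of measurement point $i$; and $\alpha_i$ and $\varepsilon_{it}$ are the point-specific constant and the random error of Equation (10).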
By performing cluster analysis on the deformation data of the arch dam monitoring points, the other measurement points that are significantly related to the deformation of the measurement point to be analyzed are obtained. For the set of measurement points $\Omega_i^{L}$ that are significantly linearly related to the measurement point $i$ to be analyzed, the deformation relationship can be expressed as

$$\delta_{it} = \sum_{j=1}^{m} a_j\,\delta_{jt} \tag{12}$$

where $\delta_{it}$ is the deformation of measurement point $i$ at time $t$; $\delta_{jt}$ is the deformation of the $j$-th point in the set $\Omega_i^{L}$ that is significantly linearly related to measurement point $i$; $m$ is the number of measurement points that are significantly linearly related to measurement point $i$; and $a_j$ is the coefficient.
Based on MIC nonlinear correlation analysis, the set of measurement points $\Omega_i^{N}$ that are significantly nonlinearly related to the deformation of the measurement point $i$ to be analyzed is obtained, and the deformation relationship can be expressed as

$$\delta_{it} = \sum_{k=1}^{n}\sum_{q=1}^{Q} b_{kq}\,\varphi_q\left(\delta_{kt}\right) \tag{13}$$

where $\delta_{it}$ is the deformation of measurement point $i$ at time $t$; $\delta_{kt}$ is the deformation of the $k$-th point in the set $\Omega_i^{N}$ that is significantly nonlinearly related to measurement point $i$; $n$ is the number of measurement points that are significantly nonlinearly related to measurement point $i$; $Q$ is the number of elementary function terms used to describe the nonlinear relationship between $\delta_{it}$ and $\delta_{kt}$; $\varphi_q(\cdot)$ is the elementary function describing the nonlinear relationship between $\delta_{it}$ and $\delta_{kt}$; and $b_{kq}$ is the coefficient.
To determine the functional form that characterizes the nonlinear relationship between the deformation $\delta_{it}$ of the point to be analyzed and the deformation $\delta_{kt}$ of a nonlinearly related point from the discrete deformation monitoring data of the arch dam, a genetic algorithm is used to search for the optimal combination of elementary functions, with each elementary function expressed in the form of a gene string. The specific implementation process is as follows:
Assuming that $\hat{\delta}_{it}$ is the deformation of measurement point $i$ at time $t$ fitted by the functional expression representing the nonlinear relationship, the total sum of squared errors over all data points is

$$E = \sum_{t=1}^{T}\left(\delta_{it} - \hat{\delta}_{it}\right)^{2} \tag{14}$$

where $\delta_{it}$ is the monitored deformation and $T$ is the number of monitoring moments. The smaller the value of $E$, the more accurate the determined function expression. Because a genetic algorithm is usually used to maximize a function, the objective function for finding the combination of elementary functions that forms the optimal function expression is

$$F = C_{\max} - E \tag{15}$$

where $C_{\max}$ is a sufficiently large positive number.
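The conversion from an error to be minimized into a fitness to be maximized can be illustrated with a short sketch. The constant `c_max`, the function names, and the two candidate fits below are hypothetical values chosen for illustration; the only requirement taken from the text is that the constant is large enough to keep the fitness positive.

```python
def sum_squared_error(observed, fitted):
    """Total squared difference between monitored and fitted deformations."""
    if len(observed) != len(fitted):
        raise ValueError("series must have the same length")
    return sum((o - f) ** 2 for o, f in zip(observed, fitted))


def fitness(observed, fitted, c_max=1.0e6):
    """Fitness to maximize: a large constant minus the squared error."""
    return c_max - sum_squared_error(observed, fitted)


# Hypothetical monitored series and two candidate function expressions.
observed = [0.0, 1.0, 4.0, 9.0]
good_fit = [x ** 2 for x in (0.0, 1.0, 2.0, 3.0)]   # matches exactly
poor_fit = [2 * x for x in (0.0, 1.0, 2.0, 3.0)]    # squared error of 10

better = fitness(observed, good_fit) > fitness(observed, poor_fit)
```

With this convention the candidate with the smaller error always receives the larger fitness, which is what fitness-proportional selection needs.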
To represent the various forms of function expressions, each expression is decomposed into basic operation units consisting of a constant, a function, a power, and an operator, where the function is chosen from a set of commonly used elementary functions. The basic operation units are connected by operators. To perform genetic operations, this representation is mapped into binary codes. The constant is encoded directly as a binary number with a total of eight bits: the first six bits are the integer part, and the last two bits are the fractional part. The function and the power are each represented by a three-bit binary code, and the operator is represented by a two-bit binary code. The coding scheme is shown in Table 1.
A set of binary strings, each with a length of 19, is constructed according to a specific order of operations. Each binary string is referred to as a single gene. This fundamental structure can be employed to combine various functional forms. In the context of a genetic algorithm, individuals are represented by single genes. When converting a binary string into a function, it is necessary to discard the last operator.
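The constant encoding described above (eight bits, with six for the integer part and two for the fractional part) can be illustrated with a small decoder. The function name and the sample bit strings are hypothetical; the codes for functions, powers, and operators depend on Table 1 and are not reproduced here.

```python
def decode_constant(bits):
    """Decode an 8-bit string: 6 integer bits followed by 2 fractional bits."""
    if len(bits) != 8 or any(ch not in "01" for ch in bits):
        raise ValueError("expected a string of exactly eight binary digits")
    integer_part = int(bits[:6], 2)        # values 0..63
    fraction_part = int(bits[6:], 2) / 4   # steps of 0.25
    return integer_part + fraction_part


value = decode_constant("00010110")   # 000101 -> 5, 10 -> 0.5
largest = decode_constant("11111111")
```

Under this reading the constants range from 0 to 63.75 in steps of 0.25.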
When using the genetic algorithm to determine the nonlinear relationship function between the deformations of measurement points, the initial population is generated first. For each individual, a random integer between one and six is drawn as the number of single genes in the individual, and a binary string of the corresponding length is then generated at random, forming a variable-length polygenic individual. Imitating the principle of survival of the fittest in nature, the elite strategy copies a fixed number of individuals with the highest fitness values in the current population (fewer than the population size) directly to the next generation. Parents are then selected to take part in producing the next generation on the principle that the larger an individual's fitness, the larger its probability of being selected.
The crossover operation generates new offspring by exchanging segments of information between two parent individuals to create potentially superior offspring. Since the length of the gene in each individual is variable, the standard crossover method must be adapted. During the crossover operation, two parents are randomly selected based on the crossover rate. Each parent independently generates a crossover point, and the crossover is then performed. If the length of the remaining segment exceeds 10, it is randomly appended to form a single gene; otherwise, it is discarded. Mutation operations introduce further diversity into the population by randomly altering individual genes, thus helping to prevent premature convergence. In the mutation process, the algorithm selects an individual from the parent generation according to the mutation rate, and then randomly selects and mutates a gene within that individual.
The algorithm continues to run until the sum of squared errors of an individual, calculated with the sum-of-squared-errors formula above, is less than 0.05 or the number of generations reaches the set maximum. The function expression representing the nonlinear relationship between the deformations of the two measurement points is then determined.
So far, the characterization model relating the deformation of measurement point $i$ to the deformation of other measurement points at moment $t$ is

$$\delta_{it} = \alpha_i + \sum_{j=1}^{m} a_j\,\delta_{jt} + \sum_{k=1}^{n} b_k\,g_k\left(\delta_{kt}\right) + \varepsilon_{it} \tag{16}$$

where $\delta_{it}$ is the deformation of measurement point $i$ at time $t$; $\delta_{jt}$ is the deformation of a measurement point that is linearly related to the deformation of measurement point $i$, and $m$ is the number of such measurement points; $\delta_{kt}$ is the deformation of a measurement point that is nonlinearly related to the deformation of measurement point $i$, and $n$ is the number of such measurement points; $g_k(\cdot)$ is the nonlinear function determined by the genetic algorithm; $a_j$ and $b_k$ are coefficients; $\alpha_i$ are scalar constants, which represent the specific deformation effects of different parts of the arch dam under their unique influencing factors; and $\varepsilon_{it}$ is the random error.
Using the results of Section 2 of this paper, the other deformation measurement points that are significantly linearly and nonlinearly related to the deformation of measurement point $i$ are determined, and the deformation expression for a single measurement point of the arch dam is obtained by substituting them into the above formula. From the monitored deformation $\delta_{it}$, the linearly related deformations $\delta_{jt}$, and the nonlinear terms $g_k(\delta_{kt})$, the coefficients $a_j$ and $b_k$ and the constant $\alpha_i$ are estimated by the least squares method, which finally determines the expression of the model.
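The least squares step can be sketched for the simplest case of one linearly related point and one nonlinearly related point. The example below is a minimal illustration in plain Python: it solves the normal equations by Gaussian elimination, assumes the nonlinear function is a square (a hypothetical stand-in for whatever expression the genetic algorithm returns), and uses synthetic data generated from known coefficients.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular or ill-conditioned")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def least_squares(design, target):
    """Ordinary least squares via the normal equations."""
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)]
           for p in range(k)]
    xty = [sum(row[p] * y for row, y in zip(design, target)) for p in range(k)]
    return solve_linear_system(xtx, xty)


def g(value):
    """Hypothetical nonlinear term; stands in for the GA-selected function."""
    return value ** 2


# Synthetic series: one linearly related point and one nonlinearly related point.
delta_j = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
delta_k = [0.5, -1.0, 1.5, 2.0, -0.5, 1.0]
delta_i = [0.5 + 2.0 * dj + 0.3 * g(dk) for dj, dk in zip(delta_j, delta_k)]

# Columns: constant term, linear term, nonlinear term.
design = [[1.0, dj, g(dk)] for dj, dk in zip(delta_j, delta_k)]
alpha, a_coef, b_coef = least_squares(design, delta_i)
```

Because the synthetic target contains no noise, the fit recovers the generating values (0.5, 2.0, 0.3) up to rounding; with real monitoring data the residual plays the role of the random error term.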
The developed model for characterizing the deformation state of the arch dam measurement points can be used for tracking and predicting the deformation behavior of these points.