A Completion Method for Missing Concrete Dam Deformation Monitoring Data Pieces

A concrete dam is an important water-retaining hydraulic structure that stops or restricts the flow of water or underground streams. It can be regarded as a constantly changing complex system. Among all the effect quantities, the deformation of a concrete dam reflects its operation behaviors most directly. However, due to changes in the external environment, failures of monitoring instruments, and human errors, the obtained deformation monitoring data usually have missing pieces, and sometimes the missing pieces are so critical that the remaining data fail to fully reflect the actual deformation patterns. In this paper, the composition, characteristics, and contamination of the concrete dam deformation monitoring information are analyzed. Building on the single-value missing data completion method based on the non-local means (NLM) method, a multi-value missing data completion method using BP (back propagation) mapping of spatial adjacent points is proposed to improve the accuracy of analysis and pattern prediction of concrete dam deformation behaviors. A case study is performed to validate the proposed method.


Literature Reviews
A concrete dam can be regarded as a constantly changing complex system whose diverse and uncertain service behaviors reflect its special structure and working environment [1][2][3][4][5]. In operation status analyses of a concrete dam, monitored effect quantities such as deformation, seepage, stress, and strain reflect the operation status patterns. Generally, deformation behaviors show the operation status of the dam most directly [6][7][8][9][10]. A typical case is the Vaiont arch dam's failure in Italy [11,12]. After the water storage was completed in 1960, the landslide mass on the left bank in front of the dam crept slowly, and the measured total displacement reached 429 cm on 7 October 1963. Affected by the heavy rain on 9 October 1963, the dam broke, causing nearly 3000 deaths. The monitoring data showed that the displacement rate was 0.14 cm/d before the spring of 1963. After the continuous heavy rain on 18 September 1963, the displacement rate increased sharply from about 1 cm/d, and the maximum velocity before the failure reached 80 cm/d. It is therefore important to study the deformation behaviors of a concrete dam.
Over its long service life, a concrete dam accumulates a large amount of deformation monitoring data, which forms the basis for analyzing and predicting its deformation behaviors.
The effectiveness of the dam safety monitoring and evaluation can be reduced by the missing monitoring data pieces due to monitoring instrument failures or automatic monitoring instability. The data missing from the key-position monitoring instrument can impede the dam health monitoring processes. Therefore, it is of practical significance to study the missing data completion strategies in the case of monitoring instrument failure to provide a reliable decision basis for the safe dam operation.
The missing data completion has been applied to many fields [13][14][15][16]. In the field of dams, Lv et al. [17] pointed out that the interpolation methods of observation data mainly include internal physical association interpolation and mathematical interpolation and introduced the principle and process of linear interpolation. To obtain the homogenized data required by the model, Li et al. [18] compared the commonly used mathematical interpolation methods and chose the cubic Hermite piecewise interpolation with smooth interpolation curves that made full use of the existing data information to build the homogenized processing of the data sequence. To deal with the disadvantage of the "Runge Phenomenon" in the interpolation interval of the traditional interpolation function at both ends, Tu et al. [19] utilized the fractal interpolation in deducing the integrity state through partial information of the object to the interpolation calculation of missing time series, while the interpolation results were in line with expectations. Wang et al. [20] found that the same monitoring items such as a series of points on the deformation have a high degree of similarity and suggested combining the monitoring information of relevant measuring points. Firstly, the kernel independent component analysis algorithm was used to extract the independent components of relevant measurement points, and then the optimal characteristic variables were found by using eigenvalue spectrum analysis. Finally, an interpolation method for dam missing data based on KICA-RVM was established by using a relevance vector machine. Hu et al. [21] used the deformation information of the spatial adjacent points to return the deformation value of the target measurement points and proposed a spatial adjacent points regression interpolation method as well as a spatial anti-distance weighted interpolation method with good interpolation results. 
Other scholars have also proposed spatial interpolation methods for dam deformation [22][23][24][25], which built a good foundation for in-depth analyses of dam deformation behaviors. This paper analyzes the composition, characteristics, and contamination of the concrete dam deformation monitoring information. A multi-value missing data completion method using BP mapping of spatial adjacent points is proposed to improve the accuracy of analysis and pattern prediction of concrete dam deformation behaviors. The proposed method is validated by a case study.

Monitoring Data Characteristics
As an open system, a concrete dam has many factors and links that affect its deformation behaviors. These factors and links are the information sources of the concrete dam deformation monitoring. It can be seen from the composition of concrete dam deformation monitoring information in the previous section that the information sources have the following characteristics.

Multi-Systematic
The composition of a concrete dam is complex and contains a large number of subsystems. The structure of a concrete dam includes the dam body, the dam foundation, and the near-dam reservoir area, each of which represents a subsystem. Deformation monitoring can be divided into horizontal displacement monitoring, vertical displacement monitoring, and crack opening monitoring, and each monitoring project can also be regarded as a subsystem. Different monitoring targets need different monitoring instruments located at different measuring points. Therefore, the monitoring information of the concrete dam deformation is multi-systematic.
Appl. Sci. 2021, 11, 463

Multi-Level
Different monitoring targets correspond to different monitoring methods, and in the concrete dam deformation monitoring system, the same monitoring project often contains different monitoring methods. Additionally, multiple measuring points are arranged in different parts of the dam, and the deformation behavior of the concrete dam is comprehensively reflected by measuring point information and different monitoring targets. Therefore, the monitoring information of the concrete dam deformation is multi-level.

Uncertainty
In the concrete dam deformation monitoring system, uncertainty arises because the interaction among the dam body, the dam foundation, and the monitoring system requires different monitoring methods. Variance in instrument precision, degradation of instrument performance, and other factors also increase the uncertainty of the measured monitoring values [26,27]. In addition, both manual and automatic monitoring can introduce errors and noise, which is another source of uncertainty in the monitoring data. Therefore, the monitoring information of the concrete dam deformation is uncertain.

Monitoring Data Contamination
From the characteristics of deformation monitoring information, it can be seen that the acquisition of concrete dam deformation information is affected by multiple factors. The information contamination is therefore inevitable and diverse, as shown in the following aspects:
(1) Deficiency in information types
The deficiency in information types of concrete dam deformation monitoring data is objective and unavoidable. First of all, as a large engineering structure, a concrete dam occupies a large space, its deformation changes dynamically, and the distribution of the deformation changes is heterogeneous. Therefore, it is difficult to fully describe the deformation behaviors with existing methods and rules. Consequently, there exists a certain level of deficiency in monitoring information types. Secondly, the concrete dam deformation monitoring system monitors the deformation of the dam body intermittently, which means the system can only obtain sub-samples of some characteristic periods instead of real-time data for the dam deformation. This also contributes to the deficiency in information types. Additionally, for practical engineering reasons, monitoring positions are usually arranged in typical locations to observe the deformation in a certain area. In this case, the deformation information of other locations in the same area is missing, which results in the deficiency in information types.
(2) Incompleteness of a specific information type
In the process of monitoring the deformation of a concrete dam, the data collection from the automatic system is generally intermittent with a constant step size. For example, the monitoring instrument may perform one measurement every six hours or one measurement every day. However, due to human errors, instrument damage, data loss, and other factors in a manual monitoring system, the time interval between successive measurements is not always the same, which brings difficulties to the subsequent modeling work.
Particularly, in both automation and manual systems, sometimes because of equipment degradation, some monitoring information will be lost in a long sequence of measurement intervals. For example, a two-year data sequence may lose a continuous one-month or two-month period of data points due to equipment degradation, and if those measuring points are located in key positions, the monitoring system will fail to observe the abnormal deformation of the dam, which makes the dam safety analysis difficult. The long-period interruption of monitoring data can hinder the overall deformation analysis and future deformation prediction.

(3) Errors in monitoring information
It can be seen from the monitoring methods of concrete dam deformation and the environmental influence factors that the deformation monitoring information cannot avoid errors. Generally, errors are divided into three groups: systematic errors, gross errors, and random errors. The expression is as follows:
$$\varepsilon = \varepsilon_s + \varepsilon_G + \varepsilon_n$$
where $\varepsilon$ is the total error of observation, $\varepsilon_s$ is the systematic error, $\varepsilon_G$ is the gross error, and $\varepsilon_n$ is the random error. Systematic errors can be generated by intrinsic errors of the instrument, wrong measurement practices, environment changes, improper monitoring methods, imprecise theories, or approximations in formulas. This type of error usually has a certain regularity. Researchers can assign a constant, a trend, or a period to represent a systematic error in an analytic formula, curve, or number table.
In the process of obtaining, conveying, and processing deformation monitoring information, some data that are obviously inconsistent with the facts are sometimes produced. The errors generated by this type of data are called gross errors. In terms of numerical values, data whose absolute values are larger than two times the mean square error can be regarded as gross errors, which are manifested as abnormal sudden jumps or outliers. The outliers are not representative of dam deformation characteristics, so they should be excluded from the deformation behavior analysis.
Random errors are errors that are caused by a combination of unrelated random factors. In a single measurement, the random error may show no regularity, but with enough measurements, this type of error obeys statistical laws. Noise is a kind of random error. The deformation monitoring information can be divided into real data information and noise information. In deformation behavior analysis and prediction, the existence of noise information severely affects accuracy, so it is necessary to extract effective information from the monitoring information.
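The two-times-mean-square-error criterion for gross errors described above can be illustrated with a minimal sketch. It assumes, for illustration only, that the deviation of each reading is taken from the series mean; in practice the deviation would more likely be taken from a fitted trend or model. The function name `flag_gross_errors`, the factor `k`, and the sample readings are hypothetical and not taken from the paper.

```python
import math


def flag_gross_errors(values, k=2.0):
    """Return indices of readings whose deviation exceeds k times the RMS deviation.

    Assumption: deviations are measured from the series mean.
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    rms = math.sqrt(sum(d * d for d in deviations) / n)
    if rms == 0.0:
        return []  # constant series: nothing to flag
    return [i for i, d in enumerate(deviations) if abs(d) > k * rms]


# Hypothetical displacement readings (mm) with one sudden jump at index 4.
series = [1.02, 1.05, 1.03, 1.04, 3.90, 1.06, 1.05, 1.03]
suspects = flag_gross_errors(series)
print(suspects)
```

Flagged readings would then be treated as missing values and passed to one of the completion methods discussed later.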

Traditional Interpolation Completion Methods
For non-uniform time series with unequal time intervals, interpolation is usually used to homogenize them to satisfy the application requirements of building a statistical model. Frequently used interpolation methods include the piecewise linear interpolation, the nearest point interpolation, the cubic spline interpolation, and the cubic Hermite interpolation [18]. The principles of these methods are as follows:
(1) Piecewise linear interpolation
Linear interpolation refers to the interpolation method whose interpolation function is a first-degree polynomial. It approximates the original function by a line passing through two endpoints and estimates the missing data at points located between these two endpoints. The method is simple and convenient. The piecewise linear interpolation applies a simple linear interpolation on each short interval $[x_i, x_{i+1}]$, and the sub-interpolation polynomial on the interval $[x_i, x_{i+1}]$ is:

The interpolation function on the whole interval $[x_1, x_n]$ is: The definition of $l_i(x)$ is as follows:
(2) Nearest point interpolation
The nearest point interpolation estimates the function value at the interpolation point by using the function value of the nearest neighboring data point. This method is simple and intuitive, but the interpolation results are not very accurate.
Assuming the interpolation point is $(x_i, y_i)$, then:
(3) Cubic spline interpolation
The cubic spline interpolation, also called spline interpolation for short, obtains the value of interpolation points by constructing a cubic spline interpolation function in the target interval. This method can effectively calculate the value of interpolation points and improve the smoothness of the interpolation curve. However, the computational cost of this interpolation is large.
Suppose there are interpolation nodes on the interval $[a, b]$, $a = x_1 < x_2 < \cdots < x_n = b$, and the corresponding function values are $y_1, y_2, \cdots, y_n$. The cubic spline interpolation function $S(x)$ satisfies $S(x_i) = y_i$ $(i = 1, 2, \cdots, n)$, is a polynomial of degree at most three on each subinterval $[x_i, x_{i+1}]$, and has a continuous second derivative on the interval $[a, b]$. Suppose the cubic polynomial on each subinterval $[x_i, x_{i+1}]$ is: The function $S(x)$ needs to meet: The expression of $S(x)$ can be obtained from a fixed boundary condition: where By solving Equation (9) to get the parameters in Equation (8), the interpolation function on the interval $[a, b]$ can be constructed. where

(4) Cubic Hermite interpolation
The Hermite interpolation method uses a curve to approximate the objective function. It requires not only that the interpolation curve passes strictly through the data points, but also that the derivative values of the curve at the data points equal those of the original function, so that a smooth interpolation curve is built. The cubic Hermite interpolation needs the function values and the first derivative values at two nodes to complete the construction. The algorithm is simple, and its interpolation results are close to real data, so it has been widely used.
Assuming that the two known nodes are $(x_{j-1}, y_{j-1})$ and $(x_j, y_j)$, and the corresponding first derivative values are $y'_{j-1}$ and $y'_j$, the interpolation polynomial $H_3(x)$ can be expressed as: Therefore, it can be solved by: Thus, the final expression can be obtained as: The function value at the interpolation point can be obtained by substituting the x-coordinate of the interpolation point into Equation (17).
The rationale of the traditional interpolation methods is that approximating a small portion of missing data does not affect the overall trend and law of the deformation time series. When only a few data points are non-uniform or missing, this kind of interpolation method can be used to fill the gaps and generate a deformation time series with equal intervals.
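As an illustration of three of the methods above, the following minimal sketch implements piecewise linear, nearest point, and cubic Hermite interpolation in their standard textbook forms, which may differ in notation from the paper's numbered equations. Cubic spline interpolation is omitted because it requires solving a tridiagonal system. The helper `estimate_slopes`, which approximates node derivatives by finite differences, is an assumption added here because measured sequences do not come with derivative values; all function names and sample data are hypothetical.

```python
from bisect import bisect_right


def _interval(xs, x):
    """Index i of the subinterval [xs[i], xs[i+1]] used for x (clamped at the ends)."""
    i = bisect_right(xs, x) - 1
    return max(0, min(i, len(xs) - 2))


def linear_interp(xs, ys, x):
    """Piecewise linear interpolation between the two neighboring nodes."""
    i = _interval(xs, x)
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return (1 - t) * ys[i] + t * ys[i + 1]


def nearest_interp(xs, ys, x):
    """Nearest point interpolation: copy the value of the closer neighboring node."""
    i = _interval(xs, x)
    return ys[i] if (x - xs[i]) <= (xs[i + 1] - x) else ys[i + 1]


def estimate_slopes(xs, ys):
    """Finite-difference slope estimates (assumption: no true derivatives are known)."""
    n = len(xs)
    slopes = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        slopes.append((ys[hi] - ys[lo]) / (xs[hi] - xs[lo]))
    return slopes


def hermite_interp(xs, ys, ds, x):
    """Cubic Hermite interpolation from node values ys and node derivatives ds."""
    i = _interval(xs, x)
    h = xs[i + 1] - xs[i]
    t = (x - xs[i]) / h
    h00 = 2 * t**3 - 3 * t**2 + 1   # basis weight of ys[i]
    h10 = t**3 - 2 * t**2 + t       # basis weight of ds[i]
    h01 = -2 * t**3 + 3 * t**2      # basis weight of ys[i+1]
    h11 = t**3 - t**2               # basis weight of ds[i+1]
    return h00 * ys[i] + h10 * h * ds[i] + h01 * ys[i + 1] + h11 * h * ds[i + 1]


# Hypothetical unequally spaced readings: time in days, displacement in mm.
days = [0.0, 1.0, 2.0, 4.0]
disp = [1.20, 1.35, 1.31, 1.50]
slopes = estimate_slopes(days, disp)
print(linear_interp(days, disp, 3.0))
print(nearest_interp(days, disp, 3.0))
print(hermite_interp(days, disp, slopes, 3.0))
```

All three functions use only the target sequence itself, which is the limitation the following section addresses.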

Single-Value Missing Data Completion Based on NLM (Non-local Means) Method
According to the function value or derivative value of the existing data points, the traditional interpolation method approximates the objective function by constructing a curve satisfying the basic conditions through certain mathematical methods, which can effectively solve the problem of time series inhomogeneity to a certain extent. However, these traditional interpolation methods are only based on known data and do not consider the physical significance of practical problems. The homogenization of the time series of non-uniform deformation is a supplement to the deformation information of the concrete dam at the unknown time point, which needs to take into account the actual deformation laws of the dam. On the other hand, in the actual deformation time series, the deformation values at different moments cannot be represented by precise functional expressions, so the derivative values of the data points cannot be obtained, and in this case, the traditional interpolation method is not applicable to solve the above problems.
In view of the situation that the time series have a long span and an uneven distribution, this paper adopts the non-local means method (NLM algorithm) [28] using non-local knowledge of deformation information and the self-similarity of information laws at different moments in the deformation sequence to estimate the deformation value at the missing time periods. On this basis, a complete deformation sequence having the strongest correlation with the deformation trend of the target is introduced as the calculation basis. The aim of this method is to characterize the missing information by considering the self-correlation between the deformation values at different moments of the deformation sequence and the correlation between the measurement points corresponding to the position of the target.
The main idea of the NLM algorithm is to obtain a new image by weighting and averaging the gray values of all pixels in the original image regarding the weight coefficients of similarity. In this paper, it is applied to the homogenization of concrete dam deformation time series to solve the single-value missing problem.
Assume that the deformation time series of measuring point A of the dam body is non-uniform and contains a single missing value. To estimate this value, the following steps are performed.
First, considering the whole deformation time series of the measuring points, find a measuring point B whose sequence is complete and whose deformation trend has the strongest correlation with that of measuring point A. Measuring point B can be chosen from the measuring points on the same monitoring perpendicular line as point A. In this paper, the Pearson correlation test is adopted to calculate the correlations among deformation data of measurement points. The Pearson correlation coefficient is a statistical parameter used to quantitatively measure the correlation between variables, and its calculation formula is:
$$r = \frac{\sum_{i=1}^{N} (\delta_{1i} - \bar{\delta}_1)(\delta_{2i} - \bar{\delta}_2)}{\sqrt{\sum_{i=1}^{N} (\delta_{1i} - \bar{\delta}_1)^2}\,\sqrt{\sum_{i=1}^{N} (\delta_{2i} - \bar{\delta}_2)^2}} \tag{18}$$
where $\delta_{1i}$ and $\delta_{2i}$ represent the deformation values of measuring points A and B at the same time, $\bar{\delta}_1$ and $\bar{\delta}_2$ are the corresponding mean values, and $N$ represents the total number of values in the sequences. It can be seen from Equation (18) that the value of Pearson's correlation coefficient varies between −1 and 1. The closer the correlation coefficient is to 1 or −1, the stronger the correlation between the two variables, and the closer it is to 0, the weaker the correlation. In addition, when the correlation coefficient is greater than 0, the two variables are positively correlated.
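The selection of the reference point by Pearson correlation can be sketched as follows. This is a minimal illustration under stated assumptions: missing values in the target sequence are represented by `None`, the coefficient is computed only over the times at which the target is observed, and the candidate with the largest absolute coefficient is chosen. The names `pearson` and `best_reference` and the sample sequences are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def best_reference(target, candidates):
    """Pick the complete candidate sequence most strongly correlated with the target.

    target: list with None at missing times; candidates: dict name -> complete list.
    """
    observed = [i for i, v in enumerate(target) if v is not None]
    t = [target[i] for i in observed]
    scores = {
        name: pearson(t, [seq[i] for i in observed])
        for name, seq in candidates.items()
    }
    return max(scores, key=lambda name: abs(scores[name])), scores


# Hypothetical sequences on the same perpendicular line (mm).
point_a = [1.0, 2.0, None, 4.0]
others = {"B": [1.1, 2.1, 3.0, 4.2], "C": [4.0, 1.0, 3.0, 2.0]}
name, scores = best_reference(point_a, others)
print(name, scores)
```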
Secondly, the deformation value in the sequence of point B at the time of the interpolation point in the sequence of point A is referred to as the hypothesis interpolation point, and the weight of each other point in the sequence of point B relative to this hypothesis interpolation point is calculated. In this paper, the Square of Euclidean Distance (SED) is used to measure the similarity of deformation values at different times. The formula for calculating the square of Euclidean distance is:
$$\mathrm{SED}(\delta_i, \delta_j) = (\delta_i - \delta_j)^2$$
where $\delta_i$ and $\delta_j$ represent the deformation values of the measuring point at times $i$ and $j$.
In general, the smaller the difference between the deformation values $\delta_{t_i}$ and $\delta_{t_j}$ at different moments, the more similar the deformation at the two moments, and the larger the weight assigned in the calculation. The weight is calculated by the following formula: where $h$ is the parameter that controls how fast the exponential function increases or decreases and thus determines the weight. Finally, calculate the weight of each reference point relative to the assumed interpolation point based on the complete deformation sequence of point B, and assign the weight to the deformation value of measuring point A at the corresponding time. Then, the value of the interpolation point can be calculated by a weighted average. The formula is: where $I$ represents the set of moments of the selected entire time series.
Assume the deformation time series of measuring points A and B are shown in Figure 1, where the sequence of measuring point B is complete, and there is a missing spot in the sequence of measuring point A. The dots in the figure represent the corresponding deformation values at different times, and the square point represents the missing value in the sequence of measuring point A, namely the interpolation point to be solved.
Point 1 in Figure 1 is the hypothetical interpolation point. Considering the similarity between other points and point 1 in the sequence, the deformation values of points 2, 3, and 4 are the same as that of point 1. According to the definition of the square of Euclidean distance, the weight of points 2, 3, and 4 is 1, and the closer the value of a point is to the value of point 1, the larger the weight assigned to it. By traversing the whole time series, the weights of all points can be obtained. The weight of each point in the B sequence is assigned to the corresponding point (the point at the same time) in the A sequence, and the value of the interpolation point can be obtained after weighted averaging.
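The single-value completion steps can be sketched as below. The sketch assumes the weight takes the common non-local means form exp(−SED / h²) and that the weighted average is normalized by the sum of the weights; the exact expressions in the paper may differ. The function name `nlm_fill` and the sample sequences are hypothetical.

```python
import math


def nlm_fill(seq_a, seq_b, missing, h=1.0):
    """Estimate seq_a[missing] from the complete reference sequence seq_b.

    Each other time j gets a weight from the similarity of seq_b[j] to
    seq_b[missing]; the weights are then applied to the values of seq_a.
    Assumption: weight = exp(-SED / h**2), normalized weighted average.
    """
    reference = seq_b[missing]  # hypothesis interpolation point in B
    weighted_sum = 0.0
    weight_total = 0.0
    for j, (a_j, b_j) in enumerate(zip(seq_a, seq_b)):
        if j == missing or a_j is None:
            continue
        sed = (b_j - reference) ** 2
        w = math.exp(-sed / (h * h))  # identical values give weight 1
        weighted_sum += w * a_j
        weight_total += w
    return weighted_sum / weight_total


# Hypothetical sequences: B is complete, A misses the value at index 2.
seq_b = [1.0, 2.0, 1.0, 3.0, 1.0]
seq_a = [10.0, 20.0, None, 30.0, 10.0]
estimate = nlm_fill(seq_a, seq_b, missing=2, h=0.1)
print(estimate)
```

A small `h` concentrates the weight on the moments most similar to the hypothesis interpolation point, while a large `h` moves the estimate toward the plain mean of the observed values of A.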

Completion Strategy for Multi Missing Values
When more information spots are missing in the deformation time series, the traditional interpolation method is not able to carry out effective interpolation calculation. Even though the NLM interpolation algorithm can calculate the value of each missing point, it needs to calculate the weight of existing points in the reference sequence to the missing points one by one, and then calculate the weight of each missing point in the target sequence. Although this method is feasible, the computational workload is large. To solve the above problem, this paper introduces a multi-value missing processing method and proposes a multi-value data missing completion method based on spatial adjacent point BP mapping.

Nonlinear Regression Analysis
Regression analysis studies the degree of influence of one variable on another and estimates or predicts the changes of other variables. However, in practice, most variables depend not on one variable but on several. Moreover, the relation between an explained variable, such as the concrete dam deformation, and its many explanatory variables is non-linear. According to the theory of the statistical model, the concrete dam deformation is mainly affected by three components: water pressure, temperature, and time effect. Each component includes more than one influence factor, so it is a multivariable nonlinear regression problem. The statistical model uses several factors to fit the deformation trend and obtains the multiple regression equation of deformation. Therefore, when continuous multiple values are missing in the deformation time series of a certain measuring point and the environmental quantity data are known, the regression relationship between the two can be established from the known values in the sequence. The expression of the multivariable nonlinear regression analysis model is: where $f$ represents the general function between $\delta_t$ and the influence factors, and $\varphi_i$ is an influence factor of concrete dam deformation. After the equation between the deformation value and its influence factors is established, the coefficients of each factor in the model can be determined from the measured data by the least square method, and the multiple regression model is thus established. The value of missing information can be obtained by substituting the influence factor data of the missing segments into the above expression.
Given that the statistical model is established based on statistical methods and combined with dam theory, when the monitoring data sequence is long, if the factors in the statistical model are representative, the model can accurately reflect the deformation trend of concrete dams. Therefore, the multivariable nonlinear regression analysis model has been widely recognized in the dam construction field.

Spatial Adjacent Point Regression
When the fitting accuracy of the regression model to the deformation sequence is low or the environmental variables of the missing segment are unknown, the accuracy of the above completion method is low. Since the single section of concrete gravity dam and the whole dam body of concrete arch dam can be regarded as a whole, the deformation is naturally integrated and coherent, so the deformation in local areas is correlated to a certain extent. In other words, the missing information of a target measurement point can be estimated according to the deformation value of its adjacent measurement point.
Assume that there are three monitoring points, A, B, and C, with similar locations and structures in the local area of a specific concrete dam section. The deformation sequences of measuring points A and C are complete, and a partial sequence of measuring point B is missing, as shown in Figure 2. Considering the correlation of dam deformation at measuring points A, B, and C, there is a certain correlation between the deformation value of measuring point B and the deformation values of measuring points A and C. Therefore, according to the modeling idea of the statistical model, this paper takes the deformation values of measuring points A and C as the influence factors, and the deformation value of measuring point B as the target output to establish the correlation between measuring point B and measuring points A and C. The expression is:
$$\delta_B = f(\delta_A, \delta_C)$$
where $f(\delta_A, \delta_C)$ represents the general function between $\delta_B$ and the two influence factors $\delta_A$ and $\delta_C$. The function relation can be expressed by a polynomial as:
$$\delta_B = \sum_{i=1}^{K_A} \lambda_{Ai}\,\delta_A^{\,i} + \sum_{i=1}^{K_C} \lambda_{Ci}\,\delta_C^{\,i} + \beta_B \tag{24}$$
where $\lambda_{Ai}$ and $\lambda_{Ci}$ represent the coefficients of each polynomial term of $\delta_A$ and $\delta_C$ respectively, $K_A$ and $K_C$ represent the highest orders of $\delta_A$ and $\delta_C$, and $\beta_B$ is the translation term. If Equation (24) is expanded, and the number of adjacent measurement points in the local area of the target measurement point is denoted as $L$, then: where $\delta_{it}$ and $\delta_{jt}$ represent the deformation values of measuring point $i$ and the adjacent measuring point $j$ at time $t$ respectively, and $\lambda_{ij}$ represents the influence coefficient of each factor.
From the above analysis, with the known deformation information of the target measurement point and the adjacent measurement points, the least square method can also estimate the influence coefficients, and the expression of the model is thus established. By substituting the hypothesis missing information of the adjacent measurement point into Equation (25), the missing information of the target measurement point can be estimated.
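The spatial adjacent point regression can be sketched as an ordinary least squares fit of polynomial terms of the two neighboring sequences. The sketch assumes two adjacent points A and C, polynomial orders `ka` and `kc` chosen by the user, and a solution of the normal equations by Gaussian elimination; the function names and data are hypothetical and the normal equations are used only to keep the example free of external libraries.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def features(d_a, d_c, ka, kc):
    """Polynomial terms of the two adjacent points plus the translation term."""
    return [d_a**i for i in range(1, ka + 1)] + [d_c**i for i in range(1, kc + 1)] + [1.0]


def complete_by_regression(seq_a, seq_c, seq_b, ka=1, kc=1):
    """Fit seq_b on seq_a and seq_c where seq_b is known, then fill its None entries."""
    known = [(features(a, c, ka, kc), b)
             for a, c, b in zip(seq_a, seq_c, seq_b) if b is not None]
    p = len(known[0][0])
    xtx = [[sum(f[i] * f[j] for f, _ in known) for j in range(p)] for i in range(p)]
    xty = [sum(f[i] * b for f, b in known) for i in range(p)]
    coef = solve_linear(xtx, xty)  # least squares via normal equations
    filled = []
    for a, c, b in zip(seq_a, seq_c, seq_b):
        if b is None:
            b = sum(k * f for k, f in zip(coef, features(a, c, ka, kc)))
        filled.append(b)
    return filled, coef


# Hypothetical data generated from delta_B = 0.5*delta_A + 0.25*delta_C + 1.
seq_a = [1.0, 2.0, 3.0, 4.0, 5.0]
seq_c = [2.0, 1.0, 4.0, 3.0, 6.0]
seq_b = [2.0, 2.25, None, 3.75, 5.0]
filled, coef = complete_by_regression(seq_a, seq_c, seq_b)
print(filled, coef)
```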

BP Mapping of Spatial Adjacent Points
The spatial adjacent point regression interpolation method establishes the correlation between the deformation value of the target measurement point and the deformation values of the adjacent points, which can effectively reveal the relationships among the deformation values of spatially adjacent measurement points. However, measuring points located on the same deformation body, such as the measuring points on the same section of a concrete gravity dam or on a concrete arch dam, deform as a whole, with correlations and mutual influences, so the specific relationships between the deformation of these measuring points are complex. The spatial adjacent point regression method, which is based on the modeling idea of the statistical model, only regresses a power series expansion with a finite number of integer-order terms, so it is difficult for it to fully describe the unknown relationships between the deformation of measurement points. This regression method therefore has limitations.
It is difficult to represent the complex and unknown relationship between the deformation of spatial measurement points by specific mathematical expressions. But the BP neural network, with strong nonlinear mapping ability, can delineate the complex information relationship behind the data through learning and training of samples. Meanwhile, the BP neural network also has strong generalization ability, so that the trained network can effectively process new input samples and give appropriate output results. Therefore, in order to improve the accuracy of the missing value completion and find the true value of deformation closest to the missing time, the BP neural network is introduced in this section to deal with the unknown relationship between the deformation of spatial measurement points. A corresponding missing value completion method is proposed.
The BP network is a kind of multi-layer feedforward neural network that operates through forward propagation of signals and backward propagation of errors. It has three layers: an input layer, a hidden layer, and an output layer. Each layer is composed of nodes (namely neurons). Nodes in adjacent layers are connected by weights, and nodes in the same layer are independent of each other. Through the connection weights between adjacent layers, the network transforms the outputs of the neurons in one layer into the inputs of the neurons in the next layer, thus realizing the learning calculation on the samples.
Assume that there are n monitoring points that are spatially adjacent and structurally related in a concrete dam body, such as points on the same vertical line of a concrete gravity dam or points within the same deformation zone of a concrete arch dam (the partition method is not explained in detail in this paper). When the deformation information of the ith measuring point is missing for some reason, the known information of the other m = n − 1 measuring points can be used to estimate the information of point i. The steps to establish a multi-value missing data completion method based on BP neural network mapping are as follows:
(1) Suppose that the sample set contains Z pattern pairs of input and output vectors. Randomly select a pattern pair k, with input pattern vector $A_k = (a_1^k, a_2^k, \cdots, a_m^k)$ and expected output vector $Y_k = (y_1^k)$. The input vector of the middle layer elements is $S = (s_1, s_2, \cdots, s_p)$ (p is the number of hidden layer nodes, the same below), and their output vector is $B_k = (b_1, b_2, \cdots, b_p)$. The input vector of the output layer element is $L_k = (l_1, l_2, \cdots, l_p)$, and its output vector is $C = (c)$. The connection weights between the input layer and the hidden layer are $w = \{w_{ij}\}$, $i = 1, 2, \cdots, m$; $j = 1, 2, \cdots, p$. The connection weights between the hidden layer and the output layer are $v = \{v_j\}$, $j = 1, 2, \cdots, p$. The output thresholds of the units in the hidden layer are $\theta = \{\theta_j\}$, $j = 1, 2, \cdots, p$. The output threshold of the output layer unit is $\gamma = (\gamma)$.
(2) The input vector $A_k$, connection weights w, and thresholds θ are used to calculate the input S of the hidden layer. The output $B_k$ of the hidden layer is then calculated from S through the Sigmoid function.
(3) The output $B_k$ of the hidden layer, connection weights v, and threshold γ are used to calculate the input $L_k$ of the output layer element, and the output c of the output layer element is then calculated from $L_k$.
(4) The expected output vector $Y_k$ and the actual network output c are used to calculate the generalized error $d_k$ of the output layer element.
(5) The connection weights v, the generalized error $d_k$ of the output layer, and the output $B_k$ of the hidden layer are used to calculate the generalized error $e_j^k$ of each element of the hidden layer.
(6) The generalized error $d_k$ of the output layer element and the output $B_k$ of each element in the middle layer are used to correct the connection weights v and threshold γ. In this correction, η stands for the learning rate, taken as η = 0.01 ∼ 0.8, and α is the momentum factor, taken as α = 0.9.
(7) The connection weights w and thresholds θ are corrected using the generalized error $e_j^k$ of each element of the hidden layer and the input pattern vector $A_k$.
(8) Randomly select another learning pattern pair in the training sample set and repeat steps (3)-(6) until all pattern pairs are trained.
(9) Calculate the global error function E of the network. If E is less than a preset error value, the network stops learning; otherwise, repeat steps (3)-(8) for the next round of learning and training on the sample set.
(10) The trained network is saved. New samples are then input, and the network output gives the completed missing information.
Take n = 5 as an example: the input layer is the deformation sequences of the four related measurement points, and the output layer is the deformation sequence of the target measurement point. The network structure is shown in Figure 3.
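Steps (1)-(10) can be sketched in Python for the n = 5 case (four input points, one target point). This is an illustrative sketch under stated assumptions rather than the exact formulation used in this work: the Sigmoid function is applied in both the hidden and output layers, thresholds are added as bias terms, the global error is taken as $E = \frac{1}{2}\sum_k (y_k - c_k)^2$, all series are min-max scaled to [0.1, 0.9], and the data are synthetic. The names `train_bp`, `forward`, and `minmax`, the hidden layer size p = 6, and η = 0.05 are choices made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(A, params):
    """Steps (2)-(3): hidden output B and network output c (bias-style thresholds assumed)."""
    w, theta, v, gamma = params
    B = sigmoid(A @ w + theta)
    return B, sigmoid(B @ v + gamma)

def train_bp(A, Y, p=6, eta=0.05, alpha=0.9, tol=0.05, max_epochs=300, seed=0):
    """Online BP training with momentum for an m-p-1 network; Y must lie in (0, 1)."""
    rng = np.random.default_rng(seed)
    Z, m = A.shape
    # Step (1): small random initial weights and thresholds (an assumption of this sketch).
    w = rng.uniform(-0.5, 0.5, (m, p))
    theta = rng.uniform(-0.5, 0.5, p)
    v = rng.uniform(-0.5, 0.5, p)
    gamma = rng.uniform(-0.5, 0.5)
    dw, dtheta, dv, dgamma = np.zeros((m, p)), np.zeros(p), np.zeros(p), 0.0
    E = np.inf
    for _ in range(max_epochs):
        for k in rng.permutation(Z):              # step (8): visit pattern pairs in random order
            a, y = A[k], Y[k]
            b, c = forward(a, (w, theta, v, gamma))
            d = (y - c) * c * (1.0 - c)           # step (4): output-layer generalized error
            e = d * v * b * (1.0 - b)             # step (5): hidden-layer generalized errors
            dv = eta * d * b + alpha * dv         # step (6): corrections with momentum
            dgamma = eta * d + alpha * dgamma
            dw = eta * np.outer(a, e) + alpha * dw    # step (7)
            dtheta = eta * e + alpha * dtheta
            v, gamma = v + dv, gamma + dgamma
            w, theta = w + dw, theta + dtheta
        # Step (9): global error over the whole sample set.
        E = 0.5 * np.sum((Y - forward(A, (w, theta, v, gamma))[1]) ** 2)
        if E < tol:
            break
    return (w, theta, v, gamma), E

def minmax(x, lo, hi):
    """Scale values to [0.1, 0.9] so that they fit the Sigmoid output range."""
    return 0.1 + 0.8 * (x - lo) / (hi - lo)

# Synthetic stand-in for five correlated displacement series (not real dam data).
rng = np.random.default_rng(1)
t = np.arange(300)
base = np.sin(2 * np.pi * t / 365.0)
inputs = base[:, None] * np.array([1.0, 0.8, 0.6, 0.4]) + 0.02 * rng.standard_normal((300, 4))
target = 0.7 * base + 0.1 * base ** 2
known = t < 270                                   # the last 30 target values are treated as missing

A = minmax(inputs, inputs.min(axis=0), inputs.max(axis=0))
y_lo, y_hi = target[known].min(), target[known].max()
Y = minmax(target[known], y_lo, y_hi)

params, E = train_bp(A[known], Y)
# Step (10): feed the adjacent points' values at the missing times into the trained network.
completed = y_lo + (forward(A[~known], params)[1] - 0.1) * (y_hi - y_lo) / 0.8
print(E, completed[:3])
```

The hidden-layer errors are computed before the weights v are corrected, so that each pattern pair is back-propagated through the same weights that produced its output.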

Case Study
In order to verify the feasibility and effectiveness of the incomplete information processing and gross error detection methods proposed in this chapter, the deformation data of a concrete gravity dam are used in this analysis. This gravity dam is located at the junction of Yibin County, Sichuan Province, and Shuifu County, Yunnan Province. The dam serves various purposes: power generation, improvement of navigation conditions, flood and sand control, and irrigation. The mountains on both sides of the dam toe incline slightly downstream. The bedrock surface of the dam (riverbed) is slightly inclined upstream, and there are coherent grooves on both sides. Bedrock lithology and lithofacies change abruptly, so cross-stratification is developed. Eleven small faults are found over the riverbed and dam foundation. The planform of the dam is shown in Figure 4.
The average annual rainfall at the reservoir is 1000 mm, and the maximum daily rainfall in a year exceeds 90 mm, which is a medium level for Sichuan Province. The recent upstream water level is 380 m and has remained high for a long time. The downstream water level is usually around 270 m.
In order to monitor the horizontal displacement of the dam, vertical lines are arranged in each important dam section. Here, the monitoring data of the measuring points on the positive vertical line of dam sluice Section 1 are taken as an example for analysis (Figure 6). The horizontal displacement process lines of the six measuring points are shown in Figure 7. The measurement frequency is once a day. The horizontal displacement process lines of these measuring points are strongly correlated.

Single-Value Missing Data Completion
Take the measuring point PL5-3 in Figure 6 as an example. The reference sequence that has the strongest correlation with the deformation sequence of this measuring point is searched for. The correlation between the sequence of the target measurement point and the sequences of the other measurement points is shown in Table 1.
Suppose that the deformation data of 10 September 2014 are missing. The interpolation method based on non-local means (NLM) proposed in this paper and the traditional methods are each used to estimate the missing value. The results are shown in Table 3. The calculation results show that the estimate given by the proposed single-value missing data completion method based on non-local means is close to the original monitoring value. They also show that when the missing value lies outside the range spanned by the values immediately before and after it, the traditional interpolation methods can hardly estimate it effectively. The NLM interpolation method overcomes this limitation by using the self-similarity of deformation sequences and introducing reference sequences, which increases the accuracy of missing value estimation.
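The search for the reference sequence can be illustrated with a short Python sketch. It assumes that the correlation in question is the Pearson correlation coefficient computed over the times at which both series are observed; the function name `strongest_reference`, the labels `ref_A` and `ref_B`, and the synthetic series are hypothetical.

```python
import numpy as np

def strongest_reference(target, candidates):
    """Return (name, r) of the candidate series with the largest |Pearson r|.

    Only times at which both the target and the candidate are observed
    (non-NaN) enter the correlation.
    """
    best_name, best_r = None, 0.0
    for name, series in candidates.items():
        ok = ~np.isnan(target) & ~np.isnan(series)
        r = np.corrcoef(target[ok], series[ok])[0, 1]
        if abs(r) > abs(best_r):
            best_name, best_r = name, r
    return best_name, best_r

# Synthetic example: one candidate follows the target, the other is pure noise.
rng = np.random.default_rng(0)
t = np.arange(200)
base = np.sin(2 * np.pi * t / 100.0)
target = base + 0.01 * rng.standard_normal(200)
target[50] = np.nan                      # the single missing value
candidates = {
    "ref_A": 0.9 * base + 0.01 * rng.standard_normal(200),
    "ref_B": rng.standard_normal(200),
}

name, r = strongest_reference(target, candidates)
print(name, round(r, 3))
```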

Multi-Value Missing Data Completion
Similarly, take the deformation sequence of PL5-3 as an example. A one-month missing segment is constructed manually (from 1 September 2014 to 2 October 2014). The BP neural network is used to establish the mapping relationship between the deformation sequences of the other measurement points on the same vertical line and that of the target measurement point. First, the training samples made up of the known deformation data of each measurement point are imported into the BP neural network for learning. Second, the deformation values of the other measurement points during the missing period are assembled and imported into the trained network to calculate the missing data of the target measurement point. The calculation results of each method are shown in Figure 8, and the completion accuracy is shown in Table 4.
As can be seen from the results in Figure 8 and Table 4, the completion accuracy of the multi-value missing data completion method proposed in this paper, based on BP mapping of spatial adjacent points, is higher than that of the other three methods. Both the coefficient of determination and the root-mean-square error are satisfactory. The interpolation method of spatial adjacent points also gives a good estimate, but since it only uses the deformation information of the upper and lower measurement points for regression analysis, it cannot fully exploit the information related to the deformation of the target measurement point. Nonlinear regression follows the idea of a statistical model and can be used for completion when the environmental quantities change only slightly. Linear interpolation performs poorly, so it is not suited to estimating multi-value missing data.
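The two accuracy measures can be computed on the artificially removed segment as in the following sketch. The definitions used here (root-mean-square error, and the coefficient of determination as one minus the ratio of residual to total sum of squares) are the common ones and are assumed rather than taken from this paper; the numbers are made up for illustration.

```python
import numpy as np

def rmse(observed, estimated):
    """Root-mean-square error between withheld observations and completed values."""
    observed = np.asarray(observed, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    return float(np.sqrt(np.mean((observed - estimated) ** 2)))

def r_squared(observed, estimated):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    observed = np.asarray(observed, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    ss_res = np.sum((observed - estimated) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Illustrative values only: withheld observations and one method's estimates.
withheld = [1.0, 2.0, 3.0, 4.0]
completed = [1.1, 1.9, 3.2, 3.8]

print(rmse(withheld, completed), r_squared(withheld, completed))
```

Because the segment was removed manually, the withheld observations are available as ground truth, which is what makes this comparison between methods possible.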

Conclusions
This paper proposed a completion method for missing deformation monitoring data of concrete dams. The main points are as follows:
(1) The problem of missing concrete dam deformation monitoring data was discussed, including the characteristics of deformation monitoring data and the types of data contamination.
(2) A data completion method with high accuracy, good stability, and strong adaptability was proposed and validated through a case study. By reviewing the traditional methods for processing incomplete information, this paper discussed the principles and weaknesses of traditional missing value completion methods for both single-value and multi-value missing data. For single-value missing data, the non-local mean method was studied, and the regression interpolation method of spatial adjacent points was improved to accomplish data completion. For multi-value missing data, the nonlinear regression and the spatial adjacent point regression were used, and the BP mapping of spatial adjacent points was proposed to complete the missing data pieces. The method proposed in this work is simple and effective for completing long data sequences and can meet the requirements of safety monitoring during dam operation.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.