Research on a Seepage Monitoring Model of a High Core Rockfill Dam Based on Machine Learning

Seepage through a rockfill dam with a high core wall is an important and difficult issue in dam safety monitoring and a major concern for dam managers. Seepage of a high core rockfill dam is mainly affected by factors such as water level, rainfall, temperature, filling height, and aging. The traditional approach is to establish a multiple linear regression model to analyze the factors influencing seepage. However, multicollinearity among these factors distorts parameter estimation, and random errors in the data can prevent a reliable regression model from being established. This paper starts with data collected by an osmometer, uses the 3σ criterion to remove outliers from the sample data, performs principal component analysis in the R language on the processed data to eliminate multicollinearity among the factors, and finally uses multiple linear regression to model and analyze the data. Taking the Nuozhadu high core rockfill dam as an example, the factors influencing seepage during the construction period and the impoundment period were studied, and the seepage was then forecast. The method provides guidance for further studies of seepage monitoring models for dams of the same type.


Introduction
A core-wall rockfill dam is economical, simple to construct, and built from locally sourced materials. It adapts well to dam foundation conditions, makes full use of construction excavation materials, and has good seismic performance, so it plays an essential role in the development of water resources at home and abroad. It is also a type of earth and rock dam that is widely used around the world [1]. In the 1990s, soil core-wall dams and concrete-face rockfill dams flourished. The construction of high core-wall rockfill dams in China is still at the stage of accumulating experience. The core-wall rockfill dams already built, such as Lubuge (104 m), Xiaolangdi (160 m), Pubugou (188 m), and Nuozhadu (261.5 m), work well. High core rockfill dams currently under construction, such as Lianghekou (295 m) and Shuangjiangkou (314 m), are world-class projects at the 300 m level [2].

Introduction of the Nuozhadu High Core-Wall Rockfill Dam
The Nuozhadu Hydropower Station is located on the lower reaches of the Lancang River at the junction of Cuiyun District and Jixian County, Simao City, Yunnan Province (the dam site is between the Kanjie River and the Burn Village Ditch). It is the fifth of the eight cascade stations planned for the middle and lower reaches of the Lancang River. The project is a large (I) first-class project, and its permanent major hydraulic structures are class 1 structures. The project is designed mainly for electricity generation and also provides comprehensive benefits such as flood control, irrigation, aquaculture, and tourism. The reservoir has multi-year regulation capacity. The total storage capacity of the reservoir is 237.03 × 10⁸ m³, and the installed capacity of the power station is 5850 MW (9 × 650 MW). The project consists of a core-wall rockfill dam, an open spillway on the left bank, flood discharge tunnels on the left and right banks, an underground water diversion system on the left bank, a surface 500 kV switch station, and diversion works. The core-wall rockfill dam has a crest length of 627.87 m, a crest width of 18 m, a crest elevation of 821.5 m, and a maximum dam height of 261.5 m. It is currently the third largest in the world and the largest core-wall rockfill dam in Asia [19,20]. A view of the Nuozhadu Hydropower Station is shown in Figure 1. To monitor the seepage of the core-wall rockfill dam effectively, osmometers are embedded in the dam body. They are mainly arranged on four monitoring sections, A, C, D, and E, with 24 osmometers in section A, 53 in section C, 24 in section D, and 7 in section E [21]. The osmometer layout of section C is shown in Figure 2.

Layout of Typical Osmometer
The typical C-section osmometer DB-C-P-35 was chosen as the research object. The osmometer is a GEOKON (USA) GK-4500S vibrating wire sensor rated at 3 MPa, shown in Figure 3.
The main technical indicators are shown in Table 1.

The osmometer is buried inside the dam body, and the sensor cable is routed to a nearby observation room outside the dam body. A schematic diagram of the burial of the typical osmometer (DB-C-P-35) and the cable routing is shown in Figure 4. Before the osmometer was connected to the automated monitoring system, it was read manually with a vibrating wire reader. The manual procedure was generally as follows: read the osmometer three times, take the average of the three readings as the final value, and record the three readings and their average in a table. The observation frequency was twice a month during the construction period [22]. After connection to the automated system, the osmometer is read by a high-precision measurement control unit, and the observation frequency is once a day [23].

Study Steps and Processes
The following steps were carried out in this study:

1. Select the independent and dependent variable data for the construction period and the storage period separately.
2. For manually collected sensor data, identify and reject errors by hand. For automatically acquired data, use the 3σ criterion to identify and reject errors.
3. Perform a multicollinearity diagnosis on the data remaining after the second step. If there is multicollinearity between the factors, go to the fourth step.
4. Use principal component analysis to eliminate multicollinearity between the factors, extract the principal components, and construct a regression model.
5. Transform the standardized independent variables back to the original independent variables to obtain the regression coefficients of the original variables.
6. Use the established seepage monitoring model to predict the seepage for the construction period and the impoundment period, respectively.

The main flow chart is shown in Figure 5.

Abnormal Value Judgment
For manually observed osmometer data, abnormal values found during recording can be identified and removed by hand so that they are not used in the calculation. For automatically monitored data, problems such as loose sensor wiring, interface aging, voltage instability, communication errors, and hardware failure introduce random errors that follow a normal distribution [24]. If these abnormal values are used directly to establish the regression model, the stability and reliability of the model suffer. To eliminate these outliers, the 3σ criterion (Pauta criterion) was used in this paper [25]. The main calculation process is presented below.
The osmotic pressure values measured by the osmometer are recorded as $\{y_1, y_2, \ldots, y_n\}$. Calculate the arithmetic mean $\bar{y}$ and the residuals $V_i = y_i - \bar{y}$ $(1 \le i \le n)$. The standard error is calculated according to the Bessel formula [26], as shown in Equation (1):

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} V_i^2}{n-1}} \quad (1)$$

If the absolute value of a residual is at least 3 times the standard deviation, that is, $|V_i| \ge 3\sigma$, then $y_i$ is considered an abnormal value and should be removed.
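The rejection rule in Equation (1) can be sketched in a few lines. The paper does not give its implementation, so the following Python sketch is only an illustration: the function name `three_sigma_filter` and the readings are hypothetical, and a single pass over the data is assumed.

```python
import math

def three_sigma_filter(values):
    """Split readings into (kept, rejected) using |V_i| >= 3 * sigma."""
    n = len(values)
    if n < 2:
        return list(values), []
    mean = sum(values) / n
    residuals = [y - mean for y in values]
    # Bessel formula: sample standard deviation with n - 1 in the denominator.
    sigma = math.sqrt(sum(v * v for v in residuals) / (n - 1))
    kept, rejected = [], []
    for y, v in zip(values, residuals):
        if sigma > 0 and abs(v) >= 3 * sigma:
            rejected.append(y)
        else:
            kept.append(y)
    return kept, rejected

# Hypothetical osmotic pressure readings (MPa) with one spurious spike.
readings = [0.50, 0.51, 0.49, 0.50, 0.52, 0.50, 0.51, 0.49, 0.50, 0.51,
            0.50, 0.49, 0.51, 0.50, 0.52, 0.50, 0.49, 0.51, 0.50, 2.00]
kept, rejected = three_sigma_filter(readings)
print(rejected)  # the 2.00 MPa spike is the only rejected value
```

Because the outlier itself inflates both the mean and σ, a single value can never reach 3σ in a sample of ten or fewer readings (its largest possible standardized residual is $(n-1)/\sqrt{n}$), so the criterion is better suited to the daily automated series than to short manual records.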

Principal Component Analysis
The seepage of core-wall rockfill dams is mainly affected by water level, rainfall, temperature, filling elevation, aging, and other factors. When there is multicollinearity between factors, least-squares regression is affected: the parameter estimates are distorted, which weakens the model's predictive ability [27]. To diagnose multicollinearity, this study used the variance inflation factor (VIF) [28].
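As an illustration of the VIF diagnosis, the sketch below uses the standard identity that the VIF of the j-th factor equals the j-th diagonal element of the inverse of the factors' correlation matrix. It is written in plain Python rather than the R or SPSS tools used in the paper, and the factor values and function names are hypothetical.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equally long columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[x - m for x in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    p = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j]))
             / (norms[i] * norms[j]) for j in range(p)] for i in range(p)]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    p = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(p)]
           for i, row in enumerate(matrix)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: factors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]

def vif(columns):
    """VIF_j is the j-th diagonal element of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]

# Hypothetical factors: filling height and elapsed time rise together,
# temperature varies independently.
height = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.0, 16.2]
elapsed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
temperature = [20.0, 24.0, 19.0, 25.0, 21.0, 23.0, 18.0, 22.0]
vif_values = vif([height, elapsed, temperature])
print(vif_values)
```

With these made-up values, height and elapsed time both exceed the VIF > 10 threshold used in the paper, which is the situation that sends the workflow on to principal component analysis.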
Related studies have proposed ridge regression and principal component analysis to solve the problem of multicollinearity between factors [29]. Ridge regression mainly reduces the mean squared error through a ridge parameter, but the selection of the ridge parameter has not been completely solved theoretically [30]. This paper addressed multicollinearity through principal component analysis. The basic principle of principal component analysis is to reduce the dimensionality of the original multivariable system by converting multiple indexes into a few comprehensive indicators, which gives a more intuitive understanding of the contribution of each independent variable to the dependent variable and ensures that the information carried by the indicators does not overlap. The calculation steps of the principal component analysis method are as follows [31]:

1. Standardize the original data. Transform the sample data according to Equations (2) and (3):

$$Z_{ij} = \frac{X_{ij} - \bar{X}_j}{S_j} \quad (2)$$

$$\bar{X}_j = \frac{1}{n}\sum_{i=1}^{n} X_{ij}, \qquad S_j = \sqrt{\frac{\sum_{i=1}^{n}\left(X_{ij} - \bar{X}_j\right)^2}{n-1}} \quad (3)$$

where $Z_{ij}$ is the standardized data, $X_{ij}$ is the original data, and $\bar{X}_j$ and $S_j$ are the mean and standard deviation of the $j$th variable.
2. Find the correlation coefficient matrix $R$ of the standardized matrix $Z$ [32].
3. Solve the characteristic equation $|R - \lambda I_p| = 0$ ($I_p$ is the identity matrix) of the correlation matrix $R$ to get $p$ eigenvalues. The number of principal components $m$ is generally chosen from the cumulative contribution rate of the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$ corresponding to the 1st, 2nd, ..., $m$th principal components.

After principal component analysis, the multicollinearity between the factors was eliminated. Based on the cumulative contribution rate, the principal components were extracted and the principal component regression model was constructed. Finally, the regression coefficients of the original independent variables were obtained by transforming the model back to the original independent variables.
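The three calculation steps above can be sketched as follows. The paper performs this step with R's `prcomp`; the Python version below is only an illustrative stand-in that standardizes the data, forms the correlation matrix, and obtains its eigenvalues with a cyclic Jacobi iteration. All function names and the factor values are hypothetical.

```python
import math

def standardize(columns):
    """Z-score each column with the sample standard deviation (n - 1)."""
    result = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        result.append([(x - mean) / sd for x in col])
    return result

def correlation(zcols):
    """Correlation matrix R of already standardized columns."""
    n, p = len(zcols[0]), len(zcols)
    return [[sum(a * b for a, b in zip(zcols[i], zcols[j])) / (n - 1)
             for j in range(p)] for i in range(p)]

def jacobi_eigen(sym, sweeps=50, tol=1e-20):
    """Eigenvalues and eigenvectors (as columns) of a symmetric matrix."""
    p = len(sym)
    a = [row[:] for row in sym]
    v = [[float(i == j) for j in range(p)] for i in range(p)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if a[i][j] == 0.0:
                    continue
                # Rotation angle that zeroes a[i][j]; |theta| <= pi / 4.
                diff = a[j][j] - a[i][i]
                theta = (math.pi / 4 if diff == 0.0
                         else 0.5 * math.atan(2.0 * a[i][j] / diff))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(p):  # rotate columns i and j
                    a[k][i], a[k][j] = (c * a[k][i] - s * a[k][j],
                                        s * a[k][i] + c * a[k][j])
                for k in range(p):  # rotate rows i and j
                    a[i][k], a[j][k] = (c * a[i][k] - s * a[j][k],
                                        s * a[i][k] + c * a[j][k])
                for k in range(p):  # accumulate eigenvectors
                    v[k][i], v[k][j] = (c * v[k][i] - s * v[k][j],
                                        s * v[k][i] + c * v[k][j])
    return [a[k][k] for k in range(p)], v

def pca(columns):
    """Return eigenvalues (descending), cumulative contribution, loadings."""
    r = correlation(standardize(columns))
    eigvals, eigvecs = jacobi_eigen(r)
    order = sorted(range(len(eigvals)), key=lambda k: eigvals[k], reverse=True)
    total = sum(eigvals)
    cumulative, running = [], 0.0
    for k in order:
        running += eigvals[k] / total
        cumulative.append(running)
    loadings = [[eigvecs[row][k] for row in range(len(eigvals))] for k in order]
    return [eigvals[k] for k in order], cumulative, loadings

# Hypothetical factor series: filling height, temperature, time, rainfall.
factors = [
    [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.0, 16.2],
    [20.0, 24.0, 19.0, 25.0, 21.0, 23.0, 18.0, 22.0],
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    [0.0, 0.012, 0.003, 0.020, 0.0, 0.008, 0.015, 0.001],
]
eigenvalues, cumulative, loadings = pca(factors)
print(cumulative)
```

The eigenvalues of a correlation matrix sum to the number of factors, so each contribution rate is the eigenvalue divided by $p$. The number of retained components is then read off the `cumulative` list, in the same way the paper keeps the first three components.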

Achievement
Experimental data were collected from the typical osmometer DB-C-P-35 at the C section of the Nuozhadu core-wall rockfill dam. The construction period was taken from the installation of the sensor to the completion of dam filling (9 December 2010 to 4 December 2012). The storage period was from 1 July 2013 to 15 September 2013. The method described in this paper was used to establish the seepage models for construction and operation and to make predictions with them.

Percolation Monitoring Model during Construction
The osmotic pressure inside the core during the construction period is mainly affected by factors such as dam filling, temperature, rainfall, and time. During the construction period of the Nuozhadu core-wall rockfill dam, all osmometers were read manually, and errors were identified and eliminated by hand as the readings were taken. The regression model relating the osmotic pressure to each influencing factor is given by Equation (4), in which $\delta_P$ is the osmotic pressure, $\delta_H$ is the influence factor of the filling elevation, $\delta_T$ is the temperature influence factor, $\delta_J$ is the influence factor of rainfall, and $\delta_t$ is the time influence factor.
Equation (4) is decomposed in detail in Equation (5), in which $P(n)$ is the osmotic pressure at time $n$ (MPa); $H(n)$ is the filling height of the core wall at time $n$ and $H_0$ is the elevation of the sensor (m); $T(n)$ is the temperature at time $n$ and $\bar{T}$ is the average temperature (°C); $t(n)$ is the current time and $t_0$ is the time at which the sensor was buried; $J(n)$ is the amount of rainfall at time $n$ (m); and $\alpha_0, \alpha_1, \alpha_2, \alpha_3, \alpha_4$ are the undetermined coefficients.
The construction-period data, with erroneous values removed by hand, were substituted into Equation (5). The resulting variance inflation factor, VIF = 11.225 > 10, indicated multicollinearity between the factors. Following the flow chart in Figure 5, the next step was principal component analysis. Using the function prcomp in the R language, the cumulative contribution rate of the first three principal components was found to be 98.85%, so these three principal components were extracted and used to build the principal component regression, Equation (6). From Equation (6), it can be seen that the first principal component $Z_1$ is positively correlated with $X_1$ and $X_3$. The process line likewise shows that the osmotic pressure increases with time and filling elevation. The process line is shown in Figure 6.
The coefficients in Equation (6) are for the standardized variables. To obtain the coefficients for the original variables, an inverse transformation is carried out using the function apply in the R language.
The inverse transformation restores the original independent variables and gives the coefficients of the regression equation, Equation (7).
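The inverse transformation from standardized to original variables is simple algebra, sketched here in Python rather than with R's `apply`. The sketch assumes that only the independent variables were standardized and that the principal components are linear combinations of them. Every number below is a hypothetical placeholder, not a value from the paper.

```python
def combine_components(component_coefs, loadings):
    """Collapse a regression on principal components into coefficients on Z_j.

    loadings[k][j] is the weight of standardized variable Z_j in component k.
    """
    p = len(loadings[0])
    return [sum(c * load[j] for c, load in zip(component_coefs, loadings))
            for j in range(p)]

def to_original_scale(intercept, coef_z, means, sds):
    """Rewrite y = c0 + sum(b_j * Z_j), Z_j = (X_j - mean_j) / sd_j,
    as y = beta0 + sum(beta_j * X_j)."""
    betas = [b / s for b, s in zip(coef_z, sds)]
    beta0 = intercept - sum(b * m for b, m in zip(betas, means))
    return beta0, betas

# Hypothetical fitted values (two components, four factors).
loadings = [[0.6, 0.1, 0.6, 0.2], [0.1, 0.9, -0.1, 0.3]]
component_coefs = [0.8, -0.2]
intercept = 1.5
means = [600.0, 22.0, 300.0, 0.005]
sds = [40.0, 3.0, 150.0, 0.004]

coef_z = combine_components(component_coefs, loadings)
beta0, betas = to_original_scale(intercept, coef_z, means, sds)

# Both forms give the same prediction for a hypothetical observation.
x = [650.0, 25.0, 400.0, 0.01]
z = [(xj - m) / s for xj, m, s in zip(x, means, sds)]
pred_components = intercept + sum(
    c * sum(w * zj for w, zj in zip(load, z))
    for c, load in zip(component_coefs, loadings))
pred_original = beta0 + sum(b * xj for b, xj in zip(betas, x))
print(pred_components, pred_original)
```

Dividing each standardized coefficient by the standard deviation of its variable and folding the means into the intercept leaves the predictions unchanged, so the original-scale coefficients can be interpreted directly in physical units.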

Seepage Monitoring Model in Water-Storage Period
During the impoundment period, the filling elevation no longer changes, so the pressure in the core wall is mainly affected by factors such as upstream water level, temperature, rainfall, and time. The dam safety monitoring automation system was in operation during the impoundment period, and the osmometer data were collected automatically by the measurement control unit (MCU). During collection, problems such as loose sensor wiring, aging of the interface, voltage instability, and hardware failure may cause random errors in the observed data. The 3σ criterion was used to remove these errors automatically, and the regression model between the osmotic pressure and the influencing factors was established from the remaining data, as given by Equation (8), in which $\delta_P$ is the osmotic pressure, $\delta_H$ is the influence factor of the upstream water level, $\delta_T$ is the temperature influence factor, $\delta_J$ is the influence factor of rainfall, and $\delta_t$ is the time influence factor.
Equation (8) is decomposed in detail in Equation (9), in which $P(n)$ is the osmotic pressure at time $n$ (MPa); $H(n)$ is the upstream reservoir level at time $n$ and $H_0$ is the elevation of the sensor (m); $T(n)$ is the temperature at time $n$ and $\bar{T}$ is the average temperature (°C); $t(n)$ is the current time and $t_0$ is the time at which the sensor was buried; $J(n)$ is the amount of rainfall at time $n$ (m); and $\alpha_0, \alpha_1, \alpha_2, \alpha_3, \alpha_4$ are the undetermined coefficients.
The variance inflation factor, VIF = 79.865 > 10, indicated multicollinearity between the factors. Following the flow chart in Figure 5, the next step was principal component analysis. Using the function prcomp in the R language, the cumulative contribution rate of the first three principal components was found to be 98.59%, so these three principal components were extracted and used to build the principal component regression, Equation (10). From Equation (10), it can be seen that the first principal component $Z_1$ is positively correlated with $X_1$ and $X_3$. The process line likewise shows that the water level converted from the osmotic pressure rises with time and with the upstream reservoir water level.
The process line is shown in Figure 7.
The coefficients in Equation (10) are for the standardized variables. To obtain the coefficients for the original variables, an inverse transformation is carried out using the function apply in the R language.
The inverse transformation restores the original independent variables and gives the coefficients of the regression equation, Equation (11).

Comparison between Traditional Method and Recommended Method
The traditional research method directly establishes a multiple linear regression model to analyze the factors influencing seepage, without considering whether the sample data contain outliers or whether the seepage factors are multicollinear.
Following the traditional method, a multiple linear regression model was established with SPSS using Equation (9) and the water-storage period data. The results of the analysis are shown in Table 2, and the resulting regression equation is Equation (12). In Equation (12), the coefficient of time (−61.811) is negative, whereas the coefficient of time in Equation (11) (1.613) is positive. In reality, the water level rises with time, so the coefficient of time should be positive.
Table 2 also shows that the variance inflation factor (VIF) > 10, so there is multicollinearity among the factors. In contrast, there is no multicollinearity among the factors in the recommended method because it has been eliminated.
For the traditional method, the data for the water-storage period were substituted into Equation (12) to calculate the predicted osmotic pressure. The time series of the values predicted by the traditional and recommended methods and of the measured values were then plotted, as shown in Figure 8. Denote the goodness of fit [33] of the traditional method by $R_a^2$ and that of the recommended method by $R_b^2$.
The calculated values, $R_a^2 = 0.19$ and $R_b^2 = 0.97$, indicate that the recommended method is superior to the traditional method.
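For reference, a goodness-of-fit value of this kind can be computed as below. The sketch assumes the usual coefficient of determination, $R^2 = 1 - SS_{res}/SS_{tot}$; the exact definition used in [33] is not reproduced in this paper, and the series are made-up numbers.

```python
def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

# Hypothetical measured and predicted osmotic pressures.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
fit = r_squared(measured, predicted)
print(round(fit, 2))  # 0.98
```

A value near 1 means the predicted series tracks the measured one closely, while a value near 0 means the model does little better than predicting the mean of the measurements.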

Percolation Prediction
The construction-period data were substituted into Equation (7) to calculate the predicted osmotic pressure, and the time series of the predicted and measured values were then plotted, as shown in Figure 9. A goodness of fit [33] of $R^2 = 0.97$ indicates that the fit is good and that the regression equation is suitable for prediction.
Similarly, the water-storage period data were substituted into Equation (11) to calculate the predicted osmotic pressure, and the time series of the predicted and measured values were then plotted, as shown in Figure 10. A goodness of fit of $R^2 = 0.97$ indicates that the fit is good and that the regression equation can be used to predict the osmotic pressure during the water-storage period.

Discussions and Conclusions
Seepage of a high core rockfill dam is mainly affected by factors such as water level, rainfall, temperature, filling height, and aging, and is monitored with vibrating wire osmometers. To make the research more general, monitoring data from both the construction period and the water-storage period were used to establish the seepage monitoring model. The recommended method effectively eliminates the abnormal data collected by the measurement control unit and standardizes the sample data. Principal component analysis resolves the multicollinearity among the independent variables, and principal component regression then identifies the factors influencing seepage correctly. The established model also predicts the seepage with high fitting accuracy. The traditional multiple linear regression method, by contrast, is fitted directly to the sample data without considering abnormal values or multicollinearity among the influencing factors. As a result, the traditional model cannot effectively explain the relationship between seepage and the influencing variables, and it has low prediction accuracy or cannot be established at all.
The recommended method was used to establish seepage monitoring models for the construction period and the storage period of a high core-wall rockfill dam, and the models were used to predict the seepage. The results were good, which suggests that the model can guide related research on dams of the same type and has practical value in actual projects.
This paper also has limitations. The seepage prediction relies on the established regression model, which requires the value of each influencing factor to be known and cannot learn autonomously, so it is a passive prediction method. Further research should investigate seepage prediction methods such as Long Short-Term Memory (LSTM) [34] in order to establish a deep learning method that can learn autonomously and improve the accuracy and efficiency of seepage prediction.
The method in this paper is based on measured sensor data. A finite element method (FEM) model of the dam should also be built to verify and compare with the recommended method, which is a subject for future research.