Article

Online Identification and Correction Methods for Multi-Type Abnormal Values in Seepage Pressure of Earth-Rock Dams

Ke Fan, Chunfang Yue, Lilang Pi and Jiachen Shi
1 College of Hydraulic and Civil Engineering, Xinjiang Agricultural University, Urumqi 830052, China
2 Xinjiang Key Laboratory of Hydraulic Engineering Safety and Water Disaster Prevention, Xinjiang Agricultural University, Urumqi 830052, China
3 Fengxiang District Water Resources Construction Engineering Team, Baoji 721400, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(10), 5550; https://doi.org/10.3390/app15105550
Submission received: 15 April 2025 / Revised: 12 May 2025 / Accepted: 13 May 2025 / Published: 15 May 2025

Abstract

With the increasing service duration of dams, the analysis of seepage pressure monitoring data plays a crucial role in ensuring the safety of seepage behavior. However, seepage pressure monitoring systems are often subject to environmental disturbances, sensor failures, and other interfering factors, leading to anomalous measurements during data acquisition. To objectively reflect the true operational state of dams and address the limitations of conventional detection and identification methods—such as low efficiency, high subjectivity in evaluation, and ineffective recognition of multi-category outliers—this study constructed an online detection, identification, and correction method for multi-category anomalous values. Specifically, an enhanced particle filter incorporating a Bernoulli probability model is constructed to characterize multi-category outliers in seepage pressure monitoring data, building upon the traditional particle filter framework. Following online detection and identification, the MissForest imputation method is employed to rectify the anomalous values. In the case study, both the false detection rate and missed detection rate ranged between 0% and 10%. Comparative experiments with three alternative methods revealed significant differences in data reconstruction performance, with the proposed method achieving the highest R2 score (0.861) and the lowest RMSE (0.050) and MAE (0.052). The results demonstrate that the proposed method effectively identifies outliers, achieves superior reconstruction of seepage pressure data, and minimizes errors. Furthermore, this research provides a novel approach for detecting anomalous seepage pressure measurements and evaluating dam safety conditions.

1. Introduction

Seepage pressure monitoring data can accurately and effectively reflect the safety status changes of earth-rock dams [1], playing a critical role in ensuring their structural health and operational safety [2]. However, these data are often affected by human errors, sensor failures, noise, and unknown environmental disturbances [3], leading to abnormal measurements [4]. Generally, anomalies can be classified into three categories: random errors, systematic errors, and gross errors [5,6,7].
If distorted monitoring data are not timely identified, removed, and corrected before being used for analysis [8], they may result in false alarms and missed alarms [9]. Therefore, it is essential to perform anomaly detection and correction on monitoring data to prevent misjudgments of dam safety conditions and erroneous data interpretations [10]. Effective anomaly detection and precise imputation of missing data are of significant practical importance for data preprocessing, subsequent monitoring analysis, and early warning predictions.
Currently, extensive research has been conducted on anomaly detection and processing [11], primarily focusing on data dimensions, anomaly types, or methodological characteristics [12]. Existing time-series anomaly detection methods can be categorized into statistical approaches, traditional machine learning, and deep learning techniques [13,14,15,16,17].
Statistical anomaly detection methods typically identify anomalies by analyzing the probability distribution of time-series data. For instance, Li et al. [18] addressed misjudgments in anomaly detection by proposing an online robust identification and early warning model combining robust statistics with confidence intervals. Huang et al. [19] developed an unsupervised interpretable univariate anomaly detection method based on quantiles and skewness coefficients. Yu et al. [20] introduced a sliding-window anomaly detection approach by extracting the slope radius of confidence intervals from subsequences.
Traditional machine learning methods commonly include distance-based, density-based, clustering-based, tree-based, classification-based, and ensemble learning approaches. For example, Bai et al. [21] addressed the problem of distributed density-based outlier detection for large-scale data by proposing a distributed LOF calculation method to parallelize density-based outlier detection, with simulation experiments verifying the efficiency and effectiveness of the approach. Huang et al. [22] tackled the challenge of unknown anomalies in database outliers by introducing a novel outlier clustering detection algorithm called ROCF, which can automatically compute the anomaly rate of a database and effectively detect both individual outliers and outlier clusters. Maria et al. [23] focused on the need for timely dynamic anomaly detection in monitoring data streams and proposed a new parameter configuration method for continuous outlier monitoring in sliding window-based data streams using distance-based techniques, validating the approach on both real-world and synthetic datasets.
Deep learning-based methods for time series anomaly detection typically include convolutional neural networks (CNNs), recurrent neural networks (RNNs), graph neural networks (GNNs), and transformer-based approaches. For instance, Xing et al. [24] proposed an automatic anomaly diagnosis method using a multimodal deep neural network to address various abnormal patterns in structural health monitoring data. Li et al. [25] proposed a multi-feature deep fusion model for unstable log anomaly detection to handle challenges such as concept drift, noise interference, and ambiguous feature relationships, thereby improving detection accuracy in dynamic environments. Elhalwagy et al. [26] proposed a multi-channel input autoencoder framework for multivariate time series anomaly detection to overcome limitations of deep learning, including poor generalization capability and the requirement for large labeled datasets.
In summary, while traditional outlier detection methods can effectively identify anomalies in monitoring data, they require manual setting of outlier criteria or optimization algorithms to adjust detection parameters. This introduces strong subjectivity or operational complexity. Moreover, such methods are typically limited to detecting a single type of outlier and fail to effectively identify multiple types of anomalies. To address these limitations, this study constructed a novel approach for detecting, identifying, and imputing multi-type anomalous data. Building upon traditional particle filtering, an improved particle filter based on a Bernoulli probability model is constructed to characterize multi-type anomalies in monitoring data [27,28,29]. By extending the probability model parameters to the seepage pressure state vector, online detection and identification of multi-type anomalies are achieved. After removing anomalous data, the MissForest method—an iterative imputation algorithm based on random forest—is introduced for data correction. Subsequent case studies and comparative experiments validate the effectiveness of the proposed method. The results further demonstrate that the proposed approach overcomes key shortcomings of traditional methods, such as offline detection, low efficiency, strong subjectivity in evaluation, and inability to identify multi-type anomalies. Additionally, it enhances the capability for seepage pressure data reconstruction and provides reliable data support for dam safety assessment and abnormal seepage pressure monitoring.

2. Theory and Methodology

2.1. Theory of Traditional Particle Filter Algorithm

As a nonparametric implementation of nonlinear Bayesian filtering, the particle filter (PF) approximates the posterior probability distribution through random sampling and importance resampling. Unlike classical methods such as the Kalman filter, which struggle with highly nonlinear systems, the PF demonstrates strong parameter identification capabilities in nonlinear and non-Gaussian systems without relying on linear dynamic models or Gaussian assumptions. This makes PF more effective and versatile for practical applications. The traditional particle filtering algorithm for estimating the seepage pressure state vector [28] proceeds as follows:
Step 1: Importance Sampling (Initialization)
For $i = 1, 2, \dots, N_s$, sample new particles $x_k^i$ according to
$$x_k^i \sim q\left(x_k^i \mid x_{k-1}^i, z_k\right)$$
where $x_k^i$ represents the particle state sequence at time $k$, and $q(\cdot)$ denotes the importance sampling density function.
Step 2: Weight Update
Based on the current observation $z_k$, calculate the weights for the new set of particles $\{x_k^i\}_{i=1}^{N_s}$:
$$\omega_k^i = \omega_{k-1}^i \, \frac{p\left(z_k \mid x_k^i\right) p\left(x_k^i \mid x_{k-1}^i\right)}{q\left(x_k^i \mid x_{k-1}^i, z_k\right)}$$
where $z_k$ denotes the observation sequence, $\omega_k^i$ represents the particle weight at time $k$, and $p(\cdot)$ stands for the probability density function.
Normalization: $\omega_k^i = \omega_k^i \big/ \sum_{i=1}^{N_s} \omega_k^i$
Step 3: Resampling
Calculate the effective number of particles: $N_{\mathrm{eff}} = 1 \big/ \sum_{i=1}^{N_s} \left(\omega_k^i\right)^2$
If $N_{\mathrm{eff}} < N_{\mathrm{th}}$ for a given threshold $N_{\mathrm{th}}$, resample to obtain a new set of particles $\left\{x_k^i, 1/N_s\right\}_{i=1}^{N_s}$.
Step 4: State Estimation
The seepage pressure state is estimated from the weights and states of all particles: $\hat{x}_k = \sum_{i=1}^{N_s} \omega_k^i x_k^i$
Variance estimation: $P_k = \sum_{i=1}^{N_s} \omega_k^i \left(x_k^i - \hat{x}_k\right)\left(x_k^i - \hat{x}_k\right)^T$, then return to Step 2.
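To make the four steps above concrete, the following minimal sketch implements a bootstrap particle filter for a single seepage pressure series, assuming a one-dimensional random-walk state model, Gaussian measurement noise, and the transition prior as the importance density; all parameter values and function names are illustrative rather than taken from the paper.

```python
# Minimal sketch of the traditional (bootstrap) particle filter in Section 2.1,
# assuming a 1-D random-walk state model and Gaussian measurement noise.
import numpy as np

rng = np.random.default_rng(0)

def particle_filter(z, n_particles=500, q_std=0.05, r_std=0.1, n_thresh=250):
    """Estimate a 1-D seepage pressure state from measurements z (Steps 1-4)."""
    particles = rng.normal(z[0], r_std, n_particles)      # initialization
    weights = np.full(n_particles, 1.0 / n_particles)
    estimates = []
    for z_k in z:
        # Step 1: importance sampling (transition prior as proposal)
        particles = particles + rng.normal(0.0, q_std, n_particles)
        # Step 2: weight update with the Gaussian likelihood p(z_k | x_k^i)
        weights *= np.exp(-0.5 * ((z_k - particles) / r_std) ** 2)
        weights /= weights.sum()
        # Step 3: resample when the effective particle number is too small
        n_eff = 1.0 / np.sum(weights ** 2)
        if n_eff < n_thresh:
            idx = rng.choice(n_particles, n_particles, p=weights)
            particles = particles[idx]
            weights = np.full(n_particles, 1.0 / n_particles)
        # Step 4: state estimate as the weighted particle mean
        estimates.append(np.sum(weights * particles))
    return np.array(estimates)

# Usage on a synthetic seepage pressure series
z = 10.0 + 0.3 * np.sin(np.linspace(0, 6, 200)) + rng.normal(0, 0.1, 200)
x_hat = particle_filter(z)
```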

2.2. Improved Particle Filter Online Recognition Method

The conventional particle filter (PF) suffers from limitations including suboptimal importance density selection, particle degeneracy, and high computational burden. Furthermore, direct processing of anomalous measurements without proper treatment introduces bias in model predictions [27]. To address these challenges while mitigating the impact of abnormal data on estimation accuracy, this study proposes an improved particle filtering framework that integrates statistical modeling of multi-type anomalies [30].
M state vectors (particles) are randomly generated from the probability density function $f(y_0)$ of the initial state vector: $y_0^{(i)}$, $i = 1, 2, \dots, M$, where $y_0$ is the seepage pressure state vector.
Considering that $N_z$ sensors are deployed for seepage pressure monitoring, the measurement equation can be expressed as in Equation (1):
$$A_k = h\left(y_k\right) + r_k$$
where $y_k \in \mathbb{R}^{N_y}$ is the state vector of the system at time $k$; $A_k \in \mathbb{R}^{N_z}$ is the measurement vector at time $k$; $h$ is the measurement function; and $r_k \in \mathbb{R}^{N_z}$ is the measurement noise vector at time $k$, which can be modeled as a normal distribution with zero mean and covariance matrix $R_k$.
After the measurement $A_k$ is obtained at time $k$, the likelihood function $p\left(A_k \mid y_{k|k-1}^{(i)}\right)$ of each particle $y_{k|k-1}^{(i)}$ can be calculated as in Equation (2):
$$p\left(A_k \mid y_{k|k-1}^{(i)}\right) \propto \frac{1}{\left(2\pi\right)^{N_z/2} \left|R_k\right|^{1/2}} \exp\left\{-\frac{1}{2}\left[A_k - h\left(y_{k|k-1}^{(i)}\right)\right] R_k^{-1} \left[A_k - h\left(y_{k|k-1}^{(i)}\right)\right]^T\right\}, \quad i = 1, 2, \dots, M$$
The likelihood values of all particles are then normalized as in Equation (3):
$$\tilde{w}_{k|k-1}^{(i)} = \frac{p\left(A_k \mid y_{k|k-1}^{(i)}\right)}{\sum_{i=1}^{M} p\left(A_k \mid y_{k|k-1}^{(i)}\right)}, \quad i = 1, 2, \dots, M$$
where $\tilde{w}_{k|k-1}^{(i)}$ is the weight. The state $y_k$ is then estimated as in Equation (4):
$$\hat{y}_{k|k-1} = \frac{1}{M} \sum_{i=1}^{M} y_{k|k-1}^{(i)} \tilde{w}_{k|k-1}^{(i)}$$
Finally, resampling is used to avoid particle degeneracy. A new set of particles is randomly generated by weighted resampling according to $\tilde{w}_{k|k-1}^{(i)}$, $i = 1, 2, \dots, M$, and the resampled particles are denoted as $y_{k|k}^{(i)}$, $i = 1, 2, \dots, M$.
The statistical characteristic model with multiple types of outliers in the measurement data is given in Equation (5):
$$A_k = h\left(y_k\right) + r_k + I_{k,1} \odot u_k + I_{k,2} \odot v_k$$
where the symbol $\odot$ denotes the Hadamard (element-wise) product; $u_k \in \mathbb{R}^{N_z}$ is the random error in the measurement data at time $k$, which can be modeled as a normal distribution with zero mean and covariance matrix $\Sigma_k^u$; $v_k \in \mathbb{R}^{N_z}$ is the gross error in the measurement data at time $k$, which can be modeled as a normal distribution with mean $m_k$ and covariance matrix $\Sigma_k^v$; and $m_k$ can be represented by the following Markov model in Equation (6):
$$m_k = \left(I_{k-1,0} + I_{k-1,1}\right) \odot m_k^0 + I_{k-1,2} \odot \left(m_{k-1} + \Delta m_{k-1}\right)$$
where $\Delta m_{k-1}$ is modeled as a normal distribution with zero mean and covariance matrix $\Sigma_k^\Delta$, and $m_k^0$ can be written as in Equation (7):
$$m_k^0 = \left[m_k^{01}, \dots, m_k^{0 N_z}\right]^T$$
where $m_k^{0l}$ ($l = 1, \dots, N_z$) is the $l$-th component of the vector $m_k^0$ and is modeled as a uniform distribution on the interval $\left[a_l, b_l\right]$.
$I_{k,n} \in \mathbb{R}^{N_z}$, $n \in \{0, 1, 2\}$, is a multi-Bernoulli random vector that indicates the occurrence of multiple types of outliers on the different measurement channels. It is denoted as in Equation (8):
$$I_{k,n} = \left[\delta\left(J_k^1 - n\right), \dots, \delta\left(J_k^{N_z} - n\right)\right]^T, \quad n \in \{0, 1, 2\}$$
where $\delta$ is the Dirac function. The vectors $I_{k,n}$ represent the occurrence, location, and timing of different types of outliers in the measurement channels. For example, $I_{k,0} = [1, 1, \dots, 1]^T$ indicates no outliers in the monitored data at time $k$; $I_{k,1} = [1, 0, \dots, 0]^T$ indicates that the first measurement channel exhibits a random error at time $k$; and $I_{k,2} = [0, 1, \dots, 0]^T$ indicates that a gross error appears in the second measurement channel at time $k$. Describing the probability of outliers in monitoring data with Bernoulli probability theory is grounded in the fact that outlier identification is essentially a binary classification problem, for which the Bernoulli probability model is well suited; the binary nature of outlier detection therefore aligns closely with the Bernoulli statistical method, and the model architecture is interpretable and easy to understand. Given the independence assumption of Bernoulli trials, in which the outcome of each trial does not affect the others, the anomaly status of each data point can be analyzed independently during outlier detection without considering the influence of other data points. This significantly reduces the computational complexity of the detection process and improves processing efficiency. The vector $J_k$ can be written as in Equation (9):
$$J_k = \left[J_k^1, \dots, J_k^{N_z}\right]^T, \quad J_k^l \in \{0, 1, 2\}, \quad l = 1, \dots, N_z$$
where $J_k^l$ is a Markov chain with transition probabilities $p\left(J_k^l = \beta \mid J_{k-1}^l = \alpha\right)$. For any $l \neq m$, $J_k^l$ and $J_k^m$ are independent of each other. $J_k^l \in \{0, 1, 2\}$ indicates whether the $l$-th measurement at time $k$ is abnormal and, if so, the type of anomaly. For example, $J_k = [0, 0, \dots, 0]^T$ ($I_{k,0} = [1, 1, \dots, 1]^T$) indicates no anomalies; $J_k = [1, 1, \dots, 1]^T$ ($I_{k,1} = [1, 1, \dots, 1]^T$) indicates random errors; and $J_k = [2, 2, \dots, 2]^T$ ($I_{k,2} = [1, 1, \dots, 1]^T$) indicates gross errors. Thus, $J_k$ and $I_{k,n}$ can substitute for each other when indicating whether an anomaly exists and its type.
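The following illustrative sketch simulates the outlier model of Equations (5)–(9): each channel label J_k^l evolves as a three-state Markov chain, the indicator vectors I_{k,n} mark clean, random-error, and gross-error channels, and the measurements are contaminated accordingly. The transition matrix, noise levels, and gross-error mean are assumed values chosen only for demonstration.

```python
# Illustrative simulation of the multi-type outlier model in Equations (5)-(9).
# Transition probabilities and noise levels are assumptions, not from the paper.
import numpy as np

rng = np.random.default_rng(1)
n_z, n_steps = 4, 200                        # number of channels and time steps
P = np.array([[0.96, 0.02, 0.02],            # p(J_k = col | J_{k-1} = row)
              [0.30, 0.70, 0.00],
              [0.30, 0.00, 0.70]])

J = np.zeros((n_steps, n_z), dtype=int)      # channel labels: 0 normal, 1 random, 2 gross
for k in range(1, n_steps):
    for l in range(n_z):
        J[k, l] = rng.choice(3, p=P[J[k - 1, l]])

def indicator(J_k, n):
    """I_{k,n}: 1 where channel l carries label n, else 0 (Equation (8))."""
    return (J_k == n).astype(float)

h_y = np.full(n_z, 10.0)                     # h(y_k): a constant level for illustration
sigma_r, sigma_u, m_gross = 0.05, 0.5, 2.0
A = np.empty((n_steps, n_z))
for k in range(n_steps):
    r_k = rng.normal(0, sigma_r, n_z)                        # measurement noise
    u_k = rng.normal(0, sigma_u, n_z)                        # random error
    v_k = rng.normal(m_gross, 0.1, n_z)                      # gross error with mean m_k
    A[k] = h_y + r_k + indicator(J[k], 1) * u_k + indicator(J[k], 2) * v_k   # Eq. (5)
```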
To account for multiple types of outliers, an improved particle filtering algorithm is proposed, in which the augmented state vector is defined as in Equation (10):
$$B_k = \left[y_k^T, m_k^T, J_k^T\right]^T$$
where $y_k$ is the seepage pressure state vector, and $m_k$ and $J_k$ are the parameter vectors of the outlier detection model, which are independent of each other.
The initial probability density $p\left(B_0\right)$ is used to randomly generate M augmented state vectors:
$$B_0^{(i)} = \left[y_0^{(i)T}, m_0^{(i)T}, J_0^{(i)T}\right]^T, \quad i = 1, 2, \dots, M$$
At every moment $k = 1, 2, 3, \dots$, the particles $B_{k|k-1}^{(i)}$ for the next moment are predicted as in Equation (12):
$$B_{k|k-1}^{(i)} = \left[y_{k|k-1}^{(i)T}, m_{k|k-1}^{(i)T}, J_{k|k-1}^{(i)T}\right]^T, \quad i = 1, 2, \dots, M$$
where $B_{k|k-1}^{(i)}$ is composed of three components: $y_{k|k-1}^{(i)}$, $m_{k|k-1}^{(i)}$, and $J_{k|k-1}^{(i)}$. Here, $y_{k|k-1}^{(i)}$ can be obtained by calculation, and $m_{k|k-1}^{(i)}$ can be obtained from its prior probability density function, which can be written as in Equation (13):
$$p\left(m_k \mid m_{k-1|k-1}^{(i)}, J_{k-1|k-1}^{(i)}\right) = \prod_{l=1}^{N_z} \left\{\left[\delta\left(J_{k-1|k-1}^{(i)l}\right) + \delta\left(J_{k-1|k-1}^{(i)l} - 1\right)\right] u\left(m_k^l; a_l, b_l\right) + \delta\left(J_{k-1|k-1}^{(i)l} - 2\right) N\left(m_k^l; m_{k-1|k-1}^{(i)l}, \sigma_{k\Delta}^{l\,2}\right)\right\}$$
where the superscript $l$ denotes the $l$-th element of a vector; $J_{k-1|k-1}^{(i)l}$ is the value of the $l$-th dimension of the particle $J_{k-1|k-1}^{(i)}$; $m_{k-1|k-1}^{(i)l}$ is the value of the $l$-th dimension of the particle $m_{k-1|k-1}^{(i)}$; $\sigma_{k\Delta}^{l\,2}$ is the variance of the $l$-th dimension of $\Delta m_{k-1}$; and $u$ is a uniform distribution on the interval $\left[a_l, b_l\right]$.
Finally, $J_{k|k-1}^{(i)}$ can also be obtained from its prior probability density function, which can be written as in Equation (14):
$$p\left(J_k \mid J_{k-1|k-1}^{(i)}\right) = \prod_{l=1}^{N_z} \frac{1}{3} \delta\left(J_k^l - n\right), \quad n \in \{0, 1, 2\}$$
After the measurement vector $A_k$ is obtained at time $k$, the likelihood function $p\left(A_k \mid B_{k|k-1}^{(i)}\right)$ of each particle $B_{k|k-1}^{(i)}$ is given in Equation (15):
$$p\left(A_k \mid B_{k|k-1}^{(i)}\right) \propto \frac{1}{\left(2\pi\right)^{N_z/2} \left|\Sigma\right|^{1/2}} \exp\left(-\frac{1}{2}\, \varepsilon\, \Sigma^{-1} \varepsilon^T\right), \quad i = 1, 2, \dots, M$$
where the covariance matrix $\Sigma$ can be written as in Equation (16):
$$\Sigma = R_k + \mathrm{diag}\left(I_{k|k-1,1}^{(i)}\right) \odot \Sigma_k^u + \mathrm{diag}\left(I_{k|k-1,2}^{(i)}\right) \odot \Sigma_k^v$$
where $\mathrm{diag}\left(I_{k|k-1,1}^{(i)}\right)$ and $\mathrm{diag}\left(I_{k|k-1,2}^{(i)}\right)$ are the diagonal matrices whose main-diagonal elements are the components of the vectors $I_{k|k-1,1}^{(i)}$ and $I_{k|k-1,2}^{(i)}$, respectively, and $\varepsilon$ can be written as in Equation (17):
$$\varepsilon = A_k - h\left(y_{k|k-1}^{(i)}\right) - I_{k|k-1,2}^{(i)} \odot m_{k|k-1}^{(i)}$$
The likelihood probability densities of all particles are normalized as in Equation (18):
$$\tilde{w}_{k|k-1}^{(i)} = \frac{p\left(A_k \mid B_{k|k-1}^{(i)}\right)}{\sum_{i=1}^{M} p\left(A_k \mid B_{k|k-1}^{(i)}\right)}, \quad i = 1, 2, \dots, M$$
where $\tilde{w}_{k|k-1}^{(i)}$ is the weight. The estimate of the augmented state vector $B_k$ can then be expressed as in Equation (19):
$$\hat{B}_{k|k-1} = \frac{1}{M} \sum_{i=1}^{M} B_{k|k-1}^{(i)} \tilde{w}_{k|k-1}^{(i)}$$
Finally, resampling is carried out: a new set of particles is randomly generated by weighted resampling according to $\tilde{w}_{k|k-1}^{(i)}$ and denoted as $B_{k|k}^{(i)}$, $i = 1, 2, \dots, M$.
Thus, an improved particle filtering method incorporating multi-type outlier detection is established. The flowchart of the proposed algorithm is shown in Figure 1.
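As a rough illustration of how the augmented particles enter the weight update, the sketch below performs a single update step in the spirit of Equations (15)–(18): channels flagged as anomalous inflate the measurement covariance (Equation (16)) and the gross-error mean is subtracted from the innovation (Equation (17)). The dimensions, covariances, the identity measurement function h(y) = y, and all variable names are assumptions for illustration, not the paper's implementation.

```python
# One weight-update / resampling step for augmented particles (y, m, J),
# sketched under assumed dimensions, covariances, and h(y) = y.
import numpy as np

rng = np.random.default_rng(2)
M, n_z = 1000, 4
R = np.diag(np.full(n_z, 0.05 ** 2))         # measurement noise covariance R_k
Su = np.diag(np.full(n_z, 0.5 ** 2))         # random-error covariance
Sv = np.diag(np.full(n_z, 0.3 ** 2))         # gross-error covariance

def update(y, m, J, A_k):
    """Weight, resample, and estimate the state from M augmented particles."""
    logw = np.empty(M)
    for i in range(M):
        I1 = (J[i] == 1).astype(float)
        I2 = (J[i] == 2).astype(float)
        Sigma = R + np.diag(I1) @ Su + np.diag(I2) @ Sv          # Equation (16)
        eps = A_k - y[i] - I2 * m[i]                             # Equation (17), h(y) = y
        logw[i] = (-0.5 * eps @ np.linalg.solve(Sigma, eps)
                   - 0.5 * np.log(np.linalg.det(Sigma)))         # Equation (15)
    w = np.exp(logw - logw.max())
    w /= w.sum()                                                 # Equation (18)
    idx = rng.choice(M, M, p=w)                                  # resampling
    y_hat = (w[:, None] * y).sum(axis=0)                         # state estimate
    return y[idx], m[idx], J[idx], y_hat

# Usage with illustrative particles and one measurement vector A_k
y = rng.normal(10.0, 0.1, (M, n_z))
m = rng.uniform(1.0, 3.0, (M, n_z))
J = rng.integers(0, 3, (M, n_z))
A_k = np.array([10.02, 9.97, 12.1, 10.05])                       # channel 3 drifts
y, m, J, y_hat = update(y, m, J, A_k)
```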

2.3. MissForest Filling and Correction Method

The MissForest imputation method [29], based on the random forest algorithm, is a non-parametric approach for missing value imputation. Its key advantage lies in its low dependency on data distribution assumptions, making it suitable for handling mixed datasets containing both discrete and continuous variables. The core principle is to train a random forest predictive model on the complete portion of the dataset and then iteratively predict and optimize the missing values until the imputed results converge. Suppose $M = \left[M_1, M_2, \dots, M_n\right]$ is an $n \times p$ matrix with missing values to be filled, and let $M_s$ denote any variable containing missing values. The samples with missing values in the variable $M_s$ are denoted as $i_{mis}^{(s)} \subseteq \{1, 2, \dots, n\}$, and the samples with observed values of $M_s$ are denoted as $i_{obs}^{(s)} \subseteq \{1, 2, \dots, n\}$; $y_{obs}^{(s)}$ represents the observed values of $M_s$, $y_{mis}^{(s)}$ represents the missing values of $M_s$, $x_{obs}^{(s)}$ represents the values of the other variables for the samples $i_{obs}^{(s)}$, and $x_{mis}^{(s)}$ represents the values of the other variables for the samples $i_{mis}^{(s)}$. The implementation workflow of the algorithm is as follows:
Preprocessing Phase: Perform an initial imputation of the missing values in M using methods such as mean filling or linear interpolation.
Variable Sorting: Sort the variables in ascending order of their missing rates to generate the initial imputation matrix $M_{old}^{imp}$.
Model Training: For each target variable $M_s$, treat $x_{obs}^{(s)}$ as the input variables and $y_{obs}^{(s)}$ as the output variable, and train a random forest prediction model.
Prediction and Imputation: Input $x_{mis}^{(s)}$ into the trained model to obtain the predicted missing values $y_{mis}^{(s)}$, and update the imputation matrix to $M_{new}^{imp}$.
Iterative Optimization: Repeat Steps 3–4 until the convergence criterion is met. The latest imputed matrix is stored as $M_{new}^{imp}$, while the previous version is retained as $M_{old}^{imp}$.
Convergence Criterion: Use the difference $\Delta N$ between two consecutive iterations as the metric, and terminate the process when $\Delta N$ first increases, preserving the previous iteration as the optimal imputation result.
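As a hedged illustration, the iterative random-forest imputation described above can be approximated in Python with scikit-learn's IterativeImputer wrapped around a RandomForestRegressor; this is an assumed stand-in for, not a reimplementation of, the original missForest algorithm, and the synthetic data and parameter values below are purely illustrative.

```python
# MissForest-style imputation sketch using scikit-learn's IterativeImputer with a
# random forest estimator (an approximation of the algorithm described above).
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)

# Synthetic seepage pressure matrix (rows = days, columns = monitoring points)
X = 10.0 + 0.3 * np.sin(np.linspace(0, 6, 300))[:, None] + rng.normal(0, 0.05, (300, 4))
X[rng.random(X.shape) < 0.1] = np.nan        # entries removed after outlier detection

imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=100, random_state=0),
    max_iter=10,
    random_state=0,
)
X_filled = imputer.fit_transform(X)          # iteratively predicted missing values
```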

2.4. Development of a Reconstruction Model for Seepage Pressure Anomaly Data

The proposed method integrates online detection, identification, imputation, and correction. Specifically, online detection and identification are first performed based on the Bernoulli probability model, which autonomously establishes the outlier determination criteria without manual intervention while simultaneously representing the occurrence timing, location, and magnitude of multi-type anomalies across the different measurement channels. The identified anomalous data are then removed, and the MissForest imputation method is finally applied to reconstruct the resulting missing values, yielding corrected seepage pressure data for each monitoring point. The workflow of the proposed multi-type anomaly reconstruction model for seepage pressure data is illustrated in Figure 2.
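Assuming the improved particle filter has already produced per-entry anomaly labels, the overall workflow of Figure 2 reduces to masking the flagged entries and imputing them; the short sketch below shows this glue step with an illustrative helper whose imputer argument can be any object exposing fit_transform, such as the MissForest-style imputer sketched in Section 2.3.

```python
# Workflow sketch for Figure 2 (assumption: anomaly labels J have the same shape
# as the data matrix X, with 0 = normal, 1 = random error, 2 = gross error).
import numpy as np

def reconstruct(X: np.ndarray, J: np.ndarray, imputer) -> np.ndarray:
    """Remove entries flagged as anomalous and fill them by imputation."""
    X_clean = X.astype(float).copy()
    X_clean[J != 0] = np.nan                 # detection/identification -> removal
    return imputer.fit_transform(X_clean)    # imputation and correction
```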

3. Case Analysis

3.1. Project Overview

The Wuluwati water conservancy project is located in Hotan, Xinjiang, China. It provides comprehensive benefits including irrigation, flood control, power generation, and ecological improvement. The dam site controls a basin area of 1.99 million hm², and the total storage capacity is 333.6 million m³. The project consists of several important structures, including a river dam, an overflow channel, a flood discharge and sand discharge tunnel, a sand flushing tunnel, a power generation water diversion tunnel, and a hydropower station building. Through reservoir regulation, the project has added 46,000 hm² of irrigated area and improved irrigation on a further 75,300 hm². With an average annual power generation of 197 million kW·h, it has made important contributions to the sustainable development of the local society and economy.

3.2. Seepage Pressure Data Acquisition and Outlier Processing

The seepage pressure data were obtained from the seepage monitoring historical records of the Wuluwati Hydro Project’s operational data monitoring platform. Considering the varying influencing factors and variation patterns of dam foundation piezometric monitoring points, representative measurement points were selected, including P1 (grout curtain anti-seepage line), P3 (cushion layer area), P4 (upstream section), and P9 (downstream section). To ensure data completeness of environmental variables, the dataset spanning from 1 January 2015 to 5 April 2019 (totaling 1511 days of monitoring records) incorporating both environmental parameters and seepage monitoring data was selected for analysis. Figure 3 illustrates the layout of the seepage pressure monitoring points, while Figure 4 presents the corresponding time-series data curves for each measurement point.
The Bernoulli filtering algorithm was first applied to particle state tracking at monitoring point P1 to provide an intuitive evaluation of the improved particle filter’s potential for measurement point identification.
From Figure 5, it can be observed that the improved particle filter achieves satisfactory performance in particle state tracking. Although certain locations exhibit relatively large prediction errors, the algorithm successfully tracks the initial data overall.
The fitting results reveal fluctuations in filtering accuracy, which primarily reflect the dynamic variations in particle states in the time series. Over time, the distribution of particles in the state space is influenced by system noise and observational factors, leading to positional fluctuations. Despite these error variations, the update mechanism of the particle filter algorithm progressively corrects state estimations, reduces deviations, and ultimately achieves state convergence.
As demonstrated in Figure 6a by the comparison between true values and filtered values, the improved particle filter method successfully identifies and tracks the true values despite detection fluctuations. Even in the presence of anomalous data, the filtered values gradually converge toward the actual measurements.
Figure 6b further validates this conclusion through error analysis curves. The detection and identification errors based on the Bernoulli filtering algorithm stabilize and remain at a low level as the number of iterations increases, demonstrating the algorithm’s capability to effectively correct state estimations and reduce deviations during prolonged processing.
To verify the effectiveness of the proposed method, and considering that the filtering plots become difficult to read when the number of time steps is too large, a 200-day window of seepage pressure data (from 9 April 2015 to 25 October 2015) at measurement points P1, P3, P4, and P9 was selected for the example analysis.
Figure 7 shows the online detection and identification results for the measurement point parameters mk and Jk, where mk is the particle filter identification value and Jk is the particle filter detection value. Combining parameters mk and Jk for measurement point P1, it can be seen that the time steps [30,50], [60,75], and [170,200] are inconsistent with the true values, indicating gross errors, while random errors occur in the intervals [20,25] and [95,125]. Similarly, for measurement point P3, the time steps [40,50], [50,100], [100,150], and [190,200] are inconsistent with the true values, indicating gross errors, and random errors occur in the intervals [20,30] and [170,190]. For measurement point P4, the time steps [50,85], [115,125], [150,175], and [185,200] are inconsistent with the true values, indicating gross errors, and random errors occur in the intervals [15,30] and [125,150]. For measurement point P9, the time steps [50,75], [120,145], and [175,185] are inconsistent with the true values, indicating gross errors, and random errors occur in the intervals [15,30], [75,105], [150,175], and [190,200].
To quantitatively assess the outlier detection performance of the proposed method, two evaluation metrics were introduced: false negatives (FN) and false positives (FP). A false negative occurs when abnormal monitoring data are incorrectly identified as normal data, while a false positive refers to normal monitoring data being mistakenly classified as abnormal data. Details are provided in Table 1.
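For reference, the two rates in Table 1 can be computed as in the helper below, assuming boolean arrays of true and detected anomaly flags for one monitoring point and assuming that both rates are expressed relative to the total number of time steps (e.g., 3 missed detections over 200 steps corresponds to 1.5%).

```python
# Illustrative computation of the missed-detection and false-detection rates
# reported in Table 1 (denominator assumed to be the total number of samples).
import numpy as np

def detection_rates(true_anomaly: np.ndarray, detected: np.ndarray):
    """Return (missed detection rate, false detection rate) in percent."""
    n = true_anomaly.size
    missed = np.sum(true_anomaly & ~detected)        # FN: anomalies labeled normal
    false_alarm = np.sum(~true_anomaly & detected)   # FP: normal data labeled anomalous
    return 100.0 * missed / n, 100.0 * false_alarm / n
```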
The monitoring data from points P1, P3, P4, and P9 were subsequently corrected using the following procedure: Based on the historical seepage monitoring operational data of the hydraulic complex, the improved particle filter algorithm was first employed for real-time detection and identification of multi-type outliers. Following the removal of anomalous data points, the resulting missing values were then reconstructed and corrected using the MissForest iterative imputation method.
According to Figure 8, for measurement point P1 the gross errors in the time steps [30,50], [60,75], and [170,200] appear as drift, while the random errors in [20,25] and [95,125] fluctuate up and down around the true values. Similarly, for P3 the gross errors in [40,50], [100,150], and [190,200] appear as drift and the gross error in [50,100] appears as a deviation, while the random errors in [20,30] and [170,190] fluctuate around the true values. For P4 the gross errors appear as drift in [115,125] and [185,200] and as deviations in [50,85] and [150,175], while the random errors in [15,30] and [125,150] fluctuate around the true values. For P9 the gross errors appear as drift in [50,75] and [120,145] and as a deviation in [175,185], while the random errors in [15,30], [75,105], [150,175], and [190,200] fluctuate up and down around the true values.

3.3. Comparative Experimental Analysis and Data Reconstruction Effectiveness

To evaluate the effectiveness of the proposed outlier detection, identification, and imputation-correction method in seepage pressure data reconstruction, we conducted a comparative experimental validation using four different method combinations based on actual engineering operational datasets. Details are provided in Table 2.
In order to better evaluate the accuracy of data reconstruction, the coefficient of determination (R2), root mean square error (RMSE), and mean absolute error (MAE) were used as evaluation indexes to comprehensively judge the accuracy of seepage pressure data reconstruction. R2 is an indicator for measuring the fit of a method; the closer it is to 1, the better the result. RMSE directly reflects the error in data reconstruction; the smaller its value, the closer the detected values are to the actual measured values. MAE is unaffected by the scale of the data; the smaller its value, the higher the data reconstruction accuracy of the method.
The calculation equation of each evaluation index is in Equations (20)–(22):
$$R^2 = 1 - \sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^2 \Big/ \sum_{i=1}^{n} \left(\bar{y} - y_i\right)^2$$
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}$$
$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left|y_i - \hat{y}_i\right|$$
where $y_i$ is the true value, $\hat{y}_i$ is the detection value, $n$ is the number of samples, and $\bar{y}$ is the mean of the true values.
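The three evaluation metrics of Equations (20)–(22) can be computed directly with NumPy, as in the short helper below; scikit-learn's metrics module would give equivalent results.

```python
# Evaluation metrics of Equations (20)-(22) for reconstructed seepage pressure data.
import numpy as np

def evaluate(y_true: np.ndarray, y_pred: np.ndarray):
    """Return (R2, RMSE, MAE) comparing reconstructed values with measurements."""
    resid = y_true - y_pred
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    rmse = np.sqrt(np.mean(resid ** 2))
    mae = np.mean(np.abs(resid))
    return r2, rmse, mae
```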
The experiment evaluates the reconstruction accuracy and stability of each method by comparing the differences in data reconstruction effect among four combination methods (traditional particle filter + mean filling, traditional particle filter + MissForest filling, improved particle filter + mean filling, and improved particle filter + MissForest filling).
From Figure 9, the particle scatter distributions under filtering indicate that the effectiveness of the methods follows the order method 4 > method 3 > method 2 > method 1. The combinations based on the improved particle filter improve the concentration of the particle distribution to some extent, as shown by the particle distributions under methods 3 and 4. Moreover, the particle distribution based on the MissForest filling method is more concentrated and more stable than that of the mean filling method. The combination used in method 4 in Figure 9 demonstrates the best data reconstruction capability, indicating that it is the most reliable and effective method.
From the distribution chart in Figure 10, method 4 (improved particle filter + MissForest imputation) shows a concentrated and stable particle distribution with the smallest error range, indicating superior data filtering performance. In contrast, the filtered particles of method 1 (traditional particle filter + mean filling) have a larger and more fluctuating error range, indicating poorer effectiveness and significant data fluctuations. Compared with method 1, the error distribution of the filtered particles in method 2 is more concentrated, suggesting that the MissForest filling method effectively reduces data fluctuations and provides higher-precision reconstruction results. The error distribution of the filtered particles in method 3 is better still than that in method 2, indicating that the improved particle filter achieves better data reconstruction and further reduces data fluctuations. The comparison of the four methods confirms that the filtered particles of method 4 are closest to the real data, validating that the combination of the improved particle filter and the MissForest filling method offers superior performance in data reconstruction as well as advantages in controlling data fluctuations and errors.
As can be seen from the distribution diagram in Figure 11, method 4 achieves the best coefficient of determination (R2), root mean square error (RMSE), and mean absolute error (MAE) among the four methods: the maximum R2 is 0.861, the minimum RMSE is 0.050, and the minimum MAE is 0.052, further confirming the superiority of method 4 (improved particle filter + MissForest filling). This indicates that the data processed by method 4 have the highest degree of fit; the smallest RMSE reflects smaller reconstruction errors, verifying that the detected values are closest to the actual measured values; and the smallest MAE, which is unaffected by the data scale, further validates the advantages of method 4 in data processing. Together, these results demonstrate that method 4 has the highest accuracy in seepage pressure data reconstruction.

4. Discussion

Comprehensive case studies and comparative experiments verify the effectiveness and superiority of the proposed method. The results demonstrate that the presented anomaly detection and data reconstruction approach (method 4) exhibits significant advantages in multiple aspects, including anomaly identification accuracy, data fluctuation control, error distribution stability, and data reconstruction performance. Based on the research findings, the following discussions are presented:
1. Anomaly Detection Performance Analysis
The proposed method can effectively detect abnormal data that significantly deviate from normal patterns in real time, with both false positive and false negative rates controlled within 10%, outperforming traditional methods. By combining the mk and Jk indicators, the method can not only distinguish between random error data, systematic drift data, and gross error data but also accurately identify the position and magnitude of anomalies. During data imputation, the MissForest algorithm fills missing values in a manner consistent with the operational patterns of seepage pressure, avoiding the bias introduced by mean imputation methods.
2. Data Reconstruction Performance Comparison
Particle distribution: the improved particle filter significantly enhances particle concentration through an optimized resampling strategy, and its combination with MissForest further improves distribution stability. Data fluctuation control: the adaptive noise parameter adjustment in the improved particle filter effectively reduces reconstruction data fluctuation, while MissForest’s preservation of nonlinear relationships between variables further decreases random errors in imputed data. Error distribution characteristics: method 4 shows the smallest and most concentrated error range, proving more effective for seepage pressure state estimation.
3. Reconstruction Accuracy Validation
Comprehensive evaluation using R2, RMSE, and MAE metrics confirms method 4’s superior reconstruction accuracy: the highest R2 (0.861) indicates a stronger correlation between reconstructed and measured data. Lower RMSE (0.050) and MAE (0.052) demonstrate better error control. These results align well with the observed particle concentration and error distribution stability.
4. Method Limitations
Despite the synergistic effect of the improved particle filter and MissForest in enhancing reconstruction quality, two limitations remain: MissForest’s computational efficiency with high-dimensional data requires further optimization, and the anomaly detection criteria rely on historical data statistics and may need dynamic adjustment in non-stationary environments. Future research could focus on introducing online learning mechanisms to improve adaptability.
5. Implications
The methodology constructed in this study provides an improved approach for dam safety assessment and identification of abnormal seepage pressure measurements, addressing the limitations of traditional detection methods. It offers reliable data support for seepage pressure data reconstruction and state evaluation. This research contributes to advancing dam health monitoring and diagnostics, helping to prevent catastrophic dam failures and seepage-related safety incidents. Ultimately, it supports the stable development of society and safeguards public lives and property.

5. Conclusions

This study achieves effective reconstruction of seepage pressure monitoring data through an integrated approach combining improved particle filtering for multi-type anomaly detection and MissForest imputation for data correction, validated by case studies and comparative experiments. The key conclusions are as follows:
  • The proposed method can accurately identify multiple types of anomalies (including random errors, systematic biases, and gradual drifts) in dam seepage pressure monitoring data, while the MissForest imputation algorithm effectively fills missing values with data that conform to the characteristic patterns of seepage pressure behavior.
  • The analysis of practical examples demonstrates that the proposed method eliminates the need for manual setting of outlier detection criteria. By combining mk and Jk, it can accurately identify the types of outliers while also determining their magnitude and location. This provides a novel solution to the limitation of conventional methods, which can only detect a single type of outlier and fail to effectively determine multiple types of outliers.
  • The experimental comparison results demonstrate that the proposed method achieves better reconstruction performance for anomalous data than the other three methods. The improved particle filter exhibits significantly better detection performance than the traditional particle filter, with RMSE and MAE reduced by 34.6% and 51.0%, respectively. Meanwhile, the MissForest imputation method outperforms mean imputation, reducing RMSE and MAE by 18.8% and 34.7%, respectively.
  • The proposed method effectively addresses the limitations of conventional detection approaches in dam safety assessment and seepage pressure anomaly identification, such as offline detection, low efficiency, strong subjectivity in evaluation, and the inability to accurately identify multiple types of outliers. Additionally, it enables effective reconstruction of seepage pressure data, providing reliable data support for seepage state assessment.

Author Contributions

Methodology, software, conceptualization, validation, writing—original draft preparation, K.F.; formal analysis, writing—review, supervision, funding acquisition, C.Y.; software, validation, L.P.; conceptualization, writing—review, J.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the 2024 Graduate Research and Innovation Program of Xinjiang Agricultural University (XJAUGRI2024015) and the Graduate Education Reform Project of Xinjiang Agricultural University (xjaualk–yjs–2018005).

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

The authors gratefully acknowledge the support from the 2024 Graduate Research and Innovation Program of Xinjiang Agricultural University (XJAUGRI2024015) and the Graduate Education Reform Project of Xinjiang Agricultural University (xjaualk–yjs–2018005), and thank Xinjiang Agricultural University for providing the research practice platform.

Conflicts of Interest

The authors have no relevant financial or non-financial interests to disclose.

References

  1. Yang, X.; Xiang, Y.; Wang, Y. A Dam Safety State Prediction and Analysis Method Based on EMD-SSA-LSTM. Water 2024, 16, 395.
  2. Ma, C.; Xu, X.; Yang, J.; Cheng, L. Safety Monitoring and Management of Reservoir and Dams. Water 2023, 15, 1078.
  3. Meng, Z.; Wang, Y.; Zheng, S.; Wang, X.; Liu, D.; Zhang, J.; Shao, Y. Abnormal Monitoring Data Detection Based on Matrix Manipulation and the Cuckoo Search Algorithm. Mathematics 2024, 12, 1345.
  4. Gu, C.; Wang, Y.; Gu, H.; Hu, Y.; Yang, M.; Cao, W.; Fang, Z. A Combined Safety Monitoring Model for High Concrete Dams. Appl. Sci. 2022, 12, 12103.
  5. Zhu, Z.; Meng, Z.; Zhang, Z. Robust particle filter for state estimation using measurements with different types of gross errors. ISA Trans. 2017, 69, 281–295.
  6. Hussain, A.C.; Muhammad, T.; Momin, U. A Robust Bayesian Approach for Online Filtering in the Presence of Contaminated Observations. IEEE Trans. Instrum. Meas. 2021, 70, 1–15.
  7. Zhang, W.; Li, H.; Shi, D.; Shen, Z.; Zhao, S.; Guo, C. Determination of Safety Monitoring Indices for Roller-Compacted Concrete Dams Considering Seepage–Stress Coupling Effects. Mathematics 2023, 11, 3224.
  8. Valluru, J.; Patwardhan, S.C.; Biegler, L.T. Development of robust extended Kalman filter and moving window estimator for simultaneous state and parameter/disturbance estimation. J. Process Control 2018, 69, 158–178.
  9. Prakash, G.; Dugalam, R.; Barbosh, M.; Sadhu, A. Recent Advancement of Concrete Dam Health Monitoring Technology: A Systematic Literature Review. Structures 2022, 44, 766–784.
  10. Pang, G.; Shen, C.; Cao, L.; Hengel, A.V.D. Deep Learning for Anomaly Detection: A Review. ACM Comput. Surv. 2021, 54, 38.
  11. Mao, Y.; Li, J.; Qi, Z.; Yuan, J.; Xu, X.; Jin, X.; Du, X. Research on Outlier Detection Methods for Dam Monitoring Data Based on Post-Data Classification. Buildings 2024, 14, 2758.
  12. Boukerche, A.; Zheng, L.; Alfandi, O. Outlier Detection: Methods, Models, and Classification. ACM Comput. Surv. 2020, 53, 1–37.
  13. Garg, A.; Zhang, W.; Samaran, J. An evaluation of anomaly detection and diagnosis in multivariate time series. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 2508–2517.
  14. Jain, P.K.; Bajpai, M.S.; Pamula, R. A modified DBSCAN algorithm for anomaly detection in time-series data with seasonality. Int. Arab J. Inf. Technol. 2022, 19, 23–28.
  15. Choi, K.; Yi, J.; Park, C. Deep learning for anomaly detection in time-series data: Review, analysis, and guidelines. IEEE Access 2021, 9, 120043–120065.
  16. Li, G.; Jung, J.J. Deep learning for anomaly detection in multivariate time series: Approaches, applications, and challenges. Inf. Fusion 2023, 91, 93–102.
  17. Mao, J.X.; Wang, H.; Spencer, B.F., Jr. Toward data anomaly detection for automated structural health monitoring: Exploiting generative adversarial nets and auto-encoders. Struct. Health Monit. 2021, 20, 1609–1626.
  18. Li, X.; Li, Y.; Lu, X. An online anomaly recognition and early warning model for dam safety monitoring data. Struct. Health Monit. 2020, 19, 796–809.
  19. Huang, Y.; Liu, W.; Li, S. Interpretable Single-dimension Outlier Detection (ISOD): An Unsupervised Outlier Detection Method Based on Quantiles and Skewness Coefficients. Appl. Sci. 2024, 14, 136.
  20. Yu, Y.; Zhu, Y.; Li, S. Time series outlier detection based on sliding window prediction. Math. Probl. Eng. 2014, 1, 879736.
  21. Mei, B.; Xite, W.; Junchang, X.; Guoren, W. An efficient algorithm for distributed density-based outlier detection on big data. Neurocomputing 2016, 181, 19–28.
  22. Huang, J.; Zhu, Q.; Yang, L.; Cheng, D.; Wu, Q. A novel outlier cluster detection algorithm without top-n parameter. Knowl. Based Syst. 2017, 121, 32–40.
  23. Maria, K.; Anastasios, G.; Apostolos, N.; Papadopoulos, K.T.; Yannis, M. Efficient and flexible algorithms for monitoring distance-based outliers over data streams. Inf. Syst. 2016, 55, 37–53.
  24. Nong, X.; Luo, X.; Lin, S.; Ruan, Y.; Ye, X. Multimodal Deep Neural Network-Based Sensor Data Anomaly Diagnosis Method for Structural Health Monitoring. Buildings 2023, 13, 1976.
  25. Li, M.; Sun, M.; Li, G.; Han, D.; Zhou, M. MDFULog: Multi-Feature Deep Fusion of Unstable Log Anomaly Detection Model. Appl. Sci. 2023, 13, 2237.
  26. Elhalwagy, A.; Kalganova, T. Multi-Channel LSTM-Capsule Autoencoder Network for Anomaly Detection on Multivariate Data. Appl. Sci. 2022, 12, 11393.
  27. Harada, R.; Shigeta, Y. Selection rules for outliers in outlier flooding method regulate its conformational sampling efficiency. J. Chem. Inf. Model. 2019, 59, 3919–3926.
  28. Wan, H.P.; Ni, Y.Q. Bayesian multi-task learning methodology for reconstruction of structural health monitoring data. Struct. Health Monit. 2019, 18, 1282–1309.
  29. Li, X.; Wen, Z.; Su, H. An approach using random forest intelligent algorithm to construct a monitoring model for dam safety. Eng. Comput. 2021, 37, 39–56.
  30. Xu, Y.; Huang, H.; Li, Y. A three-stage online anomaly identification model for monitoring data in dams. Struct. Health Monit. 2022, 21, 1183–1206.
Figure 1. Algorithm flow chart.
Figure 2. Flowchart of the multi-type anomaly reconstruction model for seepage pressure data.
Figure 3. Layout diagram of seepage pressure monitoring at the measurement points.
Figure 4. Process curve diagram of the corresponding measurement points.
Figure 5. Particle state tracking image of the Bernoulli filtering algorithm.
Figure 6. Filtering results and error diagram. (a) True value and filtered value curve; (b) error analysis curve.
Figure 7. Online detection and identification diagram of measurement points. (a) Parameter mk value at monitoring point P1; (b) parameter mk value at monitoring point P3; (c) parameter Jk value at monitoring point P1; (d) parameter Jk value at monitoring point P3; (e) parameter mk value at monitoring point P4; (f) parameter mk value at monitoring point P9; (g) parameter Jk value at monitoring point P4; (h) parameter Jk value at monitoring point P9.
Figure 8. Reconstruction of measurement point data. (a) Anomaly correction of monitoring point P1; (b) anomaly correction of monitoring point P3; (c) anomaly correction of monitoring point P4; (d) anomaly correction of monitoring point P9.
Figure 9. Comparison chart of experimental method distributions. (a) Comparison of measurement methods at monitoring point P1; (b) comparison of measurement methods at monitoring point P3; (c) comparison of measurement methods at monitoring point P4; (d) comparison of measurement methods at monitoring point P9.
Figure 10. Residual distribution comparison plot of experimental methods. (a) Comparative analysis of error distributions for monitoring point P1; (b) comparative analysis of error distributions for monitoring point P3; (c) comparative analysis of error distributions for monitoring point P4; (d) comparative analysis of error distributions for monitoring point P9.
Figure 11. Comparative distribution plot of evaluation parameters for experimental methods. (a) Distribution of the R2 parameter; (b) distribution of the RMSE parameter; (c) distribution of the MAE parameter.
Table 1. Evaluation of detection and identification level of abnormal values.

Seepage Pressure Measurement Point | Number of Missed Detections | Number of False Detections | Missed Detection Rate (%) | False Detection Rate (%)
Point 1 | 3 | 9 | 1.5 | 4.5
Point 3 | 7 | 15 | 3.5 | 7.5
Point 4 | 4 | 13 | 2.0 | 6.5
Point 9 | 2 | 8 | 1.0 | 4.0
Table 2. Experimental comparison and verification design.

Experiment | Abnormal Value Identification Method | Imputation and Correction Method
Method 1 | Traditional particle filter algorithm | Mean filling method
Method 2 | Traditional particle filter algorithm | MissForest filling method
Method 3 | Improved particle filter algorithm | Mean filling method
Method 4 | Improved particle filter algorithm | MissForest filling method

