1. Introduction
In the wake of unprecedented large-scale infrastructure building, the ensuing long-term operation phase is marked by an increasing prevalence of deterioration mechanisms, including structural aging, material degradation, and damage initiation and propagation. These deterioration processes compromise the safety and functionality of infrastructure systems, while the associated demands for inspection, repair, and retrofitting continue to escalate. Structural health monitoring (SHM), a critical field bridging engineering, data science, and materials research, enables the detection of potential anomalous behaviors, complementing visual inspections and supporting condition assessment and long-term operation and maintenance [1,2,3]. Recent advances in sensing technologies, including smartphones [4], computer vision [5], non-contact testing [6], and unmanned aerial vehicles [7], together with developments in communication networks such as the Internet of Things [8], cloud computing [9], and even blockchain [10], have expanded the capability and accessibility of SHM to acquire richer structural responses and extract features from raw measurements. Naturally, harnessing these data and features while interpreting them for structural anomaly detection has become increasingly critical.
SHM data interpretation faces substantial challenges from environmental variability, including time-varying temperature, humidity, wind, and uneven solar exposure, which can obscure or mimic genuine signatures of structural deterioration or anomalies [11,12,13]. Among these factors, temperature is one of the most pervasive and problematic: variations in thermal conditions can alter elastic moduli and sensor responses, generating signals that are difficult to distinguish from those associated with actual deterioration [14]. One conventional solution is direct compensation, which corrects new measurements by constructing regression models that relate recorded environmental factors, e.g., temperature, to structural responses [15,16]. While conceptually straightforward, such methods face inherent limitations: the complexity of environmental influences often defies complete measurement, and as structural changes, damage progression, and aging effects occur, the originally fitted regression models may become obsolete, thereby diminishing their effectiveness.
To overcome these limitations, unsupervised data normalization offers an efficient alternative that relies primarily on data-driven learning. These approaches streamline implementation and circumvent the challenges posed by long-term, intricate environmental observations [12]. In practice, unsupervised machine learning (ML) models, applied either individually or in combination, often adopt a residual strategy that exploits the redundancy arising from spatiotemporal correlations in the measured dataset, which renders it intrinsically low-rank. This involves constructing a low-rank/dimensional subspace dominated by environmental variability and computing residuals between the raw data and its projection onto this subspace, thereby isolating the influence of environmental variability and enabling anomaly detection. The implementation of this strategy is contingent upon constructing the low-rank/dimensional subspace via orthogonal projection (OP), employing classical matrix decomposition techniques [17,18,19,20] such as principal component analysis, eigenvalue decomposition, factor analysis, and independent component analysis, as well as cointegration analysis [21] and autoencoder neural networks [22]. Other unsupervised approaches include clustering [23] and transfer learning [24], in which a baseline pattern is first learned and anomalies are subsequently detected via shifts in feature distributions.
Nevertheless, the quality of the acquired data is not always sufficient for the effective implementation of the aforementioned unsupervised ML algorithms. Corrupted or missing data can severely compromise data normalization processes such as principal component analysis, thereby disrupting the structural anomaly detection pipeline [12,25,26]. From a data cleansing perspective, the low-rank structure inherent in the measured dataset can be fully leveraged. Low-rank matrix recovery comprises a family of computational approaches, including robust principal component analysis (RPCA), matrix completion, non-negative matrix factorization, and low-rank representation. Among them, the RPCA model, which decomposes the raw data matrix into the sum of a low-rank matrix and a sparse noise matrix, has proven particularly effective in handling grossly corrupted entries [27,28]. Yang and Nagarajaiah [29,30] introduced low-rank and sparse matrix decomposition methods for dynamic-imaging-based inspection of local structural damage, and later extended their application to two-dimensional strain fields with dense sensor layouts, successfully identifying sparse damage patterns. They further put forward the concept that multi-channel noisy structural vibration responses possess an intrinsic low-rank data structure, which can be exploited for system identification and anomaly detection [31,32]. Song et al. [33] employed RPCA to remove sparse noise from distributed strain data obtained via fiber optic sensing, thereby enabling more accurate detection of structural microcracks.
Over the past decades, vibration-based SHM has emerged as a primary means of evaluating global structural condition. Within this framework, natural frequency has been the most widely adopted feature, owing to its direct link to intrinsic structural properties and its relative ease of extraction through operational modal analysis. However, as highlighted in the preceding discussion, natural frequency is profoundly sensitive to environmental variability, making anomaly detection highly susceptible to confounding influences. Unsupervised ML models, particularly those grounded in data normalization processes such as OP, provide effective tools to mitigate such effects, yet their successful deployment hinges on the reliability of the underlying data [34]. In this context, corrupted or missing entries in datasets present a critical bottleneck, and addressing this issue has become essential to ensure the robustness of the anomaly detection pipeline. Maes et al. [35] demonstrated that the low-rank data structure of such datasets can be harnessed to support principal component analysis, thereby enabling robust anomaly detection even when the data are corrupted. Xu et al. [36] employed low-rank matrix approximation to pre-process imperfect frequency datasets prior to cointegration analysis, enhancing robustness in the presence of noisy or missing data. Notably, multi-order natural frequencies are prone to substantial missing entries as a consequence of the low success rate of in situ modal identification. When frequency datasets with such gaps are used for structural anomaly detection, the unavoidable removal of samples significantly undermines the capacity to assess structural condition in a timely and reliable manner.
This paper proposes exploiting the low-rank data structure inherent in the measured dataset to robustly tackle structural anomaly detection under environmental variability. While OP-based residual strategies can isolate the influence of environmental variability, the presence of corrupted or missing data often hinders the reliable implementation of such a data normalization process. To address this issue, a noisy low-rank matrix completion (NLRMC) model is introduced as a preprocessing step to recover the corrupted and missing data. The NLRMC model is expected to exert its advantage by simultaneously performing low-rank and sparse decomposition together with matrix completion, eliminating potential corruptions in the raw data matrix while imputing the missing entries, and ultimately ensuring the smooth execution of the unsupervised pipeline. In addition, a moving window-based online anomaly-detection procedure is established by integrating the NLRMC-OP approach with feature fusion and classification steps under an unsupervised ML framework, from which a novelty indicator is extracted to assess the structural condition.
The structure of this paper is as follows. Section 2 presents the methodology underlying the proposed online robust anomaly-detection procedure. This section emphasizes the theoretical exposition of the OP-based residual strategy and its limitations, as well as the introduction of the NLRMC model to compensate for the shortcomings of RPCA. It further includes the theory of the feature-fusion and classification steps that are essential for novelty-indicator extraction. The datasets and their subsets, to which the above methodology is applied, are described in Section 3. In Section 4, the effectiveness and robustness of the proposed approach are substantiated on two subsets with different levels of data completeness.
2. Methods
2.1. OP-Based Residual Strategy for Anomaly Detection
In the context of long-term SHM activities, the measured structural response data are inevitably influenced by various environmental factors, which may obscure the presence of structural anomalies. In an unsupervised setting, the orthogonal projection (OP)-based residual strategy is capable of decoupling environmental variability from structural anomaly-related components in the measured responses [17,18,19,20]. The core principle underlying OP-based models lies in the fact that multi-channel sensing data or multi-order modal features inherently exhibit a low-rank/dimensional structure, which can be explicitly modeled [12,34,35].
Herein, the term low-rank is primarily used to describe the mathematical property of the data matrix, reflecting its approximate rank deficiency due to the strong correlations among variables, whereas low-dimensional is adopted when interpreting the OP of the raw data matrix onto a low-rank/dimensional subspace or hyperplane, where deviations from this subspace or hyperplane can be regarded as indicative of structural anomalies induced by structural changes or damage progression. Unless otherwise stated, the two terms are closely related and the term low-rank is used more frequently throughout this paper.
At time $t$, the incoming measurement samples are organized into a column vector $\mathbf{y}_t \in \mathbb{R}^{m}$, with $m$ corresponding to the total number of sensing channels or identified modal orders. We define a baseline low-rank subspace $\mathcal{S}$ that captures the normal-state variability of structural response data related to time-varying environmental factors. This subspace is learned from baseline samples assumed to be free from structural changes or damage progression. Given a new sample $\mathbf{y}_t$, the OP loss of $\mathbf{y}_t$ onto the low-rank subspace $\mathcal{S}$, denoted by the residual vector $\mathbf{r}_t$, is computed as:

$\mathbf{r}_t = \mathbf{y}_t - \mathcal{P}_{\mathcal{S}}(\mathbf{y}_t)$ (1)

where $\mathcal{P}_{\mathcal{S}}$ is the OP operator onto $\mathcal{S}$. The residual vector $\mathbf{r}_t$ quantifies the orthogonal component of $\mathbf{y}_t$ that cannot be explained by the low-rank subspace $\mathcal{S}$, and is thus hypothesized to reflect structural anomalies. Moreover, the norm of $\mathbf{r}_t$ serves as an anomaly score, where a larger magnitude indicates a higher likelihood of deviation.
Mathematically, under the assumption that each dimension of the data matrix follows an independent Gaussian distribution, the explicit model can be constructed through eigenvalue decomposition. If $\mathbf{Y} \in \mathbb{R}^{m \times N}$ denotes the data matrix containing $N$ samples of $\mathbf{y}_t$, the eigenvectors of the covariance matrix of $\mathbf{Y}$ can serve as the basis vectors for the OP model. The covariance of $\mathbf{Y}$ is equivalent to $\frac{1}{N-1}\tilde{\mathbf{Y}}\tilde{\mathbf{Y}}^{\mathsf{T}}$, where $\tilde{\mathbf{Y}}$ is the data matrix with each row centered by the sample mean of $\mathbf{Y}$. Computationally, eigenvalue decomposition is then applied to obtain:

$\frac{1}{N-1}\tilde{\mathbf{Y}}\tilde{\mathbf{Y}}^{\mathsf{T}} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^{\mathsf{T}}$ (2)

where $\mathbf{U}$ is an orthonormal matrix whose columns are the eigenvectors (also referred to as singular vectors), and $\boldsymbol{\Lambda}$ is a diagonal eigenvalue matrix with $m$ diagonal elements, each corresponding to an eigenvalue.
From the perspective of cumulative variance contribution, the eigenvalues are sorted in descending order, and the first $p$ eigenvectors form the matrix $\mathbf{U}_p$ (with $p < m$); these directions are primarily influenced by a limited number of time-varying environmental factors and account for the largest proportion of the projection energy. The basis vectors in $\mathbf{U}_p$ thus span a low-rank subspace. The residual matrix $\mathbf{R}$, i.e., the OP loss of $\tilde{\mathbf{Y}}$ onto the low-rank subspace, is estimated as:

$\mathbf{R} = \tilde{\mathbf{Y}} - \mathbf{U}_p\mathbf{U}_p^{\mathsf{T}}\tilde{\mathbf{Y}} = (\mathbf{I} - \mathbf{U}_p\mathbf{U}_p^{\mathsf{T}})\tilde{\mathbf{Y}}$ (3)
By comparing Equation (1) and Equation (3), it can be observed that the OP operator is given by $\mathcal{P}_{\mathcal{S}}(\cdot) = \mathbf{U}_p\mathbf{U}_p^{\mathsf{T}}(\cdot)$. Based on the operator learned under the normal state and Equation (1), anomaly detection, also referred to as novelty detection, can be performed.
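To make the OP-based residual strategy concrete, the following Python sketch fits the baseline subspace by eigendecomposition of the covariance matrix and evaluates the OP residuals of Equations (1)-(3); the function names and the cumulative-variance threshold are illustrative assumptions, not settings reported in this paper.

```python
import numpy as np

def fit_op_subspace(Y_base, energy=0.95):
    """Learn the baseline subspace U_p from an m x N baseline matrix (columns = samples)."""
    mean = Y_base.mean(axis=1, keepdims=True)
    Yc = Y_base - mean                                   # center each row
    cov = Yc @ Yc.T / (Y_base.shape[1] - 1)              # m x m covariance, Eq. (2)
    evals, evecs = np.linalg.eigh(cov)                   # eigenvalues in ascending order
    order = np.argsort(evals)[::-1]                      # re-sort in descending order
    evals, evecs = evals[order], evecs[:, order]
    p = int(np.searchsorted(np.cumsum(evals) / evals.sum(), energy)) + 1
    return mean, evecs[:, :p]                            # U_p spans the low-rank subspace

def op_residuals(Y_new, mean, U_p):
    """Residuals r_t = (I - U_p U_p^T)(y_t - mean), i.e., the OP loss of Eq. (3)."""
    Yc = Y_new - mean
    return Yc - U_p @ (U_p.T @ Yc)
```

The norm of each column of the returned residual matrix then serves directly as the anomaly score discussed above.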
During long-term SHM implementation, corrupted entries may arise in the dataset owing to measurement irregularities under harsh in-service environments. In particular, extreme low-temperature conditions can generate outliers in natural frequencies [35,36], thereby limiting the applicability and performance of Gaussian noise-based OP models, which are inherently non-robust to such corruptions. As illustrated in Figure 1, the presence of corrupted data points (black crosses) can substantially bias the estimated subspace obtained through OP (blue dashed line). Instead of aligning with the underlying true low-rank subspace defined by clean observations (red crosses along the gray line), the projection is distorted toward the corrupted data, thereby undermining the reliability of the data normalization process and hindering accurate anomaly detection.
2.2. NLRMC
To overcome the limitations of the OP model, RPCA [27,28] aims to recover the intrinsic low-rank structure of the measured data matrix $\mathbf{Y}$ by decomposing it into two additive components: a low-rank matrix $\mathbf{L}$ that captures the dominant structural responses, and a sparse matrix $\mathbf{S}$ that accounts for corrupted measurements. The optimization problem is classically formulated as:

$\min_{\mathbf{L},\mathbf{S}} \operatorname{rank}(\mathbf{L}) + \lambda \lVert \mathbf{S} \rVert_0 \quad \text{s.t.} \quad \mathbf{Y} = \mathbf{L} + \mathbf{S}$ (4)

where the rank function enforces the low-dimensional structure of $\mathbf{L}$, the $\ell_0$-norm promotes sparsity in $\mathbf{S}$, and $\lambda$ controls the trade-off between enforcing the low-rank structure and suppressing sparse corruption.
Since directly solving this nonconvex problem is NP-hard, convex relaxations are typically adopted, enabling the separation of informative low-rank subspaces from sparse outliers in a computationally tractable manner. The rank function is replaced with the nuclear norm $\lVert \mathbf{L} \rVert_*$, which is the sum of the singular values of $\mathbf{L}$, and the $\ell_0$-norm is replaced with the $\ell_1$-norm to promote sparsity in $\mathbf{S}$. The relaxed problem can thus be written as:

$\min_{\mathbf{L},\mathbf{S}} \lVert \mathbf{L} \rVert_* + \lambda \lVert \mathbf{S} \rVert_1 \quad \text{s.t.} \quad \mathbf{Y} = \mathbf{L} + \mathbf{S}$ (5)
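For intuition on how Equation (5) is solved in practice, a minimal sketch using the widely adopted inexact augmented Lagrange multiplier (ALM) scheme is given below; the default weight $\lambda = 1/\sqrt{\max(m,n)}$ and the penalty initialization follow common choices in the RPCA literature and are assumptions rather than settings reported here.

```python
import numpy as np

def _svt(X, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def _shrink(X, tau):
    """Soft thresholding: proximal operator of the l1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def rpca_ialm(Y, lam=None, tol=1e-7, max_iter=500):
    """Split Y into a low-rank L and a sparse S, as in Equation (5), via inexact ALM."""
    m, n = Y.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = 0.25 * m * n / (np.abs(Y).sum() + 1e-12)        # penalty parameter (heuristic)
    S = np.zeros_like(Y)
    Z = np.zeros_like(Y)                                 # Lagrange multiplier
    for _ in range(max_iter):
        L = _svt(Y - S + Z / mu, 1.0 / mu)               # low-rank update
        S = _shrink(Y - L + Z / mu, lam / mu)            # sparse update
        resid = Y - L - S
        Z += mu * resid                                  # dual ascent on the constraint
        if np.linalg.norm(resid) <= tol * (np.linalg.norm(Y) + 1e-12):
            break
    return L, S
```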
It is worth noting that the application of the RPCA-assisted OP model here differs substantially from RPCA's widespread use in computer vision, where it is typically employed to extract the low-rank background component of image sequences while the onset and progression of local damage are captured from the sparse component. In the present context, RPCA serves not as a background-foreground separator but as a data cleansing mechanism tailored to long-term SHM, one that does not rely on the assumption of small Gaussian noise.
By isolating outliers into the sparse matrix $\mathbf{S}$, RPCA effectively eliminates their influence on the estimation of the low-rank subspace spanned by $\mathbf{L}$. Although RPCA provides a powerful framework for recovering the underlying low-rank structure from corrupted measurements, its basic formulation assumes that all entries of the data matrix are fully observed. Incomplete observations, such as missing data, cannot be properly addressed under the standard RPCA formulation, i.e., Equation (5). Therefore, when both outliers, as illustrated in Figure 1, and missing entries are present, a more comprehensive approach is required that can handle the two issues simultaneously.
In fact, when RPCA was first proposed, Candès et al. [28] had already recognized the potential occurrence of missing entries, which later motivated the development of matrix completion methods. However, matrix completion techniques, while effective for imputing missing values, remain inherently vulnerable to sparse outliers that can distort the recovered structure. To overcome these complementary limitations, this study introduces the noisy low-rank matrix completion (NLRMC) model [37,38], which unifies the advantages of RPCA and matrix completion within a single framework. This synergistic formulation extends standard RPCA from merely denoising complete data to robustly reconstructing partially observed and corrupted datasets.
Formally, given a data matrix $\mathbf{Y}$, let $\Omega$ denote the set of observed entries, and define the projection operator $\mathcal{P}_{\Omega}$ such that:

$[\mathcal{P}_{\Omega}(\mathbf{Y})]_{ij} = Y_{ij}$ if $(i,j) \in \Omega$, and $0$ otherwise. (6)

The recovery problem is then formulated as:

$\min_{\mathbf{L},\mathbf{S}} \lVert \mathbf{L} \rVert_* + \lambda \lVert \mathbf{S} \rVert_1 \quad \text{s.t.} \quad \mathcal{P}_{\Omega}(\mathbf{Y}) = \mathcal{P}_{\Omega}(\mathbf{L} + \mathbf{S})$ (7)

where the projection constraint ensures consistency with the available observations.
The NLRMC model provides a robust data cleansing mechanism that yields a nearly low-rank representation of the raw data matrix. This preprocessing step jointly performs low-rank and sparse decomposition together with matrix completion, thereby achieving simultaneous suppression of gross sparse noise and recovery of missing entries. It is worth noting that when the data matrix contains no missing entries, the NLRMC model naturally degenerates to the RPCA formulation.
In this study, we adopt the solver proposed by Lu et al. [39], which addresses the NLRMC problem through an Alternating Direction Method of Multipliers (ADMM) framework enhanced by Majorization Minimization. The optimization variables are updated in two super-blocks, consisting of the pair ($\mathbf{L}$, $\mathbf{S}$) and an auxiliary variable. In this formulation, $\mathbf{L}$ is computed via proximal nuclear norm minimization, whereas $\mathbf{S}$ is obtained through soft-thresholding or related proximal operators, depending on the chosen loss function. This strategy not only ensures stable convergence but also provides the flexibility to accommodate different noise models. Compared with conventional nuclear norm-based approaches, the nonconvex formulation and the Majorization Minimization-augmented ADMM solver yield tighter approximations to the true matrix rank and significantly accelerate convergence in practice.
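The sketch below illustrates how the observation constraint of Equation (7) modifies the preceding RPCA routine by introducing an observation mask and an auxiliary fill-in variable for the unobserved entries; it is a simplified convex illustration rather than the nonconvex Majorization Minimization-augmented ADMM solver of Lu et al. [39] actually adopted in this study, and it reuses the `_svt` and `_shrink` helpers defined above.

```python
import numpy as np

def nlrmc_admm(Y, mask, lam=None, tol=1e-7, max_iter=500):
    """Recover a low-rank L and a sparse S from a partially observed matrix,
    enforcing P_Omega(Y) = P_Omega(L + S) as in Equation (7).

    Y    : m x n data matrix (values at unobserved positions are ignored).
    mask : boolean m x n array, True where an entry was actually measured.
    """
    m, n = Y.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    Y0 = np.where(mask, Y, 0.0)                          # P_Omega(Y)
    mu = 0.25 * m * n / (np.abs(Y0).sum() + 1e-12)
    S = np.zeros_like(Y0)
    E = np.zeros_like(Y0)                                # absorbs residuals off Omega
    Z = np.zeros_like(Y0)                                # multiplier, supported on Omega
    for _ in range(max_iter):
        L = _svt(Y0 - S - E + Z / mu, 1.0 / mu)          # low-rank update fills missing entries
        S = _shrink(Y0 - L - E + Z / mu, lam / mu)       # sparse corruption update
        E = np.where(mask, 0.0, Y0 - L - S + Z / mu)     # free fill-in on unobserved entries
        resid = Y0 - L - S - E                           # nonzero only on observed entries
        Z += mu * resid
        if np.linalg.norm(resid) <= tol * (np.linalg.norm(Y0) + 1e-12):
            break
    return L, S
```

The cleansed, completed estimate is L; when mask is all True, E remains zero and the routine reduces to the RPCA sketch above, mirroring the degeneration to RPCA noted earlier.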
2.3. Novelty Indicator Extraction Within Unsupervised ML Framework
The NLRMC model provides a robust preprocessing step to recover the low-rank data matrix, upon which the OP-based data normalization process can compute residual vectors as anomaly scores; this combination is referred to as the NLRMC-OP approach in this paper. However, the discriminative capability of these scores alone remains limited for reliably separating normal and abnormal states. Therefore, subsequent efforts focus on translating these residuals into actionable novelty detection outcomes within an unsupervised ML framework. Given the unknown evolution trends of multi-channel sensing data or multi-order modal features under structural changes, feature-level fusion is required to derive a unified indicator that consolidates the available information. Once this fused indicator is obtained, control chart techniques are applied to extract the final indicator, enabling the classification of structural patterns into normal or abnormal states.
The Mahalanobis distance (MD) is computed on the OP-based residual vectors to fuse multi-channel sensing data or multi-order modal features into a single statistic that captures the unified evolutionary trend. Let $\boldsymbol{\mu}_r$ and $\boldsymbol{\Sigma}_r$ be the mean and covariance estimated from the baseline residuals; for a residual vector $\mathbf{r}_t$, the squared MD is:

$d_t^2 = (\mathbf{r}_t - \boldsymbol{\mu}_r)^{\mathsf{T}} \boldsymbol{\Sigma}_r^{-1} (\mathbf{r}_t - \boldsymbol{\mu}_r)$ (8)

Using baseline-only estimates of $\boldsymbol{\mu}_r$ and $\boldsymbol{\Sigma}_r$ aligns the indicator with the normal-state distribution and provides scale and correlation normalization, which improves discriminability for coordinated shifts while attenuating uninformative variance. The resulting MD sequence serves as the fused indicator that feeds the subsequent control chart step.
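A small sketch of the MD fusion step of Equation (8) follows; the ridge term added before inversion is a numerical-stability assumption, not part of the original formulation.

```python
import numpy as np

def fit_md(R_base):
    """Mean and inverse covariance of baseline residuals (m x N, columns = samples)."""
    mu = R_base.mean(axis=1)
    cov = np.cov(R_base)                         # variables in rows, samples in columns
    cov += 1e-9 * np.eye(cov.shape[0])           # small ridge for invertibility (assumption)
    return mu, np.linalg.inv(cov)

def squared_md(r, mu, cov_inv):
    """Squared Mahalanobis distance of a single residual vector r, Equation (8)."""
    d = r - mu
    return float(d @ cov_inv @ d)
```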
Among the control charts used in SHM, including the X-bar, Shewhart-T, and Hotelling T² charts [40], this study adopts the exponentially weighted moving average (EWMA) chart. EWMA remains reliable when the fused indicator departs from normality and, by combining current observations with historical information through exponential weighting, is particularly sensitive to small, sustained shifts, which enables early detection of subtle anomalies. Let $x_t$ denote the fused indicator (MD) corresponding to the sample at time $t$. The EWMA statistic is updated as:

$z_t = \alpha x_t + (1 - \alpha) z_{t-1}$ (9)

where $\alpha \in (0, 1]$ is a constant that determines the proportion of information contributed by the current sample relative to preceding samples; $z_0$ is set to the mean value of the baseline MDs; and $z_t$ is the controlled variable to be detected, serving as the novelty indicator (NI) in this study for evaluating the structural condition.
With the controlled variable defined, an abnormal trend is identified when the statistic exceeds predetermined bounds. For the EWMA chart, these bounds are specified by the upper control limit (UCL) and lower control limit (LCL), which are computed as:

$\mathrm{UCL} = \mu_0 + k \sigma_0 \sqrt{\dfrac{\alpha}{2 - \alpha}}$ (10)

$\mathrm{LCL} = \mu_0 - k \sigma_0 \sqrt{\dfrac{\alpha}{2 - \alpha}}$ (11)

where $\mu_0$ and $\sigma_0$ are the mean and standard deviation of the baseline MDs, and $k$ is a tunable parameter that defines the width of the control limits in the EWMA control chart.
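The EWMA update of Equation (9) and the control limits of Equations (10) and (11) can be sketched as follows; the asymptotic (fixed-width) form of the limits and the default values of alpha and k are assumptions.

```python
import numpy as np

def ewma_chart(md_sequence, baseline_mds, alpha=0.1, k=5.0):
    """Return the EWMA novelty indicator plus fixed control limits.

    md_sequence  : fused MD values to monitor, in time order.
    baseline_mds : MD values from the baseline (training) window.
    alpha        : weight on the current sample in Equation (9).
    k            : control-limit width in Equations (10) and (11).
    """
    mu0 = float(np.mean(baseline_mds))
    sigma0 = float(np.std(baseline_mds, ddof=1))
    half_width = k * sigma0 * np.sqrt(alpha / (2.0 - alpha))
    ucl, lcl = mu0 + half_width, mu0 - half_width
    z = mu0                                      # z_0 initialized at the baseline mean
    ni = []
    for x in md_sequence:
        z = alpha * x + (1.0 - alpha) * z        # Equation (9)
        ni.append(z)
    return np.array(ni), ucl, lcl
```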
2.4. Implementation of Online Unsupervised Procedure
The online anomaly-detection procedure is implemented with a moving-window strategy, which processes short data matrices at fixed intervals. The motivation arises from the fact that a limited number of sensing channels or modal features ($m$ dimensions), combined with the large volume of observations ($N$ samples) accumulated during long-term SHM activities, yields a slender data matrix of size $m \times N$ (with $m \ll N$), which imposes computational burdens. The window size $w$ should adapt to the dominant environmental period yet remain shorter than the available baseline data [41].
The proposed online robust anomaly-detection procedure based on moving windows is illustrated in Figure 2. In this framework, the two key steps are the aforementioned NLRMC and OP, which act as data cleansing and normalization operations within each moving window. At each iteration, a window of length $w$ is constructed starting from the current sample and encompassing the subsequent samples; once the analysis within the current window is completed, the window is shifted forward to update the computation. The entire procedure consists of two phases: (i) a training phase, during which baseline features are extracted and control limits are established, and (ii) a monitoring phase, in which new samples are sequentially processed to enable online anomaly detection.
Training phase: After initializing the moving-window size and the parameters of the control chart, the data matrix constructed from the first window is adopted as the baseline for unsupervised training. NLRMC is then applied to perform data cleansing, which simultaneously completes missing entries and isolates corrupted components, thereby producing a baseline low-rank matrix. Subsequently, the OP-based residual strategy is employed to derive the eigenvector matrix and compute the corresponding residual vectors. At the end of the training phase, MD-based feature fusion is performed on these residual vectors, and the EWMA-based control limits (CLs) are established for subsequent monitoring.
Monitoring phase: The moving-window process is implemented in real time, and within each updated window the unsupervised learning procedure described above is executed. The data matrix is updated by appending one column of newly acquired feature data and removing the earliest column, as depicted in Figure 2. For each updated data matrix, NLRMC and OP are applied to compute the residual vectors, followed by feature fusion and control charting to obtain the NI. Anomaly detection is then carried out by comparing this indicator against the CLs obtained in the training phase. If the NI remains within the control limits, the structure is considered to remain in its normal state during that window; conversely, an NI value exceeding the UCL or LCL signals the occurrence of a structural anomaly. From a subspace perspective, when no structural changes or damage progression are present, new samples are expected to lie within the hyperplane spanned by the low-rank subspace of the normal state. In contrast, abrupt changes or progressive damage drive the features away from this hyperplane, producing substantially larger residuals that indicate an abnormal state.
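Putting the pieces together, the following pseudo-orchestration sketches the two-phase moving-window procedure of Figure 2 by reusing the illustrative routines defined above; the window length, the choice to freeze the baseline subspace and control limits after training, and all default parameters are assumptions rather than settings prescribed by this paper.

```python
import numpy as np

def online_procedure(Y, mask, window, alpha=0.1, k=5.0):
    """Minimal moving-window loop: train on the first window, then monitor online.

    Y      : m x N feature matrix (e.g., multi-order natural frequencies over time).
    mask   : boolean m x N observation mask (False marks missing entries).
    window : number of samples per moving window.
    """
    # Training phase: cleanse, learn the OP subspace, fuse, and set control limits.
    L0, _ = nlrmc_admm(Y[:, :window], mask[:, :window])
    mean, U_p = fit_op_subspace(L0)
    R0 = op_residuals(L0, mean, U_p)
    mu, cov_inv = fit_md(R0)
    base_mds = [squared_md(R0[:, j], mu, cov_inv) for j in range(window)]
    _, ucl, lcl = ewma_chart([], base_mds, alpha, k)
    z = float(np.mean(base_mds))                          # EWMA state starts at the baseline mean
    flags = []
    # Monitoring phase: slide the window one sample at a time and update the NI.
    for t in range(window, Y.shape[1]):
        Lw, _ = nlrmc_admm(Y[:, t - window + 1:t + 1], mask[:, t - window + 1:t + 1])
        r = op_residuals(Lw[:, [-1]], mean, U_p)[:, 0]    # residual of the newest sample
        z = alpha * squared_md(r, mu, cov_inv) + (1.0 - alpha) * z
        flags.append(z > ucl or z < lcl)                  # True flags an anomalous sample
    return np.array(flags)
```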
3. Application to the KW51 Bridge
Because real-world cases of structural anomaly monitoring are rare, validating structural anomaly detection approaches often relies on artificially induced scenarios, such as generating frequency shifts from numerical models. In this study, however, we employ the publicly available dataset of the KW51 railway bridge released by Maes and Lombaert [35], which has become a benchmark in the SHM community for its ability to capture genuine structural changes across three operational states: before, during, and after retrofitting. The KW51 bridge is a 115 m long, 12.4 m wide steel tied-arch structure with a dual-track deck suspended by inclined hangers, located on the L36N railway line between Leuven and Brussels and serving various passenger trains since 2003. Between May and September 2019, the bridge underwent retrofitting to strengthen the bolted connections between the braces, the deck, and the arch; scaffolding installed during this period temporarily altered its mass and stiffness, as illustrated in Figure 3. After reinforcement with welded steel plates, the bridge returned to service. This distinctive sequence of operational states can therefore be used to evaluate the effectiveness and robustness of the proposed procedure for detecting anomalies in real-world structural conditions.
The monitoring campaign of the KW51 bridge spanned three operational states: a 7.5-month period before retrofitting (2 October 2018–15 May 2019), the retrofitting period (16 May–27 September 2019), and a 3.5-month period after retrofitting (28 September 2019–15 January 2020). During this time, the sensor network was progressively enhanced, yielding a dataset comprising acceleration, strain, displacement, temperature, and humidity measurements. Structural vibration responses from train-induced and ambient sources were captured by 12 accelerometers installed on the deck and arches. Operational modal analysis was conducted on an hourly basis using the ambient vibration responses to track the evolving dynamic characteristics of the KW51 bridge [35]. Modal parameters were extracted using a reference-based covariance-driven stochastic subspace identification method, followed by inspection of stabilization diagrams and an additional clustering step to automatically interpret and group the identified modes.
A total of 14 modal frequencies were identified throughout the monitoring campaign and categorized into two groups [35]. The first is an increasing group (modes 6–8 and 10–14), whose frequencies rose after retrofitting owing to the stiffening of the connections between the diagonals, the deck, and the arches. The second is a decreasing group (modes 1–5 and 9), which exhibited reductions as a result of the added mass from the retrofitted steel boxes near the arches. Ref. [24] examined these features in their original spaces and histograms, revealing a pronounced masking effect in which environmental variability, particularly temperature, dominated the frequency distributions. Only six natural frequencies were selected to verify the proposed approach, as shown in Figure 4, with particular attention to the NLRMC-OP approach. The six modes include three from the increasing group (modes 6, 10, and 13) and three from the decreasing group (modes 3, 5, and 9).
Furthermore, to comprehensively assess the approach's performance under varying levels of data completeness, two subsets were defined. Dataset 1 excludes all frequency vectors containing missing entries, yielding 2524 and 677 samples corresponding to the operational states before and after retrofitting, respectively. Dataset 2 removes only those vectors with all entries missing, resulting in 4090 and 2555 samples before and after retrofitting, respectively. The sample numbers and corresponding timestamps of both subsets are listed in Table 1. The key distinction lies in the gap-removal operation applied to Dataset 1, where any frequency vector containing even a single gap was removed entirely, because data vectors with unfilled entries cannot undergo the subsequent data normalization. Previous studies [34,35] on the KW51 bridge dataset adopted a similar practice. In contrast, Dataset 2 retains these incomplete vectors with missing entries to evaluate the effectiveness of the proposed approach when the NLRMC model is employed as a preprocessing step. Together, these datasets provide a rigorous basis for evaluating the proposed approach in handling corrupted and missing data under realistic SHM applications.
Moreover, Dataset 1, which excludes all frequency vectors containing missing entries, substantially reduces the utilization rate of the raw frequency data. Referring to Table 2, two quality indexes are reported for the six selected modes: the success rate (SR), which represents the modal identification success rate of each modal frequency, and the utilization rate (UR), which denotes the proportion of measurement samples that remain usable in the multi-order frequency dataset after removing missing entries. Mode 10 in Table 2 exhibited a relatively low success rate of modal identification, resulting in numerous gaps in the frequency dataset.
It should be emphasized that the removal of gapped samples in Dataset 1 caused the URs of the raw frequency data for the other modes to fall below one half. More importantly, the first sample in Dataset 1 after retrofitting (i.e., the 2525th sample) corresponds to a timestamp of 18:00 on 28 September 2019, even though the retrofit had actually been completed on 27 September 2019. In reality, monitoring of the structural changes induced by retrofitting started at midnight on 28 September, so the gap-removal operation in Dataset 1 deleted all samples between 00:00 and 18:00, as also seen in Table 1. This omission prevented the timely detection of the structural changes that occurred in the bridge, thereby leading to false negatives. Hence, Dataset 2, with 100% utilization yet containing substantial missing entries, is indispensable for demonstrating the effectiveness of the proposed online robust anomaly-detection procedure.
For methodological comparison, the proposed online anomaly-detection procedure was applied without the NLRMC-OP steps, that is, by directly performing feature fusion and classification on the multi-order raw frequency data, i.e., Equations (8)–(11), yielding a pseudo novelty-detection result. As shown in Figure 5, the substantial structural changes in the KW51 bridge before and after retrofitting enabled a relatively straightforward distinction between the two operational states. Nevertheless, the reliability of the NIs obtained under this setting remains highly questionable. When the initial control-limit width $k$ was set to 5, a large number of outliers appeared in the indicators of the before-retrofitting (normal) state; these typically arose from material nonlinearities near 0 °C caused by freezing effects [35], ultimately leading to false positives. Even after the control-limit width $k$ was enlarged to 25, structural anomalies after retrofitting were still misclassified as normal, as evidenced by the data points after the blue dashed line in Figure 5, reflecting more severe false negatives. These deficiencies highlight the lack of robustness of anomaly detection based on statistical models alone and underscore the necessity of incorporating data normalization and cleansing steps into the proposed unsupervised pipeline to avoid misleading decision-making.
5. Conclusions
This paper addresses the challenge of robust anomaly detection in the SHM community. Although conventional OP techniques, such as principal component analysis, have been shown to be effective in isolating environmental effects, their performance can be impaired by corrupted or missing data. Motivated by the intrinsic low-rank nature of the measured dataset, we introduced the NLRMC model as a preprocessing step for OP-based data normalization. By jointly performing low-rank and sparse decomposition and matrix completion, the NLRMC model mitigates the adverse influence of sparse outliers while simultaneously imputing missing entries. Building upon this foundation, the integration of the NLRMC-OP approach with MD-based feature fusion and the EWMA control chart under an unsupervised ML framework forms a fully unsupervised online anomaly-detection procedure. The proposed approach was substantiated using the KW51 bridge, whose distinctive sequence of operational states provides a rigorous basis for assessing its effectiveness and robustness under realistic SHM conditions. The main conclusions of this study are as follows:
Firstly, when used without NLRMC-OP, the raw frequency dataset could still separate the states before and after retrofitting. However, freezing near 0 °C introduced sparse outliers that caused false positives, and enlarging the control-limit width led to structural anomalies being misclassified as normal, resulting in false negatives.
Secondly, using Dataset 1 (after the gap-removal operation), the proposed NLRMC-OP approach effectively handled the sparse outliers caused by freezing, thereby suppressing environmental variability and avoiding both false positives and false negatives. It is worth noting that, when the data matrix contains no missing entries, the NLRMC model naturally degenerates to the RPCA formulation; hence Dataset 1 primarily serves as a feasibility investigation of the proposed approach.
Finally, Dataset 2 retains incomplete frequency vectors and therefore cannot be processed by RPCA. Integrating NLRMC into the pipeline enables, within each moving window, simultaneous recovery of the underlying low-rank structure, isolation of sparse corruptions, and completion of missing entries; this restores the operability of the OP-based residual strategy under severe data missingness, improves data utilization, and yields timely, reliable detection of structural change. This capability, namely robust anomaly detection on partially observed and corrupted data via a unified NLRMC-OP pipeline, represents the central contribution of the present work.
The synergy of NLRMC transforms the vulnerability of OP operators to outliers and missing data into a robust component of an unsupervised ML framework for structural anomaly detection. Both OP and NLRMC techniques are rooted in the low-rank data structure, or more generally, low-dimensional modeling. Amid the wave of AI-enabled civil infrastructure, approaches that exploit the inherent low-rank data structure could offer a robust pathway to overcome masking effects, thereby enhancing SHM data usability and decision reliability.