Article

Multiclass Anomaly Detection in Bridge Health Monitoring Data via Attention Enhancement and Class Imbalance Mitigation

1 State Key Laboratory of Mountain Bridge and Tunnel Engineering, Chongqing Jiaotong University, Chongqing 400074, China
2 School of Civil Engineering, Chongqing Jiaotong University, Chongqing 400074, China
3 CCCC First Highway Engineering Group Co., Ltd., Beijing 100024, China
* Author to whom correspondence should be addressed.
Buildings 2026, 16(6), 1181; https://doi.org/10.3390/buildings16061181
Submission received: 22 January 2026 / Revised: 19 February 2026 / Accepted: 16 March 2026 / Published: 17 March 2026

Abstract

Bridge structural health monitoring (BSHM) systems are essential for assessing the operational performance and safety of long-span bridges. However, monitoring data are often affected by factors such as sensor malfunctions, environmental disturbances, or power interruptions, leading to various anomalous data. Moreover, the multiclass imbalance of the data presents a major challenge to traditional anomaly detection methods. To address this issue, a novel multiclass anomaly detection method based on an improved deep convolutional neural network is proposed. Specifically, a ResNet50 architecture integrated with the convolutional block attention module (CBAM) is developed to enhance the extraction of discriminative features. Additionally, the Focal Loss function is introduced to emphasize the loss weight of minority samples, reducing the influence of majority classes, thereby effectively overcoming the class imbalance issue in multiclass anomaly detection. The proposed method is trained and validated using measured acceleration data collected from a large-scale cable-stayed bridge. The experimental results indicate that the model achieves an overall accuracy of 98.28%, while effectively improving the classification performance of minority categories. The method further reproduces the spatiotemporal distribution of anomalies in full-month monitoring data, confirming its robustness and engineering applicability for large-scale automated anomaly diagnosis in BSHM systems.

1. Introduction

Bridge structural health monitoring (BSHM) systems have been increasingly implemented in long-span bridges, playing a vital role in ensuring operational safety by providing early warning, reliability assessment, and maintenance decision support [1,2]. However, during long-term sensor operation, various forms of anomaly data are often introduced due to factors such as equipment malfunction, electromagnetic interference, and unstable power supply [3,4,5]. These anomalies may obscure the true structural responses. For example, the accuracy of vibration mode estimation in higher modes can be significantly degraded [6,7]. Consequently, it becomes difficult to distinguish whether the abnormalities originate from sensor faults or from the structure itself, posing great challenges to accurate service-state evaluation and maintenance decision-making of bridges [8]. Therefore, it is necessary to detect and remove anomaly data from large volumes of monitoring data to avoid adverse effects on subsequent assessment and decision processes [9,10,11].
In recent years, various methods have been developed to diagnose anomalies in BSHM data, which can be broadly categorized into three types: statistical model-based methods, data-driven machine learning methods, and artificial intelligence-based deep learning methods. Among existing studies, statistical model-based methods have been widely applied owing to their well-established theoretical foundation and ease of implementation. These methods are typically constructed using normal monitoring data to establish a statistical model, through which anomalies are detected according to predefined indicators. Therefore, they do not require large amounts of labeled data or complex network architectures. Gul et al. [12] employed an autoregressive model combined with the Mahalanobis distance to identify outliers. Hernandez-Garcia et al. [13] utilized a latent-variable multivariate statistical analysis method for anomaly detection in sensor networks. Zhang et al. [14] distinguished different types of anomalies based on statistical features such as time-domain root mean square, frequency-domain kurtosis, unit distance number, and mean deviation difference. Jian et al. [15] realized anomaly detection and classification using relative frequency distribution histograms of acceleration data, while Zhang et al. [16] extracted local binary pattern (LBP) histograms from waveform images and combined them with a random forest to achieve multiclass anomaly detection. Traditional statistical methods mainly rely on threshold rules, time–frequency analysis, and correlation measures for anomaly detection. Although these methods are simple to implement and computationally efficient, their effectiveness strongly depends on model assumptions and prior empirical information, requiring multiple experimental validations. Moreover, they tend to perform poorly when dealing with noisy interference, complex high-dimensional data, and high-sampling-rate dynamic signals [17,18,19,20,21].
Data-driven machine learning methods have attracted widespread attention owing to their adaptive learning capability for complex data patterns. Unlike traditional approaches that rely on manually established baseline models, these methods directly learn the intrinsic relationships within data through training samples. The accuracy of such methods largely depends on the effective extraction of features from faulty sensor signals. Some studies have employed principal component analysis (PCA) and clustering algorithms to extract potential anomaly features [22,23], while others have adopted supervised learning approaches such as hidden Markov models (HMMs) and support vector machines (SVMs) to identify offset, trend, and other anomaly patterns [24,25]. In addition, the introduction of ensemble and lightweight models has further improved the accuracy and robustness of anomaly detection [26,27]. Overall, the effectiveness of machine learning-based anomaly diagnosis methods is highly dependent on feature extraction and feature space selection [28,29]. Feature design often requires extensive domain knowledge and trial-and-error processes, and the generalization capability of these methods remains limited when dealing with high-dimensional, complex, and diverse monitoring data [30].
The diversity of anomaly types and the imbalance among different data categories often limit the performance of machine learning methods, posing significant challenges to accurate and reliable anomaly diagnosis [31]. In contrast, deep learning techniques, which possess automatic feature extraction and powerful representation learning capabilities, have gradually emerged and gained broad recognition. Chalapathy and Chawla [32] surveyed deep-learning-based anomaly detection, categorizing major methods and summarizing their cross-domain applications. Gao et al. [33] introduced a low-cost, high-precision anomaly detection method for multi-type data using pattern recognition neural networks (PRNNs). Liu et al. [34] proposed a GAT–LSTM and VAE–DeSVDD-based framework for spatiotemporal SHM anomaly detection, demonstrating strong performance in key metrics. In recent years, anomaly detection methods based on generative adversarial networks (GANs) have attracted considerable attention. Qu et al. [35] proposed a two-stage, attention-enhanced ResNet50 with GAN-based augmentation for imbalanced bridge anomaly detection, achieving over 95% recall across classes. These methods detect anomalies by characterizing the differences between normal and anomaly data, making them more suitable for handling imbalanced datasets [36,37,38,39]. However, although such approaches offer advantages in mitigating training data imbalance and reducing manual labeling requirements, their unsupervised learning nature typically simplifies the problem into a binary classification task (normal vs. anomaly), making it difficult to further distinguish among different types of anomaly patterns.
With the advancement of computer vision (CV) technology, researchers have increasingly tended to convert digital signals into image-based data representations. Through image enhancement techniques, class imbalance can be alleviated, and deep networks can automatically extract discriminative features from images for more accurate anomaly detection. Bao et al. [17] were the first to combine CV techniques with deep learning for anomaly detection in bridge monitoring data. Subsequently, various image generation and feature fusion strategies have been proposed, including time–frequency image methods based on continuous wavelet transform (CWT) and fast Fourier transform (FFT) [40,41,42], as well as lightweight approaches utilizing grayscale images and transfer learning [43,44] to detect anomaly data. Shajihan et al. [42] constructed three-channel images combining time-domain, frequency-domain, and probability density features, and trained a CNN on these representations, achieving higher overall accuracy and recall for anomaly detection. Zhu et al. [45] combined visual transformer (ViT) with CNN modules to strengthen feature learning, reporting an accuracy of 93.1% for anomaly detection. These methods have demonstrated promising performance in enhancing feature representation capability and mitigating the impact of class imbalance.
However, due to the nonlinear nature of data transformation, most existing studies have shown that the converted images are often more difficult to interpret in relation to the original data, thereby reducing the model’s interpretability [30]. In addition, the majority of research has focused on balanced datasets or employed only simple data balancing techniques. In practical SHM applications, the amount of anomaly data is far smaller than that of normal data, which further aggravates the problem of class imbalance and leads to degraded classification performance for minority or difficult-to-distinguish categories [35].
To address the above challenges, this study presents a deep learning-based framework for multiclass anomaly detection in bridge health monitoring data, with particular attention to class imbalance. Specifically, a ResNet50 backbone is adopted as the base classifier and augmented with the convolutional block attention module (CBAM) to improve feature representation for diverse anomaly patterns. In addition, the Focal Loss function is introduced to alleviate class imbalance, enhancing the model’s ability to classify minority and hard-to-classify samples. The proposed framework is evaluated on real monitoring data collected from a large-scale in-service bridge, and its effectiveness is assessed through comprehensive experiments and comparative analyses. The main contributions of this paper are summarized as follows:
(1)
A CBAM-enhanced ResNet50 is developed to strengthen both channel-wise and spatial feature responses, improving the sensitivity to subtle anomaly patterns and reducing background interference in time-history images.
(2)
The Focal Loss is adopted to address class imbalance by reducing the dominance of easy samples and enhancing learning on minority and hard-to-classify categories; its impact is quantified using standard metrics such as accuracy and recall.
The rest of this paper is organized as follows. In Section 2, the proposed method, model architecture, and evaluation metrics are presented. Section 3 provides experimental validation using real bridge data, along with a comparative study to demonstrate the effectiveness of the method. Section 4 and Section 5 present the anomaly diagnosis results and conclude the paper.

2. Proposed Method

2.1. Overview of Proposed Framework

Figure 1 presents the overall flowchart of the proposed method. The framework consists of three components: data acquisition, model training, and analysis and evaluation. First, the bridge monitoring data are manually labeled and categorized. Then, the input images are fed into a CBAM-enhanced ResNet50 network for feature extraction and refinement, and the extracted feature maps are aggregated via adaptive average pooling (AAP). During training, the Focal Loss is adopted for parameter optimization to improve recognition performance under class-imbalanced conditions, and the predicted class is finally obtained. Lastly, the classification results are analyzed and the model performance is evaluated.

2.2. Model Architecture Design

2.2.1. Backbone Network Architecture

Considering the strong non-stationarity and massive nature of bridge acceleration data [33,46,47], while balancing the data characteristics, model performance, and computational efficiency [48,49], this study ultimately adopts ResNet50 as the network model. The residual modules effectively address the common issue of gradient vanishing in deep convolutional networks, enabling ResNet50 to incorporate more layers and thereby enhancing its ability to represent complex features [50].
Figure 2 shows a residual block in ResNet50. In Figure 2, x denotes the input feature map and is fed to both the main branch and the shortcut branch. F(x) denotes the residual mapping learned by the stacked weight layers in the main branch. G(x) denotes the shortcut mapping that propagates the input feature; it is typically an identity mapping and can be replaced by a projection mapping for dimension alignment. The block output is produced by element-wise addition of the two branches. Each residual block consists of three convolutional layers arranged in the Bottleneck structure. The first 1 × 1 convolution reduces the number of channels in the feature map, thereby reducing computational complexity. This is followed by a 3 × 3 convolution for feature extraction, and another 1 × 1 convolution to restore the number of channels. This design ensures that the deep network can be trained more efficiently while effectively extracting key information from the input data, enhancing the model’s learning capability without sacrificing computational efficiency. ReLU activation is applied after each convolutional layer, introducing non-linearity to enhance the network’s expressive power.
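The bottleneck structure described above can be sketched in PyTorch as follows. This is an illustrative sketch rather than the authors' exact implementation; the layer names are ours, and the batch normalization layers follow the standard ResNet50 design even though they are not discussed explicitly in the text.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Sketch of a ResNet50 bottleneck residual block.

    Main branch F(x): 1x1 conv (channel reduction) -> 3x3 conv (feature
    extraction) -> 1x1 conv (channel restoration). The shortcut G(x) is an
    identity mapping, replaced by a 1x1 projection when dimensions differ.
    """
    expansion = 4

    def __init__(self, in_ch, mid_ch, stride=1):
        super().__init__()
        out_ch = mid_ch * self.expansion
        self.conv1 = nn.Conv2d(in_ch, mid_ch, 1, bias=False)        # reduce channels
        self.bn1 = nn.BatchNorm2d(mid_ch)
        self.conv2 = nn.Conv2d(mid_ch, mid_ch, 3, stride=stride,
                               padding=1, bias=False)               # spatial features
        self.bn2 = nn.BatchNorm2d(mid_ch)
        self.conv3 = nn.Conv2d(mid_ch, out_ch, 1, bias=False)       # restore channels
        self.bn3 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        # Projection shortcut when the shape of x differs from F(x)
        self.shortcut = nn.Sequential()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        f = self.relu(self.bn1(self.conv1(x)))
        f = self.relu(self.bn2(self.conv2(f)))
        f = self.bn3(self.conv3(f))                 # F(x)
        return self.relu(f + self.shortcut(x))      # F(x) + G(x)
```

For example, `Bottleneck(64, 64)` maps a 64-channel input to a 256-channel output of the same spatial size, matching the first stage of ResNet50.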
Figure 3 illustrates the overall architecture of ResNet50. The input image is first processed by a 7 × 7 convolution layer with 64 channels and stride 2 to extract low-level features and reduce the spatial resolution. A 3 × 3 max pooling layer with stride 2 is then applied to further downsample the feature maps. The network backbone is composed of four consecutive stages of residual blocks, which contain 3, 4, 6, and 3 bottleneck blocks, respectively. In each bottleneck block, the main branch performs a three-layer transformation using a 1 × 1 convolution for channel adjustment, a 3 × 3 convolution for spatial feature learning, and a final 1 × 1 convolution for channel expansion, while the shortcut branch propagates the input feature to enable residual learning. The output channels of the four stages are 256, 512, 1024, and 2048, respectively, and the first block of each stage performs downsampling to change the feature resolution and align feature dimensions. After the residual stages, AAP is used to aggregate the spatial features into a compact representation, which is then fed into a fully connected layer and a Softmax classifier to obtain the predicted class label.

2.2.2. CBAM: Channel and Spatial Attention Mechanisms

CBAM is a lightweight and end-to-end trainable attention module designed to enhance the network’s responsiveness to important features while keeping the overall architecture and computational complexity nearly unchanged. The CBAM consists of two sequential submodules: a channel attention mechanism and a spatial attention mechanism. The channel attention mechanism strengthens feature representations across different channels, whereas the spatial attention mechanism focuses on extracting key information from different spatial locations [51]. For an input feature map, the attention enhancement process can be described as follows:
(1)
Channel attention module
The channel attention module adopts both global max pooling and global average pooling to compress the spatial dimensions of the input feature map and obtain two channel descriptors. These descriptors are fed into a shared multi-layer perceptron (MLP) to model inter-channel dependencies and generate channel-wise responses. As illustrated in Figure 4, the descriptors produced by max pooling and average pooling are shown as the dark blue and dark red blocks, respectively, and their corresponding MLP outputs are shown as the light blue and light red blocks. The two responses are then fused by element-wise addition and activated by a sigmoid function to obtain the final channel attention map:
M_c(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F)))
where σ(·) denotes the Sigmoid activation function, and AvgPool(F) and MaxPool(F) represent the features obtained through average pooling and max pooling, respectively.
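The channel attention computation can be sketched as follows. The reduction ratio r of the shared MLP is an assumption here (16 is the common CBAM default and is not stated in the text).

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Sketch of Mc(F) = sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F)))."""
    def __init__(self, channels, r=16):
        super().__init__()
        self.mlp = nn.Sequential(               # shared MLP for both descriptors
            nn.Linear(channels, channels // r),
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),
        )

    def forward(self, F):
        b, c, _, _ = F.shape
        avg = self.mlp(F.mean(dim=(2, 3)))      # global average pooling descriptor
        mx = self.mlp(F.amax(dim=(2, 3)))       # global max pooling descriptor
        return torch.sigmoid(avg + mx).view(b, c, 1, 1)   # Mc(F)
```

The returned map has shape (B, C, 1, 1), so it broadcasts over the spatial dimensions when multiplied with the input feature map.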
(2)
Spatial attention module
As shown in Figure 5, the spatial attention module emphasizes the feature responses at key spatial locations through pooling and convolution operations. Specifically, average pooling and max pooling are first applied along the channel dimension, and the resulting feature maps are then concatenated. Based on the concatenated feature map, a convolution operation with a kernel size of 7 × 7 is performed to generate the final spatial attention map:
M_s(F) = σ(f^{7×7}([AvgPool(F); MaxPool(F)]))
where f^{7×7}(·) denotes the convolution operation with a 7 × 7 kernel, and [ · ; · ] represents the concatenation operation along the channel dimension. The remaining symbols have the same meanings as defined above.
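Under the same notation, the spatial attention module can be sketched as:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Sketch of Ms(F) = sigmoid(f7x7([AvgPool(F); MaxPool(F)])),
    where pooling is taken along the channel dimension."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, F):
        avg = F.mean(dim=1, keepdim=True)       # channel-wise average pooling
        mx = F.amax(dim=1, keepdim=True)        # channel-wise max pooling
        cat = torch.cat([avg, mx], dim=1)       # [AvgPool(F); MaxPool(F)]
        return torch.sigmoid(self.conv(cat))    # Ms(F), shape (B, 1, H, W)
```

The output map has one channel and the full spatial resolution, so it broadcasts over the channel dimension when applied to the feature map.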
(3)
Embedding of the attention module into the residual block
In this paper, CBAM is embedded at the end of each residual unit, specifically after the last convolutional layer in the main branch of each bottleneck module and before the residual connection is added. As illustrated in Figure 6, the feature map F output from the main branch is sequentially refined by channel attention and spatial attention. The circle-cross symbol ⊗ denotes element-wise multiplication (Hadamard product) between the attention map and the feature map. Accordingly, channel attention is applied as F′ = M_c(F) ⊗ F, and spatial attention is further applied as F″ = M_s(F′) ⊗ F′, where M_c(·) and M_s(·) denote the channel attention and spatial attention functions, respectively. Finally, the enhanced feature F″ is added to the shortcut feature through the residual connection to obtain the block output. By preserving the residual learning structure and enhancing feature discriminability, this strategy contributes to more reliable anomaly recognition under complex image conditions.
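The two-step refinement order inside each bottleneck can be expressed compactly; the helper below is illustrative, with the channel and spatial attention passed in as callables, and the residual addition performed outside.

```python
import torch

def cbam_refine(F, Mc, Ms):
    """Apply CBAM refinement to the main-branch output F:
    F' = Mc(F) * F (channel attention), then F'' = Ms(F') * F'
    (spatial attention), where * is the element-wise product."""
    Fp = Mc(F) * F        # channel attention first
    Fpp = Ms(Fp) * Fp     # then spatial attention
    return Fpp
```

If both attention maps were identically one, the refinement would leave the feature map unchanged, which makes the broadcasting behaviour easy to check.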

2.3. Loss Function

The loss function is mainly used during the training stage of the model to provide an optimization objective for deep neural network training. It guides parameter updates through backpropagation and plays a key role in determining whether the model can effectively converge. To reduce the contribution of easily classified samples to the loss function and make the model more focused on distinguishing hard samples, Lin et al. [46] further improved the weighted cross-entropy loss by introducing a modulating factor based on sample difficulty, resulting in the Focal Loss:
L_FL = −Σ_{c=1}^{C} α_c (1 − ŷ_c)^γ y_c log(ŷ_c)
where y_c denotes the ground-truth indicator for class c, which equals 1 when the sample belongs to class c and 0 otherwise, and ŷ_c denotes the predicted probability of class c after Softmax. In this study, C = 5. α_c is the class-balancing factor used to adjust the relative loss among different classes, and γ is the focusing parameter that controls the influence of sample difficulty on the loss contribution. When the predicted probability ŷ_c approaches 1, the modulating term (1 − ŷ_c)^γ significantly reduces the loss weight of that sample, thereby decreasing the attention paid to easily classified samples. In contrast, for hard samples, the loss contribution is increased, thereby enhancing the optimization of minority classes and samples with indistinct decision boundaries.
Focal Loss effectively enhances the model’s sensitivity to minority and hard-to-classify samples while maintaining the overall classification performance. It mitigates the adverse effects of data imbalance during model training, and its reinforcement mechanism for hard samples helps improve both the discriminative ability and generalization performance of the model. Considering that the dataset used in this study is characterized by class imbalance and high recognition difficulty for certain categories, the Focal Loss is adopted as the loss function.
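A minimal multiclass Focal Loss can be written as follows. The mean reduction over the batch and the numerical clamp on the log argument are our assumptions, not details stated in the paper.

```python
import torch

def focal_loss(logits, targets, alpha, gamma=2.0):
    """Sketch of L_FL = -alpha_c * (1 - p_c)^gamma * log(p_c), batch-averaged.

    logits: (N, C) raw class scores; targets: (N,) class indices;
    alpha: (C,) class-balancing factors; gamma: focusing parameter.
    """
    probs = torch.softmax(logits, dim=1)                    # predicted y-hat
    p_t = probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # y-hat of true class
    a_t = alpha[targets]                                    # alpha_c per sample
    loss = -a_t * (1.0 - p_t) ** gamma * torch.log(p_t.clamp_min(1e-8))
    return loss.mean()
```

With γ = 0 and α_c = 1 the expression reduces to the ordinary cross-entropy, while γ > 0 down-weights well-classified samples, which is the intended focusing behaviour.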

2.4. Evaluation Indicator

In image classification, model performance is typically assessed using metrics such as Precision, Recall, and Accuracy. Precision indicates how trustworthy the predicted category labels are by calculating the proportion of correct predictions within all predicted samples of a given class. Recall reflects the model’s ability to successfully retrieve samples that truly belong to the class. Accuracy provides a global measure by computing the proportion of correctly classified samples among the entire dataset.
To comprehensively evaluate the performance of an image classification model, multiple metrics are typically considered, as a single indicator may not fully capture the model’s behavior. For example, high precision indicates reliable positive predictions but may correspond to low recall, while high recall may come at the cost of reduced precision. When the dataset is highly imbalanced, the accuracy metric becomes less representative because the misclassification of minority classes has little impact on the overall accuracy. In practice, achieving both high precision and high recall is difficult; therefore, the F1-score is introduced as a complementary metric that accounts for the reliability of classification under class-imbalanced conditions. The specific definitions of these metrics are given as follows:
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
Accuracy = (TP + TN) / (TP + TN + FP + FN)
F1-score = 2 × Precision × Recall / (Precision + Recall) = 2TP / (2TP + FP + FN)
where TP (True Positive) denotes the number of samples correctly classified into the target class; TN (True Negative) denotes the number of samples correctly classified as not belonging to the target class; FN (False Negative) denotes the number of samples from the target class incorrectly classified into other classes; and FP (False Positive) denotes the number of samples from other classes incorrectly classified into the target class.
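The four metrics follow directly from the confusion counts; a minimal helper (per class, one-vs-rest) is:

```python
def classification_metrics(tp, tn, fp, fn):
    """Precision, Recall, Accuracy and F1-score from confusion counts,
    matching the definitions above (assumes non-degenerate denominators)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return precision, recall, accuracy, f1
```

Note that the two forms of the F1-score agree: 2TP / (2TP + FP + FN) is the harmonic mean of precision and recall expressed in counts.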

3. Case Study

3.1. Data Description

A double-tower cable-stayed bridge was selected as the research object in this paper. The bridge serves as a dedicated rail transit structure, and its operational conditions exhibit clear temporal regularity. The total length of the bridge is 1224 m with a main span of 480 m. The bridge health monitoring system is equipped with various types of sensors, including accelerometers, temperature–humidity sensors, anemometers, hydrostatic level gauges, and strain gauges. To achieve dynamic monitoring of the entire bridge, accelerometers were installed on both the main girder and the towers, with seven dual-channel accelerometers deployed, resulting in a total of 14 measurement points. The sampling frequency of the accelerometers is 50 Hz. The specific sensor locations and measurement directions are shown in Figure 7 and Table 1.

3.2. Data Visualization and Ground-Truth Labeling

In this study, the acceleration data were categorized into five states: missing, noise, environmental excitation, outlier, and normal. The missing class corresponds to signals that are absent or constant, the noise class represents low-amplitude steady vibrations, and the environmental excitation class reflects slight fluctuations caused by external factors. The outlier class contains evident anomaly points, while the normal class corresponds to typical responses under train loading. It should be noted that the noise and environmental excitation are not regarded as fault-induced anomalies; rather, they correspond to specific operating periods and often exhibit temporal regularities. During labeling, they were distinguished using operating conditions and waveform morphology in one-hour segments: low-amplitude, quasi-stationary fluctuations under stable conditions with no train passages were labeled as noise, whereas pronounced, non-stationary amplitude variations with an enlarged envelope under increased ambient disturbances were labeled as environmental excitation. In contrast, missing and outlier anomalies are more sporadic and may occur randomly across sensors and time.
To construct the original dataset, one month of bridge acceleration monitoring data was selected. The raw data consist of single-channel time-series signals collected by each accelerometer. The data were segmented into non-overlapping one-hour windows, resulting in 744 time-series segments per measurement point and a total of 10,416 samples. Each time-series segment contains 180,000 data points. In this study, these one-hour windows were used directly as raw diagnostic samples without additional signal pre-processing. Subsequently, each segment was converted into a time-history plot by rendering only the waveform curve, while the axes, ticks, and grids were removed. The waveform was exported as a raster image with a fixed canvas resolution of 640 × 480 pixels (100 dpi) and a white background. The x-axis span was fixed to one hour for all samples, and the y-axis was automatically scaled for each segment to avoid compressing low-amplitude abnormal patterns. As a result, the fixed one-hour x-axis span avoids horizontal scale variation across samples, while the adaptive y-axis scaling helps preserve waveform morphology for low-amplitude anomalies. As an example, Figure 8 presents five images from Channels 1, 3, and 5 in the training set.
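The rendering procedure above (fixed 640 × 480 canvas at 100 dpi, white background, waveform only, per-segment y-scaling) can be sketched with matplotlib. The function name and styling details are illustrative, not taken from the paper.

```python
import io

import matplotlib
matplotlib.use("Agg")                 # headless rendering, no display needed
import matplotlib.pyplot as plt
import numpy as np

def segment_to_png(signal):
    """Render one acceleration segment as a 640 x 480 px waveform image
    (100 dpi, white background, no axes/ticks/grid), mirroring Sec. 3.2.
    The x-span covers the whole segment; the y-limits auto-scale."""
    fig = plt.figure(figsize=(6.4, 4.8), dpi=100)   # 6.4 in x 100 dpi = 640 px
    ax = fig.add_axes([0, 0, 1, 1])                 # waveform fills the canvas
    ax.plot(signal, linewidth=0.5)
    ax.set_xlim(0, len(signal) - 1)                 # fixed horizontal span
    ax.axis("off")                                  # remove axes, ticks, grid
    buf = io.BytesIO()
    fig.savefig(buf, format="png", dpi=100, facecolor="white")
    plt.close(fig)
    return buf.getvalue()
```

In practice each one-hour segment (180,000 points at 50 Hz) would be passed in place of the random test signal.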
The feature descriptions and sample statistics of each data category are summarized in Table 2, and typical sample images are shown in Figure 9. It can be observed that there is a significant difference in the proportions of different bridge data categories, with normal data accounting for 74.84%, while missing and outlier data account for only 2.84% and 4.76%, respectively, indicating a severe class imbalance.
The dataset was split into training, validation, and test sets with a ratio of 8:1:1. The training set provides sufficient samples for model learning. During training, data augmentation was employed to improve generalization, including random resized cropping to 224 × 224 as well as random horizontal and vertical flipping. For validation and testing, only deterministic preprocessing was applied: each image was resized to 224 × 224 and normalized before being fed into the network, thereby ensuring fair and reliable performance evaluation across classes. The target input size of 224 × 224 was adopted to match the standard input resolution of ResNet-style architectures and to ensure consistent input dimensions across all samples.

3.3. Model Training and Testing

The model training was implemented on a deep learning server based on the PyTorch framework (v2.2.2). The server was configured with an NVIDIA RTX 3090 GPU, a six-core Intel Xeon Gold 6142 CPU, and 32 GB of RAM. To improve the model’s generalization capability, samples in the training set were randomly augmented by resized cropping, horizontal flipping, and vertical flipping, while only deterministic resizing and normalization were applied to the validation and test sets. The Adam optimizer was employed with an initial learning rate of 0.0001, a batch size of 32, and 100 training epochs. After each epoch, performance was assessed on the validation set, and the model with the highest validation accuracy was retained as the final model.
The Focal Loss was employed as the loss function, with the focusing parameter γ empirically set to 2, which effectively suppresses the dominance of easily classified samples. The weighting factor α was determined according to the class imbalance, being inversely proportional to the number of samples in each class and normalized accordingly. The final values of α were set to [0.72, 1.24, 1.55, 0.40, 1.19].
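Our reading of the weighting scheme, with factors inversely proportional to class sample counts and normalized so the factors average to 1 (consistent with the reported values summing to about the number of classes), can be sketched as follows; the counts in the test are illustrative, not the paper's exact class sizes.

```python
def inverse_frequency_alpha(counts):
    """Class-balancing factors inversely proportional to per-class sample
    counts, normalized to mean 1. `counts` lists samples per class."""
    inv = [1.0 / n for n in counts]
    mean_inv = sum(inv) / len(inv)
    return [w / mean_inv for w in inv]
```

Under this scheme, the majority (normal) class receives the smallest factor and the rarest classes the largest, which matches the ordering of the reported α values.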

3.3.1. Training Result Analysis

The variations in the loss value and accuracy with training epochs are illustrated in Figure 10. The training loss is the average value of the loss function over all mini-batches in one epoch. The training accuracy is defined as the ratio of correctly classified samples to the total number of samples in the training set for each epoch, where the predicted label is obtained by the class with the maximum Softmax probability. Both curves are reported on an epoch-wise basis. As shown in the figure, the loss value stabilizes after approximately 60 epochs, while the accuracy gradually increases and finally converges to about 92.8%, indicating that the model achieves stable convergence and that the adopted training strategy is reasonable.
As an important evaluation tool for classification models, the confusion matrix intuitively reflects the recognition accuracy and misclassification relationships among different categories. As shown in Figure 11, the confusion matrices for the training and validation sets demonstrate overall satisfactory classification performance. The results indicate that the model achieves high precision and recall for most categories, with overall accuracies of 92.58% and 97.12% for the training and validation sets, respectively. In the validation set, the “normal” and “missing” categories achieve the best recognition performance, with both precision and recall exceeding 95%. Although slight confusion exists between the “environmental excitation” and “noise” categories, the overall classification trend remains consistent with the expected pattern.

3.3.2. Testing Result Analysis

The confusion matrix of the test set is shown in Figure 12, with an overall accuracy of 98.28%, indicating that the model maintains excellent generalization performance on unseen data. Among all categories, the “normal” and “missing” classes are almost perfectly classified. The “environmental excitation” and “noise” classes are recognized stably with only minor confusion, while the recognition performance of the “outlier” class is significantly improved. Overall, the results demonstrate that the model achieves high-level classification performance even under class-imbalanced conditions.
Furthermore, Figure 13 presents the F1-scores of each category in the test set. The “missing” and “normal” classes achieve the best recognition performance, while the “environmental excitation,” “outlier,” and “noise” classes also maintain relatively high F1-scores. Overall, the results indicate that the model is capable of reliably distinguishing between different types of anomaly samples.

3.4. Comparative Study

3.4.1. Comparison with Traditional Deep CNN

To substantiate the effectiveness of the proposed method, a comparative analysis was conducted against several representative deep CNN architectures, including VGG16, DenseNet121, and EfficientNet-B0. For a fair comparison, all models were trained and evaluated using the same dataset split, input resolution, data augmentation strategy, optimizer settings, and training schedule. Performance was reported on the same test set.
Given the severe class imbalance, macro-averaged precision, recall, and F1-score were adopted to assign equal importance to each category regardless of its sample size. In addition, per-class metrics were reported to further highlight performance on minority categories.
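Macro-averaging, as adopted here, is simply the unweighted mean of the per-class metrics, so a small minority class counts as much as the dominant "normal" class. A short sketch with made-up per-class values illustrates the computation:

```python
import numpy as np

# Hypothetical per-class precision/recall on an imbalanced three-class split
# (class sizes of, say, 90 / 8 / 2): accuracy alone would be dominated by class 0.
precision = np.array([0.99, 0.60, 0.50])
recall    = np.array([1.00, 0.50, 0.50])
f1 = 2 * precision * recall / (precision + recall)

# Macro-averaging weights every class equally, regardless of sample size
macro_p, macro_r, macro_f1 = precision.mean(), recall.mean(), f1.mean()
```

With these numbers the macro-F1 (about 0.68) is far below the majority-class F1 (about 0.99), which is exactly why the macro-average exposes weak minority-class performance that overall accuracy hides.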
The classification results are summarized in Table 3 and Table 4. Overall, the proposed method achieves the best performance with an accuracy of 98.28%, outperforming VGG16 (96.07%), DenseNet121 (96.36%), and EfficientNet-B0 (96.93%). More importantly, the proposed method demonstrates more consistent and balanced recognition across categories, as reflected by the highest macro-average F1-score of 95.26%, which is 5.81, 3.67, and 3.95 percentage points higher than those of VGG16, DenseNet121, and EfficientNet-B0, respectively. This indicates that the proposed framework is more robust under the imbalanced multiclass setting.
In terms of minority and hard-to-classify categories, the proposed method exhibits clear advantages. For the environmental excitation class, the proposed method reaches an F1-score of 95.20%, which is 7.54, 9.26, and 5.89 percentage points higher than VGG16, DenseNet121, and EfficientNet-B0, respectively. For the noise class, the improvement is even more pronounced: the proposed method achieves an F1-score of 88.66%, which is 15.93 percentage points higher than VGG16 and 11.43 percentage points higher than both DenseNet121 and EfficientNet-B0. These gains are consistent with the substantial increase in macro-average recall, suggesting improved sensitivity to minority patterns rather than merely optimizing the majority class. Notably, all models achieve near-perfect performance for the missing class, indicating that this pattern is highly distinguishable in the dataset. Overall, these results demonstrate that, compared with traditional deep CNN architectures, the proposed method provides more reliable multiclass anomaly detection performance, particularly by enhancing recognition of minority categories under class imbalance.

3.4.2. Comparison with Different Methods

To further validate the effectiveness of the proposed method, three comparative methods were designed: (1) ResNet50, used as the baseline model; (2) ResNet50 integrated with the CBAM to evaluate the enhancement in feature extraction capability provided by the attention mechanism; and (3) CBAM-ResNet50 combined with the Focal Loss function to further alleviate the class imbalance problem. For clarity, these three methods are denoted as Method 1, Method 2, and the Proposed Method, respectively. Both Method 1 and Method 2 adopt the cross-entropy loss function. All methods were trained using the same data partition and hyperparameter settings.
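The Focal Loss used in the Proposed Method down-weights well-classified samples by the modulating factor (1 − p_t)^γ, shifting the training signal toward hard, minority-class samples. A minimal NumPy sketch follows; the specific hyperparameters (γ, any per-class weights α) are illustrative assumptions, not the paper's trained settings:

```python
import numpy as np

def focal_loss(logits, targets, gamma=2.0, alpha=None):
    """Multiclass focal loss: FL = -alpha_t * (1 - p_t)^gamma * log(p_t).

    logits: (N, C) raw class scores; targets: (N,) integer labels.
    With gamma = 0 and alpha = None this reduces to plain cross-entropy.
    """
    z = logits - logits.max(axis=1, keepdims=True)            # stabilized softmax
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    p_t = probs[np.arange(len(targets)), targets]             # true-class probability
    a_t = 1.0 if alpha is None else np.asarray(alpha)[targets]
    return np.mean(-a_t * (1.0 - p_t) ** gamma * np.log(p_t))
```

Because (1 − p_t)^γ shrinks toward zero for confidently correct predictions, easy majority-class samples contribute little loss, while hard minority samples retain nearly their full cross-entropy weight.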
The comparative results of the confusion matrices for the three methods on the test set are presented in Figure 14 and Table 5. Overall, the models achieve accuracies of 95.02%, 96.84%, and 98.28%, respectively, indicating a steady improvement in overall performance with the successive introduction of the CBAM and Focal Loss. Specifically, Method 1 performs poorly on the “noise” class, with a recall of only 46.81%. After incorporating the CBAM, Method 2 shows a significant improvement in the recall of the “noise” class, reaching 78.72%. With the further inclusion of Focal Loss, the Proposed Method achieves performance gains across all categories, where the recall of the “noise” class increases to 91.49% and that of the “outlier” class rises to 90.00%. These results demonstrate that the Focal Loss helps mitigate class imbalance and strengthens the learning of minority classes.
The comparison of F1-scores across the five categories for the three methods is illustrated in Figure 15. It can be observed that the F1-scores of all categories improve with the introduction of CBAM and Focal Loss, among which the “environmental excitation,” “noise,” and “outlier” classes show the most significant improvements, while the “missing” and “normal” classes consistently maintain high performance. These results indicate that the proposed method achieves better overall classification accuracy and provides more reliable identification of minority and hard-to-classify samples.
The precision–recall (P-R) curves of the three methods on the same test set are shown in Figure 16. The P-R curves indicate that all three methods achieve ideal recognition performance for the “normal” and “missing” categories, with Average Precision (AP) values of 1.00, suggesting that these two classes have distinct and easily distinguishable features. As the models are progressively improved, the curves of the “environmental excitation,” “noise,” and “outlier” categories move closer to the upper-right corner, and their AP values increase from 0.95, 0.75, and 0.86 to 0.99, 0.90, and 0.96, respectively. These results demonstrate that the proposed method significantly enhances the recognition capability for minority and hard-to-classify categories, thereby achieving optimal overall classification performance.
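Average Precision summarizes a P-R curve as the precision-weighted sum of recall increments, which for the standard non-interpolated definition reduces to averaging the precision at each true positive. A minimal sketch with hypothetical scores:

```python
import numpy as np

def average_precision(y_true, scores):
    """Non-interpolated AP: mean precision over the score-ranked true positives,
    equivalent to summing (recall step) * precision along the P-R curve."""
    order = np.argsort(-np.asarray(scores))      # rank samples by descending score
    y = np.asarray(y_true)[order]
    tp = np.cumsum(y)                            # true positives at each threshold
    precision = tp / np.arange(1, len(y) + 1)
    return float(np.sum(precision * y) / y.sum())

# Perfectly separable scores give AP = 1.00, as observed here for the
# "normal" and "missing" classes
assert average_precision([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]) == 1.0
```

An AP of 1.00 therefore means every positive sample is ranked above every negative one, consistent with the distinct, easily distinguishable features of those two categories.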
To provide an intuitive explanation for the above quantitative improvements brought by the attention mechanism, especially the notable gains on hard-to-classify categories such as “noise” and “outlier”, attention heatmaps were visualized based on the output feature maps of the last residual block. The heatmaps highlight the image regions the model attends to, improving the interpretability of its decision-making process.
The comparative results of attention heatmaps for different data categories are presented in Figure 17. Without the CBAM, the ResNet50 model exhibits a dispersed attention distribution, with highlighted regions scattered across the image and noticeable background interference, indicating limited focus on waveform-related discriminative regions. After introducing the CBAM, the highlighted regions become more concentrated and aligned with the main signal trajectory, effectively suppressing irrelevant background responses and emphasizing salient regions. This visualization suggests that CBAM enhances feature selectivity by reweighting informative spatial regions and channels, which is consistent with the improved F1 and increased AP values observed for minority and confusing categories.
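The CBAM refinement discussed above applies channel attention followed by spatial attention to each feature map. The following NumPy sketch mirrors that two-step structure with simplified stand-in weights (a 1×1 mixing in place of CBAM's 7×7 convolution, no batch dimension, randomly initialized parameters); it is an illustrative skeleton, not the trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def cbam(x, w1, w2, w_sp):
    """CBAM-style refinement of a feature map x of shape (C, H, W).

    Channel attention: a shared two-layer MLP (w1, w2) applied to global
    average- and max-pooled descriptors, combined with a sigmoid.
    Spatial attention: channel-wise average and max maps mixed by w_sp
    (a 1x1 mixing here, where CBAM uses a 7x7 convolution).
    """
    # --- channel attention ---
    avg_c = x.mean(axis=(1, 2))                  # (C,) global average pool
    max_c = x.max(axis=(1, 2))                   # (C,) global max pool
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0)   # shared MLP with ReLU bottleneck
    m_c = sigmoid(mlp(avg_c) + mlp(max_c))       # (C,) channel weights in (0, 1)
    x = x * m_c[:, None, None]
    # --- spatial attention ---
    avg_s = x.mean(axis=0)                       # (H, W) channel-wise average
    max_s = x.max(axis=0)                        # (H, W) channel-wise max
    m_s = sigmoid(w_sp[0] * avg_s + w_sp[1] * max_s)
    return x * m_s[None, :, :]

C, r = 8, 4                                      # channels and reduction ratio
x = rng.standard_normal((C, 5, 5))
w1 = rng.standard_normal((C // r, C)) * 0.1      # reduction layer
w2 = rng.standard_normal((C, C // r)) * 0.1      # expansion layer
y = cbam(x, w1, w2, w_sp=np.array([0.5, 0.5]))
```

Because both attention maps are sigmoid-bounded in (0, 1), the module can only re-weight (never amplify) responses, which is how it suppresses the scattered background activations seen in Figure 17 while preserving the waveform-related regions.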

4. Practical Application Analysis on Full-Month BSHM Data

To further validate the proposed method, the trained model was applied to one month of continuous acceleration monitoring data. On the same hardware platform, the testing of the entire month’s dataset took only five minutes.
Figure 18 compares the ground-truth monthly distribution and the model-predicted distribution for each sensor using stacked bar charts. Overall, the predicted distributions closely follow the ground truth across the 14 sensors. The “normal” class consistently dominates for all channels, with more than 550 h of data identified as “normal”, whereas abnormal patterns account for a smaller proportion. Meanwhile, the absolute difference in class proportion between the predicted and ground-truth distributions remains small for most sensors.
For specific categories, the “missing” class appears across all sensors with limited variation, which is more likely attributable to system-level effects such as communication interruptions or power-supply instability. In contrast, the “outlier” class shows noticeably higher proportions at several channels (e.g., Channels 10, 12, and 14) in both the ground truth and the predictions, implying localized instability or occasional measurement disturbances at these sensors. Overall, Figure 18 suggests that the model not only reproduces the overall category composition but also captures sensor-wise differences in anomaly occurrence.
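The per-sensor comparison in Figure 18 reduces to computing class proportions from each channel's hourly labels and differencing the predicted and ground-truth distributions. A sketch with a hypothetical channel (the label counts are invented for illustration):

```python
import numpy as np

def class_proportions(labels, n_classes=5):
    """Fraction of one-hour samples per class for a single sensor channel."""
    counts = np.bincount(labels, minlength=n_classes)
    return counts / counts.sum()

# Hypothetical month (720 hourly samples) for one channel, dominated by
# class 0 ("normal"), with some "missing" (3) and "outlier" (4) hours
truth = np.array([0] * 650 + [3] * 20 + [4] * 50)
pred  = np.array([0] * 648 + [3] * 22 + [4] * 50)

# Largest per-class absolute difference between predicted and true proportions
max_abs_diff = np.abs(class_proportions(pred) - class_proportions(truth)).max()
```

Here the largest per-class discrepancy is only 2 hours out of 720 (about 0.3%), the kind of small absolute difference described above.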
To assess the reliability of the proposed method, all one-hour samples over the month were manually labeled and compared with the corresponding model predictions on the same timeline. Figure 19 presents the spatiotemporal distributions of the five classes, where (a) shows the ground truth and (b) shows the diagnostic results. Overall, the predicted map closely matches the ground truth in both spatial and temporal patterns, indicating that the model preserves the dominant “normal” periods as well as the major anomaly occurrences. First, the “missing” class forms a distinct block in mid-to-late April and appears simultaneously across all channels, which strongly suggests system-level interruptions such as communication failures or power-supply issues, whereas isolated short “missing” segments likely reflect occasional local sensor instability. Second, “outlier” events are more channel-dependent and mainly concentrate in Channels 10, 12, and 14, implying localized disturbances at these sensors. Third, “environmental excitation” and “noise” occur intermittently across multiple days and channels, reflecting variations in operational and ambient conditions. Overall, Figure 19 demonstrates that the proposed model can reproduce the month-scale spatiotemporal patterns of different data-quality states and thereby supports long-term monitoring analysis and subsequent data screening in practical BSHM applications.
The spatiotemporal distribution of anomaly proportions over a one-month period is presented in Figure 20. A comparison of the two maps reveals that the diagnostic results closely replicate the spatiotemporal patterns present in the actual data, indicating that the model effectively preserves the overall anomaly distribution at the monthly scale. Temporally, both distributions exhibit a clear diurnal pattern: elevated anomaly proportions are consistently concentrated during nighttime and early-morning hours (approximately 00:00–06:00), while anomaly proportions remain low during daytime periods when train operations are frequent. This temporal correspondence between the actual and diagnostic results confirms that the model captures the cyclic nature of anomaly occurrence related to operational conditions. Spatially, the two distributions consistently identify the same channels as exhibiting higher anomaly susceptibility. Channels 10, 12, and 14 show persistently elevated anomaly proportions during nighttime hours in both the actual and diagnostic results, while other channels maintain lower levels. This cross-validation suggests that these channels genuinely exhibit lower stability or greater sensitivity to disturbances, and that the pattern does not arise from random variation or diagnostic error. The consistency between the actual and diagnostic spatial distributions demonstrates the model’s capacity for channel-wise anomaly profiling.
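The diurnal pattern in Figure 20 can be quantified by grouping the hourly labels by hour-of-day and averaging the anomaly indicator. A synthetic sketch (the anomaly rates are invented to mimic the night-concentrated pattern):

```python
import numpy as np

def hourly_anomaly_proportion(timestamps_h, is_anomaly):
    """Mean anomaly rate per hour-of-day over a month of hourly samples."""
    prop = np.zeros(24)
    for h in range(24):
        mask = (timestamps_h % 24) == h
        prop[h] = is_anomaly[mask].mean()
    return prop

# Synthetic 30-day month: anomalies concentrated between 00:00 and 06:00
hours = np.arange(30 * 24)
rng = np.random.default_rng(1)
anom = np.where(hours % 24 < 6,
                rng.random(len(hours)) < 0.30,   # nighttime: ~30% anomalous
                rng.random(len(hours)) < 0.02)   # daytime: ~2% anomalous
prop = hourly_anomaly_proportion(hours, anom)
night, day = prop[:6].mean(), prop[6:].mean()
```

Plotting `prop` against hour-of-day for each channel reproduces the kind of diurnal anomaly-proportion map shown in Figure 20.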
Detailed statistical results are provided in Table 6 and Figure 21. The diagnostic results indicate that 7.4% of the data are identified as anomalous, which is close to the actual proportion of 7.6%, with an overall accuracy of 97.16%, demonstrating that the model can achieve accurate classification in most cases. In terms of categories, the “normal” and “missing” classes achieve the best recognition performance, while the “environmental excitation,” “noise,” and “outlier” classes also maintain high recognition accuracy. Overall, the model exhibits stable performance across all data types and is capable of achieving multiclass anomaly diagnosis for bridge monitoring data.

5. Conclusions

In this paper, an approach for multiclass anomaly detection in imbalanced BSHM data is presented by integrating a CBAM into the ResNet50 framework and incorporating Focal Loss to address class imbalance. The method is validated using measured acceleration data from a long-span cable-stayed bridge. The main findings are summarized as follows:
  • Incorporating CBAM into ResNet50 is observed to improve recognition performance, particularly for confusing categories such as “noise” and “outlier”, leading to improved overall accuracy and more balanced class-wise F1-scores on the studied dataset.
  • Attention heatmap visualization suggests that CBAM helps the network emphasize waveform-related regions while suppressing irrelevant background responses, providing an intuitive interpretation for the performance gain under complex monitoring scenarios.
  • Using Focal Loss during training improves the classification of minority and hard-to-classify categories. In particular, the F1-score of the “noise” class increases from 0.7629 to 0.8866, and that of the “outlier” class increases from 0.9011 to 0.9278, indicating reduced performance degradation caused by class imbalance in the considered dataset.
  • The month-long diagnosis results show that the proposed framework can capture the overall category distribution and reveal representative spatiotemporal patterns of abnormal data, which is useful for large-scale data screening in practical monitoring tasks.
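As a quick consistency check on the Focal Loss bullet above, the quoted F1-scores follow directly from the harmonic mean of the precision and recall values reported in Table 5:

```python
# F1 as the harmonic mean of precision and recall; the inputs are the
# Table 5 precision/recall values converted to fractions.
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

assert round(f1(0.7400, 0.7872), 4) == 0.7629   # "noise", Method 2
assert round(f1(0.8600, 0.9149), 4) == 0.8866   # "noise", Proposed Method
assert round(f1(1.0000, 0.8200), 4) == 0.9011   # "outlier", Method 2
assert round(f1(0.9574, 0.9000), 4) == 0.9278   # "outlier", Proposed Method
```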
While the proposed anomaly detection framework for bridge monitoring data shows promising results, further efforts are needed to improve its practical applicability. Future work will focus on extending the approach to data from various types of sensors, aiming to achieve more comprehensive anomaly identification for structural health monitoring applications.

Author Contributions

Methodology: W.M. and Q.T.; software: L.H. and S.Z.; validation: W.M., L.H. and S.Z.; writing—original draft preparation: W.M.; writing—review and editing: Q.T. and L.H.; visualization: Q.T. and S.Z.; formal analysis: W.M. and Q.T.; funding acquisition: Q.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research is funded by the National Natural Science Foundation of China (Grant No. 52408314), Postdoctoral Science Foundation of China (Grant No. 2025M783334), Science and Technology Project of Guizhou Provincial Transportation Department (Grant No. 2024-122-018), and Chongqing Natural Science Foundation of China (Grant No. CSTB2022TIAD-KPX0205).

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

Author Shihao Zhang is employed by the CCCC First Highway Engineering Group Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figure 1. Overall flowchart of the proposed method.
Figure 2. Structure of the residual block.
Figure 3. Overall architecture of the ResNet50 network.
Figure 4. Structure of the channel attention module.
Figure 5. Structure of the spatial attention module.
Figure 6. CBAM embedded in the ResNet50 residual block.
Figure 7. Accelerometer deployment diagram.
Figure 8. Examples of data visualization for channels 1, 3, and 5.
Figure 9. Typical samples of each data category.
Figure 10. Training process loss and accuracy.
Figure 11. Confusion matrix: (a) training; (b) validation (The numbers 1 to 5 represent environmental excitation, noise, missing, normal, and outlier, respectively).
Figure 12. Confusion matrix of the test set.
Figure 13. F1-scores of different data categories in the test set.
Figure 14. Confusion matrix of the test set: (a) Method 1; (b) Method 2; (c) Proposed Method.
Figure 15. Comparison of F1-scores of different methods across five data categories.
Figure 16. P-R curves on the test set: (a) Method 1; (b) Method 2; (c) Proposed Method.
Figure 17. Attention heatmaps for anomaly detection: (a) Original image; (b) Method 1; (c) Method 2. Warmer colors denote stronger attention intensity, whereas cooler colors denote weaker attention intensity.
Figure 18. Category distribution of the monthly acceleration data for each sensor: (a) actual results; (b) diagnostic results.
Figure 19. Overall spatiotemporal distribution of the monthly data: (a) actual results; (b) diagnostic results.
Figure 20. Distribution of anomaly proportions for the entire month: (a) actual results; (b) diagnostic results.
Figure 21. Confusion matrix of the diagnostic results for the entire month.
Table 1. Location and direction of the accelerometer.

| Channel Number | Location | Direction |
|---|---|---|
| 1–2 | The main beam at the 1/2 span of the left-side span | X, Z |
| 3–4 | The main beam at the 1/5 span of the main span | X, Z |
| 5–6 | The main beam at the 2/5 span of the main span | X, Z |
| 7–8 | The main beam at the 7/10 span of the main span | X, Z |
| 9–10 | The main beam at the 1/2 span of the right-side span | X, Z |
| 11–12 | The top of the 1# tower | X, Y |
| 13–14 | The top of the 2# tower | X, Y |
Table 2. Data feature description and sample size.

| Data Category | Image Description | Sample Size |
|---|---|---|
| Environmental excitation | Irregular fluctuations in amplitude | 1367 |
| Noise | Small and stable amplitude without fluctuation | 462 |
| Missing | Mostly blank or constant-value image | 296 |
| Normal | Large, periodic oscillations around the center line | 7795 |
| Outlier | One or more extreme values appear | 496 |
Table 3. Classification results of VGG16 and DenseNet121.

| Output | VGG16 Precision | VGG16 Recall | VGG16 F1-Score | DenseNet121 Precision | DenseNet121 Recall | DenseNet121 F1-Score |
|---|---|---|---|---|---|---|
| Environmental excitation | 78.95 | 98.54 | 87.66 | 95.54 | 78.10 | 85.94 |
| Noise | 93.33 | 59.57 | 72.73 | 78.00 | 76.47 | 77.23 |
| Missing | 100 | 100 | 100 | 100 | 100 | 100 |
| Normal | 99.61 | 98.85 | 99.23 | 99.62 | 100 | 99.81 |
| Outlier | 100 | 78.00 | 87.64 | 95.92 | 94.00 | 94.95 |
| Macro-average | 94.37 | 86.99 | 89.45 | 93.82 | 89.71 | 91.59 |
| Accuracy | 96.07 | | | 96.36 | | |

Note: Precision, recall, F1-score, macro-average, and accuracy values are given in %.
Table 4. Classification results of EfficientNet-B0 and proposed method.

| Output | EfficientNet-B0 Precision | EfficientNet-B0 Recall | EfficientNet-B0 F1-Score | Proposed Method Precision | Proposed Method Recall | Proposed Method F1-Score |
|---|---|---|---|---|---|---|
| Environmental excitation | 93.60 | 85.40 | 89.31 | 96.27 | 94.16 | 95.20 |
| Noise | 72.22 | 82.98 | 77.23 | 86.00 | 91.49 | 88.66 |
| Missing | 100 | 100 | 100 | 100 | 100 | 100 |
| Normal | 99.62 | 100 | 99.81 | 99.49 | 99.87 | 99.68 |
| Outlier | 88.46 | 92.00 | 90.20 | 95.74 | 90.00 | 92.78 |
| Macro-average | 90.78 | 92.08 | 91.31 | 95.50 | 95.10 | 95.26 |
| Accuracy | 96.93 | | | 98.28 | | |

Note: Precision, recall, F1-score, macro-average, and accuracy values are given in %.
Table 5. Comparison of classification results among different methods.

| Output | Method 1 (Precision / Recall) | Method 2 (Precision / Recall) | Proposed Method (Precision / Recall) |
|---|---|---|---|
| Environmental excitation | 79.38 / 92.70 | 89.21 / 90.51 | 96.27 / 94.16 |
| Noise | 68.75 / 46.81 | 74.00 / 78.72 | 86.00 / 91.49 |
| Missing | 100 / 100 | 96.77 / 100 | 100 / 100 |
| Normal | 99.87 / 98.97 | 99.49 / 99.87 | 99.49 / 99.87 |
| Outlier | 83.67 / 82.00 | 100 / 82.00 | 95.74 / 90.00 |
| Macro-average | 86.33 / 84.10 | 91.89 / 90.22 | 95.50 / 95.10 |
| Accuracy | 95.02 | 96.84 | 98.28 |

Note: Precision, recall, macro-average, and accuracy are given in %.
Table 6. Statistical comparison between actual and diagnostic results of data categories.

| Data Category | Quantity (Diagnostic / Actual) | Proportion of Total Data, % (Diagnostic / Actual) | Proportion Within Anomaly Data, % (Diagnostic / Actual) |
|---|---|---|---|
| Normal | 7828 / 7795 | 75.2 / 74.8 | – |
| Environmental excitation | 1373 / 1367 | 13.2 / 13.1 | – |
| Noise | 442 / 462 | 4.2 / 4.4 | – |
| Missing | 303 / 296 | 2.9 / 2.8 | 39.2 / 37.4 |
| Outlier | 470 / 496 | 4.5 / 4.8 | 60.8 / 62.6 |
| Subtotal (anomaly data) | 773 / 792 | 7.4 / 7.6 | 100.0 / 100.0 |
| Total | 10,416 / 10,416 | 100.0 / 100.0 | – |

Note: Only the missing and outlier classes are included in the anomaly data statistics.
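The "proportion within anomaly data" column of Table 6 can be checked with a one-line computation; the sketch below is illustrative only (the variable names are assumptions, and only the missing and outlier classes count toward the anomaly subtotal, per the table note):

```python
# Diagnostic counts for the two anomaly classes, taken from Table 6.
diagnostic = {"missing": 303, "outlier": 470}
anomaly_total = sum(diagnostic.values())  # subtotal of anomaly data: 773

# Each class's share of the anomaly subtotal, rounded to one decimal place.
shares = {k: round(100 * v / anomaly_total, 1) for k, v in diagnostic.items()}
# missing -> 39.2 %, outlier -> 60.8 %, matching the diagnostic columns
```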

Share and Cite

MDPI and ACS Style

Ma, W.; Tang, Q.; Huang, L.; Zhang, S. Multiclass Anomaly Detection in Bridge Health Monitoring Data via Attention Enhancement and Class Imbalance Mitigation. Buildings 2026, 16, 1181. https://doi.org/10.3390/buildings16061181