1. Introduction
The safe operation and maintenance of civil infrastructure, particularly the sustainability and service life extension of historic or aging timber truss bridges, is a critical challenge in contemporary engineering [1]. Timber structures are susceptible to damage such as localized decay, bolt loosening, and micro-cracks caused by material aging, environmental erosion, and long-term loading, and failure to identify such damage in a timely manner may lead to severe consequences [2]. Structural health monitoring (SHM) provides an essential means for real-time condition assessment and early warning. However, SHM faces fundamental challenges in practical applications [3,4,5,6,7,8]: environmental and operational variations can induce significant changes in vibration response signals, which often mask the subtle changes caused by early-stage, minor damage. Moreover, inconsistencies in sensor type, location, and density in actual deployments further complicate the reliable extraction of damage-sensitive features. Developing a damage identification method that is robust to environmental disturbances and sensor location variations is therefore key to overcoming a bottleneck in the practical effectiveness of SHM.
Traditional vibration-based damage identification methods have been widely applied in SHM. Based on their feature extraction and modeling strategies, these methods can be broadly divided into three types. Modal parameter-based methods identify damage by analyzing changes in structural frequencies, mode shapes, or damping. While these methods have a clear physical meaning, they often exhibit limited sensitivity to early-stage minor damage and are susceptible to interference from global factors such as ambient temperature variations. Because damage changes the physical parameters of a structure, it alters the natural frequencies and mode shapes as well as derived quantities such as flexibility and stiffness; modal parameter-based approaches have therefore been extensively explored for damage identification. Yan et al. [9] provided a comprehensive review of vibration-based structural damage identification methods, comparing the advantages and limitations of techniques based on frequency, flexibility, mode shapes, and modal strain energy. Structural damage typically reduces stiffness and thereby changes the natural frequencies; Salawu [10] presented a detailed review of frequency-based damage identification methods. Pandey et al. [11] proposed a flexibility-based method built on the principle that structural flexibility changes when damage occurs, and West et al. [12] introduced a mode shape-based method that compares mode shapes before and after damage.
Signal processing and statistical feature-based methods, such as the wavelet transform, principal component analysis, and autoregressive models, extract statistical quantities or transform-domain features that reflect changes in structural state. However, these methods often lose information during feature construction, which compromises the interpretability of the results. Xin et al. [13] proposed a structural damage identification method combining a Swin Transformer with the continuous wavelet transform: raw structural vibration data are first converted into time-frequency images by the continuous wavelet transform to capture damage-related characteristics, and a Swin Transformer then performs hierarchical learning on these two-dimensional images to identify damage efficiently. Nouri et al. [14] detected damage in timber bridges by integrating Fourier decomposition, time series modeling, and machine learning methods. Oliver et al. [15] proposed a wavelet transform-based damage identification method for laminated composite beams using modal and strain data, achieving high-quality damage detection. Liu et al. [16] developed a damage identification method based on extended Kalman filtering and response reconstruction, using an extended Kalman filter for signal filtering and an orthogonal matching pursuit algorithm for response reconstruction, which significantly improved damage localization accuracy.
Machine learning-based methods apply classifiers to extracted features for pattern recognition, and numerous recent studies have explored such applications in SHM. For instance, Bayane et al. [17] applied unsupervised anomaly detection algorithms to bridge strain and acceleration data, constructing multiple feature matrices and evaluating five detection algorithms. Ren et al. [18] used support vector machines to identify damaged cables from the strain responses of cable-stayed bridges, proposing a two-step method for damage localization and severity estimation. Ghiasi et al. [19] employed K-nearest neighbors classifiers to detect section loss caused by corrosion, achieving significantly improved detection accuracy. Soleimani et al. [20] used random forests to assess the importance of modeling parameters in seismic demand estimation. Flah et al. [21] provided a systematic review of machine learning algorithms in civil structural health monitoring, confirming their effectiveness across structure types including bridges, buildings, and dams. Lim et al. [22] applied extreme gradient boosting to evaluate the damage conditions of different bridge types using data from a bridge management system. Chang et al. [23] proposed stochastic deterioration models based on Markov chains and classification trees to estimate transition probability matrices from limited data. Sun et al. [24] developed a two-stage detection method combining Bayesian fusion and rough set theory, achieving efficient localization and assessment of multiple damages in bridges.
Despite the significant progress made by these methods, they often exhibit limitations when confronting complex environmental disturbances: some are overly sensitive to environmental variations, leading to elevated false alarm rates; others require extensive historical data covering diverse environmental conditions for modeling, incurring high costs. Their sensitivity is particularly limited when identifying early-stage, minor damage. Furthermore, many methods rely on specific, consistent sensor deployment schemes during feature extraction, and their identification performance and generalization capability may significantly deteriorate when the sensor network changes.
In response to these challenges, recent research has moved toward more sophisticated and practical approaches, particularly for handling real-world data and improving model reliability and interpretability. For instance, interpretable surrogate modeling has emerged as a research focus in engineering failure analysis. These approaches aim to construct transparent models that not only predict accurately but also reveal the physical relationships between input features and structural responses, providing credible evidence for damage-related decisions [25]. Data-driven frameworks that integrate multi-source real-world data are also gaining traction; by incorporating diverse data types and sources, they can characterize the performance evolution of complex structures under realistic operating conditions more comprehensively and robustly [26,27]. At the algorithmic level, deep learning combined with Bayesian optimization offers new solutions for hyperparameter tuning, small-sample learning, and uncertainty quantification, significantly improving the generalization capability and efficiency of models in complex structural damage identification tasks [28]. Collectively, these developments point toward several key characteristics of future SHM methodologies: strong interpretability, multi-source data fusion capability, and robustness to real-world complexity and uncertainty.
Exploring novel approaches that mine discriminative features directly from raw dynamic response time series, while remaining relatively robust to environmental variations and sensor configuration changes, has therefore become crucial for addressing current bottlenecks. Time series analysis techniques offer potential in this regard, because their core lies in discovering local morphological patterns that characterize essential differences in structural states, rather than relying on indirect inferences from global statistics or model parameters. The Shapelet Transform is a time series feature extraction method with such potential [29]. Its core concept is to identify the “most discriminative subsequences” within time series, termed Shapelets: contiguous subsequences that represent the local shape characteristics of a particular class while contrasting distinctly with other classes. By computing the distances between the original sequences and a set of selected Shapelets, the Shapelet Transform maps raw data into a discriminative feature space based on “shape similarity.” The method has two key advantages. First, the extracted features are themselves segments of the original signal and therefore have an intuitive physical or geometric meaning, which significantly enhances model interpretability. Second, it directly captures local morphological differences and is relatively insensitive to global amplitude variations potentially caused by environmental factors, which aligns closely with the physical nature of the damage-induced local dynamic changes emphasized in SHM [30]. Leveraging these advantages, Shapelet-based algorithms have been widely applied in various domains, including thunderstorm identification [31], earthquake and wind-wave prediction [32], sensor anomaly detection [33], motion capture [34,35], and medical diagnosis [36,37], demonstrating their effectiveness in extracting robust local features from noisy data.
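To make the distance computation concrete, the following is a minimal NumPy sketch of a Shapelet Transform, not the authors' implementation. It assumes the common convention of z-normalizing the Shapelet and every sliding window and taking the minimum mean squared distance over all windows; this normalization is what makes the distance insensitive to a global rescaling or offset of the signal. The function names `znorm`, `shapelet_distance`, and `shapelet_transform` are hypothetical.

```python
import numpy as np


def znorm(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Z-normalize a 1-D array; a (near-)constant array maps to zeros."""
    sd = x.std()
    return np.zeros(len(x)) if sd < eps else (x - x.mean()) / sd


def shapelet_distance(series: np.ndarray, shapelet: np.ndarray, eps: float = 1e-8) -> float:
    """Minimum mean squared distance between the z-normalized shapelet and
    every z-normalized sliding window of the same length in `series`."""
    length = len(shapelet)
    s = znorm(shapelet, eps)
    windows = np.lib.stride_tricks.sliding_window_view(series, length)  # (n - L + 1, L)
    mu = windows.mean(axis=1, keepdims=True)
    sd = windows.std(axis=1, keepdims=True)
    safe_sd = np.where(sd < eps, 1.0, sd)
    normed = np.where(sd < eps, 0.0, (windows - mu) / safe_sd)
    return float(np.mean((normed - s) ** 2, axis=1).min())


def shapelet_transform(X: np.ndarray, shapelets: list) -> np.ndarray:
    """Map each series (row of X) to its vector of distances to the shapelets."""
    return np.array([[shapelet_distance(x, s) for s in shapelets] for x in X])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 200)
    healthy = np.sin(2 * np.pi * 12 * t) + 0.05 * rng.standard_normal(200)
    damaged = np.sin(2 * np.pi * 8 * t) + 0.05 * rng.standard_normal(200)
    shapelets = [healthy[20:60], damaged[20:60]]
    # Rows: series; columns: distance to each shapelet (about 0 on the diagonal).
    print(shapelet_transform(np.vstack([healthy, damaged]), shapelets).round(3))
    # Rescaling and offsetting a series leaves its distances (almost) unchanged.
    print(shapelet_distance(5.0 * healthy + 2.0, shapelets[0]))
```

Each row of the returned matrix is the feature vector that a downstream classifier such as Random Forest would receive.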
However, introducing the Shapelet Transform into civil engineering SHM, particularly for timber structures with more complex materials and configurations, raises several critical questions. First, the material nonlinearity, connection complexity, and environmental sensitivity of timber structures produce more intricate vibration response characteristics than those of concrete or steel structures [38,39], so the applicability and effectiveness of the Shapelet Transform in such scenarios must be validated thoroughly. Second, existing studies have rarely evaluated the robustness of Shapelet Transform methods systematically under multiple real-world challenges, including environmental disturbances, minor damage, and variations in sensor type and location [40], although some research has begun to address environmental robustness in timber structure damage identification [41]. Third, the configuration of key parameters in Shapelet Transform methods significantly affects the final identification accuracy, yet systematic investigations of parameter effects and selection criteria tailored to civil engineering SHM remain lacking. While parameter studies in general time series classification [42] have demonstrated the importance of parameter selection, whether their conclusions transfer directly to civil SHM contexts requires further verification.
To address these challenges, this paper proposes a damage identification framework that combines the Shapelet Transform with a random forest classifier, aiming to provide a highly interpretable and robust approach for identifying damage from SHM data. As a representation technique based entirely on the local shapes of time series, the Shapelet Transform captures distinctive local waveform patterns in vibration response data, while the random forest classifier uses these morphological features to identify and classify different structural states within large SHM databases. Using measured random vibration response data from a timber truss bridge, this study systematically investigates the identification performance of the proposed method under various conditions, including different damage severities, sensor locations, and environmental variations. The experimental results show that the proposed method not only achieves high-precision damage identification but is also sensitive to early-stage minor damage and robust to sensor location changes and environmental disturbances, offering a new technical pathway toward interpretable and adaptable structural health monitoring methods.
The remainder of this paper is organized as follows. Section 2 provides an overview of the fundamental principles and algorithmic process of the Shapelet Transform. Section 3 introduces the timber truss bridge SHM dataset employed in this study, presents damage identification results under different damage severities, sensor locations, and environmental influences, and compares the damage identification performance of different classifiers. Section 4 discusses the influence of the Shapelet extraction time on identification accuracy, analyzes the morphological characteristics of the selected Shapelet waveforms and their physical correlation with structural damage severity, and explains the performance differences between the random forest, k-nearest neighbors (KNN), and support vector machine (SVM) classifiers under various operating conditions. Section 5 summarizes the main conclusions, identifies the advantages and limitations of the proposed method, and outlines prospects for its application in real-time monitoring.
4. Discussion
4.1. Overall Identification Performance Analysis
Combining the Shapelet Transform with the Random Forest classifier yields satisfactory results in the timber truss bridge damage identification task. As shown in Table 3, the Shapelet extraction time has a significant impact on identification accuracy: with a 1 min extraction time, the average accuracy across the three operating conditions is 58.48%; at 3 min, it increases to 89.51%; at 5 min, it rises to 93.98%; and at 10 min, the accuracy reaches 100% under all conditions. This result suggests that a sufficient Shapelet extraction time is crucial for feature quality: a longer extraction time allows the algorithm to screen more of the vast number of candidate subsequences and retain local shape features with higher information gain and stronger discriminative power, thereby constructing a more discriminative feature space.
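The role of the extraction-time budget can be illustrated with a contracted random search. This is a minimal sketch under stated assumptions, not the authors' implementation: candidates of one fixed length are sampled at random, each is scored by the information gain of the best threshold split on its distances to all training series (the quality measure mentioned above), and sampling stops when the hypothetical `time_limit_s` or `max_candidates` budget is exhausted. All function and parameter names are hypothetical.

```python
import time

import numpy as np


def min_znorm_distance(series, cand, eps=1e-8):
    """Minimum mean squared distance between z-normalized `cand` and the
    z-normalized sliding windows of `series`."""
    c = (cand - cand.mean()) / max(cand.std(), eps)
    W = np.lib.stride_tricks.sliding_window_view(series, len(cand))
    Wn = (W - W.mean(axis=1, keepdims=True)) / np.maximum(W.std(axis=1, keepdims=True), eps)
    return float(np.mean((Wn - c) ** 2, axis=1).min())


def entropy(labels):
    """Shannon entropy (bits) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def information_gain(dists, labels):
    """Best information gain over all thresholds that split the distances."""
    order = np.argsort(dists, kind="stable")
    d, y = np.asarray(dists)[order], np.asarray(labels)[order]
    n, base, best = len(y), entropy(y), 0.0
    for i in range(1, n):
        if d[i] == d[i - 1]:
            continue  # equal distances cannot be separated by a threshold
        gain = base - (i / n) * entropy(y[:i]) - ((n - i) / n) * entropy(y[i:])
        best = max(best, gain)
    return best


def contracted_search(X, y, length, n_keep=5, time_limit_s=1.0,
                      max_candidates=10_000, seed=0):
    """Sample candidate subsequences until the time budget or the candidate
    cap is exhausted; return the n_keep best (gain, series index, start)."""
    rng = np.random.default_rng(seed)
    n_series, n_points = X.shape
    start = time.perf_counter()
    scored = []
    for _ in range(max_candidates):
        if time.perf_counter() - start > time_limit_s:
            break  # contract expired: fewer candidates have been screened
        i = int(rng.integers(n_series))
        j = int(rng.integers(n_points - length + 1))
        dists = np.array([min_znorm_distance(x, X[i, j:j + length]) for x in X])
        scored.append((information_gain(dists, y), i, j))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:n_keep]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 120)
    X = np.vstack([np.sin(2 * np.pi * f * t + rng.uniform(0, 2 * np.pi))
                   + 0.1 * rng.standard_normal(t.size)
                   for f in [10] * 6 + [6] * 6])
    y = np.array([0] * 6 + [1] * 6)
    for budget in (0.01, 0.5):
        best = contracted_search(X, y, length=25, time_limit_s=budget)
        print(budget, len(best), round(best[0][0], 3))
```

With a fixed seed, a longer budget screens a superset of the candidates screened by a shorter one, so the best retained gain cannot decrease; with a wall-clock limit, the exact number of candidates screened depends on the machine.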
Examining the identification results across different operating conditions, the proposed method demonstrates stable and high performance in Conditions I, II, and III.
In Condition I, the algorithm must identify data from different sensor locations. The average accuracy reaches 90.28% with a 5 min extraction time and 100% at 10 min. This suggests that the local shape features extracted by the Shapelet Transform are largely insensitive to sensor orientation: whether the sensors are positioned vertically or horizontally, the local waveform patterns excited by the same damage state are similar, and the algorithm can capture these common features across locations.
In Condition II, the test data comprise a mixture of reference states and damage states collected on different dates. The accuracy reaches 97.22% with a 5 min extraction time and 100% at 10 min, supporting the robustness of the proposed method against environmental factors. Although differences in temperature and humidity across dates may shift the overall dynamic characteristics of the structure, the Shapelet algorithm, by focusing on local waveform morphology, can distinguish global changes induced by environmental factors from local anomalies caused by damage and thus maintains stable identification performance.
In Condition III, the algorithm must distinguish simultaneously among six categories, namely the healthy state and five damage severities, which constitutes a typical fine-grained multi-class classification task. The accuracy reaches 94.44% with a 5 min extraction time and 100% at 10 min. Notably, the algorithm distinguishes the minimum simulated damage, which represents only 0.07% of the total structural mass, from both the healthy state and the other damage severities. This demonstrates the method’s high sensitivity to early-stage minor damage, which derives from the Shapelet Transform’s ability to capture fine local waveform details and to extract the subtle feature differences caused by minor mass variations from background noise. Furthermore, the algorithm differentiates among the five damage severities, showing that the Shapelet feature space can support fine-grained quantification of damage severity beyond merely detecting the presence or absence of damage.
Section 4.2 and Section 4.3 analyze the performance advantages of the proposed method in more depth from two perspectives: Shapelet waveform morphology and classifier selection, respectively.
4.2. Correlation Between Shapelet Waveform Morphology and Physical Damage
This section further explores the inherent relationship between the morphological characteristics of Shapelet waveforms and changes in the physical state of the structure. The observations in Section 3.2 show that, for the same acquisition date, Shapelet waveforms corresponding to the healthy state exhibit the highest peak density, whereas those corresponding to damaged states show significantly fewer peaks, and this decrease generally follows the escalation of damage severity. This phenomenon can be explained from the perspective of structural dynamics.
In the healthy state, the structure has relatively high local stiffness and low damping, so its response to external excitation contains higher-frequency oscillations, which appear as a denser distribution of peaks in the time-domain waveform. When damage is introduced, the local dynamic characteristics of the structure change: an increase in equivalent mass lowers the local modal frequencies, a decrease in stiffness has the same effect, and damping may also increase. Together, these changes lengthen the oscillation period of the vibration response and correspondingly reduce the number of peaks. The decreasing trend in peak count can therefore be regarded as a quantitative time-domain indicator of escalating damage severity.
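The peak-count argument can be made explicit under a simplifying assumption introduced here only for illustration: the local response is approximated by a single-degree-of-freedom mode with modal stiffness $k$, modal mass $m$, and damping ratio $\zeta$, and a Shapelet of duration $T$ is narrow-band enough to contain roughly one peak per oscillation cycle. Letting damage change the parameters to $k-\Delta k$, $m+\Delta m$, and $\zeta+\Delta\zeta$ (with $\zeta+\Delta\zeta<1$) gives, for the same $T$:

```latex
f_d = \frac{1}{2\pi}\sqrt{\frac{k}{m}}\,\sqrt{1-\zeta^{2}}, \qquad N \approx f_d\,T,

\frac{N_{\mathrm{damaged}}}{N_{\mathrm{healthy}}}
  \approx \sqrt{\frac{k-\Delta k}{k}}\;\sqrt{\frac{m}{m+\Delta m}}\;
          \sqrt{\frac{1-(\zeta+\Delta\zeta)^{2}}{1-\zeta^{2}}} \;\le\; 1,
  \qquad \Delta k,\ \Delta m,\ \Delta\zeta \ge 0 .
```

Each factor is at most 1, so added mass, lost stiffness, and added damping all lower the expected peak count within a window of fixed length; for the attached-mass damage used in the experiments, $\Delta k = 0$ and the mass factor dominates.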
Notably, the differences in peak counts between the healthy states recorded on different dates, shown in Figure 7, reveal a significant influence of environmental factors on waveform morphology. Variations in environmental conditions such as temperature and humidity affect the elastic modulus of timber and the frictional characteristics of connection nodes, thereby shifting the overall dynamic properties of the structure. However, within the same date, the differences in peak counts between healthy and damaged states remain significant. This indicates that the Shapelet algorithm can distinguish global changes induced by environmental factors from local changes caused by damage: the former may shift the overall waveform morphology, whereas the latter appear as specific morphological alterations in local subsequences.
These findings have practical engineering implications. First, intuitive morphological indicators such as peak count can serve as auxiliary criteria that help engineers understand why the algorithm classifies a particular signal segment as “damage.” Second, this pattern suggests a route toward more lightweight damage identification methods: simple features such as the peak density of local waveforms could be used for preliminary screening, followed by precise classification with the Shapelet Transform.
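The screening idea can be sketched in a few lines of NumPy. The sketch counts local maxima in a window and converts the count to a peak density (peaks per second); the 1000 Hz sampling rate, the 1 s window, the synthetic sine waves, and the 18.5 peaks/s threshold are hypothetical demonstration values, not data or settings from this study, and `flag_for_review` is a hypothetical helper. A lower-frequency oscillation, as expected for the damaged states discussed above, yields fewer peaks in the same window.

```python
import numpy as np


def peak_count(x: np.ndarray) -> int:
    """Count local maxima x[i-1] < x[i] >= x[i+1]; accepting a tie on the right
    makes a two-sample flat top count once instead of zero times."""
    mid = x[1:-1]
    return int(np.count_nonzero((mid > x[:-2]) & (mid >= x[2:])))


def peak_density(x: np.ndarray, fs: float) -> float:
    """Peaks per second in a window sampled at fs Hz."""
    return peak_count(x) / (len(x) / fs)


def flag_for_review(windows, fs: float, threshold: float) -> list:
    """Hypothetical pre-screen: flag windows whose peak density falls below a
    baseline threshold, to be passed on to the full Shapelet classifier."""
    return [peak_density(w, fs) < threshold for w in windows]


if __name__ == "__main__":
    fs = 1000.0                                  # assumed sampling rate, Hz
    t = np.arange(1000) / fs                     # one 1 s window
    reference = np.sin(2 * np.pi * 20.0 * t)     # stand-in "healthy" waveform
    softened = np.sin(2 * np.pi * 17.0 * t)      # lower frequency, fewer peaks
    print(peak_count(reference), peak_count(softened))
    print(flag_for_review([reference, softened], fs, threshold=18.5))
```

In practice, any such threshold would have to be calibrated per acquisition date, since the text above notes that environmental conditions also shift peak counts in the healthy state.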
In summary, the Shapelet Transform is not merely a data-driven feature extraction method; the local subsequences it selects inherently carry morphological information closely related to changes in the physical state of the structure. This “features-as-explanation” characteristic constitutes the core advantage of the proposed method compared to traditional “black-box” models.
4.3. Analysis of Classifier Performance Differences
The comparative results presented in Section 3.3 show that, given the same Shapelet features, the recognition performance of different classifiers varies significantly. The reasons can be attributed to the following aspects:
First, the ensemble mechanism of Random Forest enhances generalization capability. Random Forest builds multiple decision trees on bootstrap samples and aggregates their votes, which reduces model variance and limits overfitting to noise specific to the training data [46]. This advantage is most evident under Condition II: when the test data contain environmental fluctuations from different dates, the accuracies of KNN and SVM drop to 67.25% and 70.62%, respectively, whereas Random Forest maintains an accuracy of 97.22%, demonstrating the robustness of its ensemble mechanism against environmental disturbances.
Second, the high-dimensional nature of the Shapelet feature space places different demands on the classifiers. The Shapelet Transform converts each original time series into a vector of distances to multiple Shapelets; this feature space is typically high-dimensional and exhibits complex nonlinear relationships among features. Random Forest, being built from decision trees, naturally handles high-dimensional features and, through the feature chosen at each split, identifies the Shapelets most critical for classification. In contrast, KNN is susceptible to the “curse of dimensionality”: in high-dimensional spaces, distances between points become increasingly similar and lose discriminative power, which may explain its accuracy of only 58.33% under Condition III. Although SVM can handle nonlinear problems through kernel functions, its performance depends heavily on parameter tuning, and optimal parameters are difficult to find with limited samples.
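The distance-concentration effect behind the curse of dimensionality can be checked numerically with a small sketch. It assumes points drawn uniformly in a unit hypercube, a synthetic stand-in rather than the Shapelet features of this study, and uses a hypothetical helper `relative_contrast`: as the dimension grows, the relative gap between the nearest and farthest neighbor of a query point shrinks, so nearest-neighbor rankings carry less information.

```python
import numpy as np


def relative_contrast(dim: int, n_points: int = 1000, seed: int = 0) -> float:
    """(d_max - d_min) / d_min for Euclidean distances from one random query
    to n_points random points in the unit hypercube of the given dimension."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))
    query = rng.random(dim)
    d = np.linalg.norm(points - query, axis=1)
    return float((d.max() - d.min()) / d.min())


if __name__ == "__main__":
    for dim in (2, 10, 100, 1000):
        # Smaller values mean the nearest and farthest points look more alike.
        print(dim, round(relative_contrast(dim), 3))
```

Standardizing or selecting features can mitigate this effect for KNN, but it does not remove it when many features are uninformative.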
Third, Random Forest complements the interpretability of Shapelet features. Random Forest can output feature importance rankings that indicate which Shapelets are most critical for damage identification. Together with the inherent interpretability of the Shapelet Transform, this allows the overall framework not only to achieve high accuracy but also to provide intuitive physical evidence for engineering decisions. In contrast, neither KNN nor kernel-based SVM provides a comparable built-in measure of feature importance.
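The classifier comparison and the feature-importance output can be sketched with scikit-learn. This is an illustration under stated assumptions, not a reproduction of the study: a synthetic matrix from `make_classification` stands in for the Shapelet-distance features (100 columns, of which only the first 8 are informative), and the hyperparameters are illustrative choices. `feature_importances_` is scikit-learn's impurity-based importance, which can be read as a ranking of Shapelets. The accuracies printed here say nothing about the values reported in Section 3.3.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a Shapelet-distance matrix: rows are signal segments,
# columns are distances to 100 candidate Shapelets, of which only the first 8
# carry class information (shuffle=False keeps them in columns 0-7).
X, y = make_classification(n_samples=300, n_features=100, n_informative=8,
                           n_redundant=0, n_classes=6, n_clusters_per_class=1,
                           shuffle=False, random_state=0)

models = {
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale")),
}

# 5-fold cross-validated accuracy for each classifier on identical features.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in models.items()}

# Impurity-based importances from a forest fitted on all data: a ranking of
# which "Shapelets" (columns) the trees rely on most.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
ranking = np.argsort(rf.feature_importances_)[::-1]

if __name__ == "__main__":
    for name, acc in scores.items():
        print(f"{name}: {acc:.3f}")
    print("top-ranked columns:", ranking[:8])
```

Impurity-based importances can favor high-cardinality features; scikit-learn's permutation importance is a common cross-check when the ranking is used to justify an engineering decision.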
In summary, the classifier comparison results empirically support the choice of Random Forest in this study. It makes full use of the discriminative information in the Shapelet features and shows significant performance advantages under real-world challenges such as environmental disturbances, sensor variations, and fine-grained multi-class recognition tasks.
5. Conclusions
This study addresses core challenges in structural health monitoring, including environmental disturbances, sensor deployment variations, and the difficulty of identifying early-stage minor damage, by proposing a novel damage identification method that combines the Shapelet Transform with a Random Forest classifier. The proposed method mines highly discriminative local shape subsequences from original vibration response time series to construct an interpretable feature space, and leverages the ensemble mechanism of Random Forest to achieve robust classification. Based on a timber truss bridge experimental dataset, the effectiveness of the method is systematically validated under various conditions involving different damage severities, sensor locations, and environmental variations. The main conclusions are as follows.
First, as a highly interpretable local feature extraction method, the Shapelet Transform demonstrates good applicability to timber structure damage identification. With a Shapelet extraction time of 10 min, the method achieves 100% identification accuracy under Conditions I, II, and III. When the extraction time is reduced to 5 min, the average accuracy across the three conditions remains 93.98%, and with an extraction time of only 3 min, it is still 89.51%. These results reveal a positive correlation between Shapelet extraction time and identification accuracy: longer extraction times allow the algorithm to screen local features with higher information gain, thereby constructing a more discriminative feature space.
Second, the proposed method exhibits significant sensitivity to early-stage minor damage and demonstrates good robustness to sensor location variations and environmental changes. For the minimum simulated damage, which represents only 0.07% of the total structural mass, the method achieves effective identification with a 5 min extraction time. Under Condition I, the algorithm stably identifies data from both vertically and horizontally deployed sensors, indicating that Shapelet features are insensitive to sensor location. Under Condition II, facing environmental fluctuations such as temperature and humidity variations inherent in reference data collected on different dates, the method achieves accuracies of 97.22% and 100% with extraction times of 5 min and 10 min, respectively, demonstrating its ability to effectively distinguish global changes caused by environmental factors from local anomalies induced by damage. Condition III requires the algorithm to simultaneously distinguish among six categories, including a healthy state and five different damage severities, representing a typical fine-grained multi-class classification task. With a 5 min extraction time, the accuracy reaches 94.44%, and with 10 min, it achieves 100%. The algorithm not only accurately distinguishes the healthy state from various damage severities but also successfully differentiates among the five damage severities, demonstrating that the Shapelet feature space possesses the capability for fine-grained quantification of damage severity beyond merely discriminating the presence or absence of damage.
Third, the selection of the Random Forest classifier is validated through quantitative comparison. Within the same Shapelet feature space (5 min extraction time), Random Forest achieves significantly higher accuracies than K-Nearest Neighbors and Support Vector Machines across Conditions I, II, and III. This comparison indicates that the ensemble mechanism of Random Forest not only effectively utilizes the high-dimensional discriminative information of Shapelet features but also exhibits significant performance advantages under real-world challenges such as environmental disturbances, sensor variations, and fine-grained multi-class classification.
Fourth, Shapelet features have clear physical interpretability, and their waveform morphology is inherently correlated with structural damage severity. Morphological observation of the selected Shapelet waveforms shows that, for the same acquisition date and subsequence length, the healthy state exhibits significantly more peaks than both the minimum and the maximum damage states, and the peak count tends to decrease as damage severity increases from minimum to maximum. This pattern suggests that the reduction in local stiffness and the increase in mass caused by damage prolong the oscillation period and reduce the peak density of the vibration response; the Shapelet algorithm captures these physical changes as distinguishable morphological features. This “features-as-explanation” characteristic constitutes a core advantage of the proposed method over traditional “black-box” models.
Despite these achievements, this study has certain limitations. First, the experiment simulates damage by attaching mass blocks, which differs in physical mechanisms from real progressive damage such as timber decay and crack propagation. Second, the current Shapelet discovery strategy based on an exhaustive candidate search incurs high computational costs, potentially limiting its application in ultra-long time series or real-time scenarios. Future research will proceed in the following directions: (1) further validating the effectiveness of the proposed method on bridge monitoring data containing real timber damage; (2) exploring fast Shapelet discovery algorithms based on heuristic search or learning to reduce computational costs; (3) extending the method to other material structures such as steel and concrete to verify its generalization capability; (4) developing a real-time monitoring prototype system based on an “offline training, online lightweight inference” architecture to facilitate the translation of the method into engineering practice.
In summary, this study not only validates the effectiveness of the Shapelet Transform in structural damage identification but also provides a novel technical pathway for developing interpretable, robust, and online-capable intelligent identification algorithms. The proposed methodological framework exhibits clear application potential in scenarios such as historic timber bridge monitoring, the rapid assessment of bridges in remote areas, and the intelligent upgrading of existing monitoring systems.