1. Introduction
Composite materials, as key foundational components in contemporary industrial systems, have become increasingly essential in diverse sectors such as aerospace, transportation, and civil engineering due to their outstanding characteristics, including high strength, low weight, corrosion resistance, and great design flexibility [
1,
2,
3]. However, during both the manufacturing and service life of composites, defects such as delamination, porosity, and cracking are likely to occur, which can severely reduce their performance and durability. Among these, delamination defects, which are often hidden within the internal structure and evade detection by visual inspection, pose a particularly serious risk to the long-term reliability of composite structures [
4,
5]. Consequently, accurately identifying delamination is of great engineering importance for ensuring structural safety and the dependable operation of equipment.
Currently, multiple non-destructive testing techniques [6,7,8] are used to identify delamination defects in composite materials, such as ultrasonic inspection, infrared thermography, and microwave testing. Each of these methods, however, has notable limitations in delamination evaluation: ultrasonic testing depends on coupling media and faces a trade-off between penetration depth and image resolution [
9,
10]; infrared thermography requires materials with favorable thermal properties and is generally restricted to detecting defects near the surface [
11,
12,
13]; microwave testing has low sensitivity to small-scale defects, and its signals are easily influenced by changes in the material’s dielectric properties [
14,
15]. In recent years, terahertz time-domain spectroscopy (THz-TDS) has become a focus of research and application in composite non-destructive testing (NDT) owing to its distinct advantages [
16,
17]. Relative to the above techniques, THz-TDS offers a non-contact, non-ionizing inspection approach. Terahertz radiation can readily pass through non-metallic, non-polar composites, such as glass fiber-reinforced polymer (GFRP), delivering high-resolution imaging and accurate characterization of internal, hidden defects. Moreover, recent studies in terahertz metamaterials and resonant structures have shown that engineered frequency-selective responses can significantly enhance the sensitivity to subtle dielectric or structural variations in composite media. For example, highly sensitive THz sensors based on electromagnetically induced transparency (EIT)-like effects have demonstrated strong resonance confinement and high-Q spectral responses, enabling precise detection of small refractive-index perturbations [
18]. Related dual-band EIT terahertz metasurface designs also exhibit strong field localization and enhanced interaction with layered media, providing valuable insights for improving damage-sensitive feature extraction in THz NDT scenarios [
19]. These works indicate that the THz frequency domain carries rich discriminative cues and should be leveraged more effectively in defect characterization. In addition to these EIT-based THz sensing structures, recent progress in all-dielectric metasurfaces has further highlighted the critical role of high-Q resonant responses in frequency-domain detection. All-dielectric platforms supporting Fano and quasi-bound-state-in-the-continuum (quasi-BIC) resonances can produce extremely sharp spectral features and strong near-field enhancement, enabling highly sensitive refractive-index and molecular detection across multiple scenarios [
20]. These studies collectively indicate that frequency-selective resonances encode rich, defect-sensitive information and provide compelling evidence that the frequency domain should be fully utilized in THz-based nondestructive evaluation.
Given these properties, THz technology has been extensively applied to detect delamination in composite materials [
21,
22]. To overcome the difficulty of thoroughly identifying and characterizing concealed multi-layer delamination in GFRP laminates, a THz-TDS imaging system was designed by Chung-Hyeon Ryu et al. [
23]. Efficient detection and imaging of the delamination’s shape, thickness, and depth location were successfully achieved by examining how terahertz pulses interact with the material. Shi et al. [
16] tackled the problem of precisely identifying internal defects during the production of GFRP, proposing a technique that integrates cross-correlation-based extraction of pulse response functions from terahertz time-domain spectroscopy with image enhancement. This method localized and imaged deep and minute defects and improved the reliability and precision of non-destructive testing for composites. To automatically characterize delamination in curved quartz fiber-reinforced polymer (QFRP), a terahertz time-domain signal classification method based on Transformer neural networks was introduced by Liu et al. [
24]. By combining this method with a collaborative robot, an automated detection system was established that enables high-precision automatic recognition and visual positioning of pre-embedded delamination defects. Mu Da et al. [
25] investigated the problem of decreased contrast and reduced accuracy in defect characterization during terahertz imaging of internal flaws in ultra-multilayer glass fiber-reinforced polymer composites, which arise from signal attenuation and multiple reflections. They proposed a terahertz imaging characterization framework based on reconstructing the lamination structure and performing pixel-level clustering, utilizing the local symmetry of reflected pulses and the density of point clouds. Xu et al. [
26] tackled the problem of performance degradation in data-driven THz non-destructive evaluation under changing testing conditions, which frequently cause shifts in data distributions and weaken model generalization. They proposed an intelligent THz 3D characterization framework that leverages a deep adversarial domain adaptation strategy. Through unsupervised adversarial learning, discrepancies between THz datasets were substantially reduced, enabling highly accurate automated localization and imaging of delamination defects in composite materials under complex conditions.
At the same time, the computer-vision community has demonstrated that adaptive feature enhancement can substantially improve robustness under challenging sensing conditions. For instance, adaptive learning filters embedded in Vision Transformers (ALF-ViT) have been shown to significantly enhance pixel-level segmentation quality in low-light or noisy environments by dynamically amplifying informative regions while suppressing background interference [
27]. These findings highlight the importance of adaptive, noise-aware, and domain-enhanced representation learning, capabilities that are likewise crucial for THz-TDS signals affected by attenuation, scattering, and multilayer reflections. The THz studies reviewed above have achieved notable advances, but their core approach still focuses on a single category of features. Whether they refine algorithms based on time-domain impulse responses or employ deep learning to classify time-domain waveforms or images, these approaches extract features exclusively from the time-domain perspective. Such a single-domain feature representation leads to reduced accuracy and robustness under complex detection conditions, especially in the presence of complicated attenuation effects and noise interference [
28,
29,
30]. To tackle the aforementioned problems, several researchers have sought to improve detection accuracy through multi-domain feature fusion techniques. Tu et al. [
31] developed a multi-damage identification method based on multi-domain feature fusion combined with machine learning for terahertz non-destructive testing of defective epoxy coating structures. By extracting features from the time domain, frequency domain, and wavelet packet energy, employing an improved random forest algorithm for feature selection, and adopting a cascaded support vector machine classifier tuned through particle swarm optimization, defects of various types and severities were accurately classified and evaluated. The challenge of automatic and reliable detection of interfaces and defects in coating-to-bonding structures during terahertz signal analysis was addressed by Cui et al. [
32], and a framework integrating Residual Networks with Bidirectional Long Short-Term Memory networks was proposed. By feeding the signals into a spatio-temporal feature extraction module, their method enabled precise automatic localization of coating interfaces thicker than 80 µm, as well as high-accuracy detection of defects within the bonding layer. Although some studies have sought to integrate multi-domain features, existing methods still suffer from several limitations. On the one hand, multi-domain features are typically combined through simple concatenation, without mechanisms for deep interaction or adaptive fusion among the features [
33,
34]. On the other hand, the feature extraction process is still largely restricted to the time domain or the spatiotemporal domain and does not sufficiently leverage multi-scale information in the frequency and time–frequency domains.
To address these limitations, a network termed TFFN is proposed for the classification of delamination defects in composite materials. A feature fusion framework integrating local time-frequency, frequency-domain, and time–frequency fused representations is constructed to achieve deep and complementary integration of multi-domain information. This approach improves both the accuracy and robustness of delamination defect identification in composite materials. The main contributions are summarized as follows:
1. A three-branch feature fusion framework is proposed. Features extracted from the local time-frequency, frequency-domain, and time–frequency domain branches are combined, thereby addressing the limitations of single-view representations. This architecture provides a comprehensive basis for multi-domain information extraction and deep feature integration from terahertz signals.
2. A frequency-domain adaptive enhancement module, named the channel–spatial–frequency attention network (CSFANet), is introduced. This module performs adaptive decomposition in the frequency domain using deformable convolutions, strengthens feature interactions through a unified attention mechanism across frequency, spatial, and channel dimensions, and applies an adaptive weighting scheme to highlight critical information while attenuating noise.
3. Two feature fusion strategies are introduced. First, the Manifold Mixup method is applied in the time-frequency branch, where linear interpolation between time-domain and frequency-domain features is carried out in the feature space to achieve deep semantic alignment of time-frequency representations, thereby boosting the robustness of defect classification. Second, a cross-branch attention module is developed to adaptively perform weighted fusion of the three-branch features, which facilitates the extraction of more discriminative fused features and further enhances classification accuracy.
4. Results and Discussion
4.1. Model Parameters
Time-frequency spectrograms are fed into the local time-frequency and frequency-domain branches. A simplified ResNet backbone is employed in each branch for hierarchical feature extraction. The raw time-domain signal with a length of 1300 is taken as the input to the time-frequency domain branch. Through the above process, the local time-frequency feature map and the frequency-domain feature map are generated, both having a channel dimension of 16. A Transformer encoder coupled with Manifold Mixup is employed to extract fused time-frequency features, resulting in the time-frequency fusion feature. The Transformer encoder has two layers, and the number of cycles for Manifold Mixup is four.
The hyperparameters of the network were set as follows: a batch size of 128, 100 training epochs, and the Adam optimizer. An early stopping strategy halted training if the validation accuracy showed no improvement over 20 consecutive epochs, and the model with the highest validation accuracy was saved for final evaluation. To enhance the model’s generalization ability, a cross-entropy loss function with label smoothing was adopted. All experiments were conducted on a single computer running Windows 11, equipped with an Intel(R) Core(TM) Ultra 7 265KF processor, 32 GB of RAM, and an NVIDIA GeForce RTX 5070 GPU. The model was developed and tested using the PyTorch 1.7.0 deep learning framework within an Anaconda3 environment.
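The two training safeguards described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the smoothing factor `eps` is a free parameter here because its value is not stated in the text, the smoothed target follows the common convention of spreading `eps` uniformly over all classes, and the class name `EarlyStopping` is hypothetical.

```python
import math


def label_smoothing_cross_entropy(logits, target, eps):
    """Cross-entropy against a smoothed target distribution.

    Assumed convention: the true class receives (1 - eps) + eps / K and
    every other class receives eps / K, where K is the number of classes.
    """
    k = len(logits)
    # Numerically stable log-softmax.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    log_probs = [v - log_z for v in logits]
    loss = 0.0
    for i, lp in enumerate(log_probs):
        q = eps / k + ((1.0 - eps) if i == target else 0.0)
        loss -= q * lp
    return loss


class EarlyStopping:
    """Signal a stop when validation accuracy stalls for `patience` epochs."""

    def __init__(self, patience=20):
        self.patience = patience
        self.best_acc = float("-inf")
        self.best_epoch = None
        self.stale_epochs = 0

    def step(self, epoch, val_acc):
        """Record one epoch; return True when training should stop."""
        if val_acc > self.best_acc:
            self.best_acc = val_acc
            self.best_epoch = epoch  # a real loop would save a checkpoint here
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience
```

With `patience=20`, `step` returns `True` on the twentieth consecutive epoch without improvement, and `best_epoch` identifies the checkpoint that would be kept for final evaluation.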
4.1.1. GFRP Comparative Experiment
To evaluate the effectiveness of the proposed TFFN, it was first compared with Transformer-based neural networks [24], AlexNet [25], ResNet-34 [35], MCLDNN [36], and DACNN [37] on the glass fiber dataset. The comparison was carried out using accuracy, precision, recall, and the average F1-Score as metrics, and the corresponding confusion matrices were generated. In addition, defect visualization images were used to further demonstrate the performance of each method. To reduce the influence of non-defective background regions and enable clearer inspection of the classification results, the subsequent visualizations concentrated on the pre-embedded defect regions, namely the area enclosed by the box in Figure 13.
As shown in Table 1 and illustrated in Figure 14, the proposed method attains the highest overall accuracy among all compared approaches and delivers superior performance in most categories. Specifically, it achieves the best precision for Label 0, Label 1, and Label 3 (98.01%, 99.49%, and 98.01%, respectively) and the highest recall for Label 0, Label 1, Label 3, and Label 4 (98.50%, 98.00%, 98.50%, and 99.00%, respectively). For the remaining labels, our method’s performance is very close to that of the top-performing models: its precision for Label 2 is only 0.49% lower than that of the best model (MCLDNN), its precision for Label 4 is just 0.37% lower than that of the best model (ResNet-34), and its recall for Label 2 is close to that of the best model (Transformer). These results suggest that our method provides a more uniformly balanced performance across all categories. The confusion matrix in Figure 15 further details the classification outcomes: our approach shows the sharpest and most concentrated main diagonal, reflecting the fewest total misclassifications, whereas all other methods display more pronounced misclassification patterns. The F1-Score combines precision and recall into a single metric, providing a more holistic evaluation of model performance. As shown in Table 1, our method achieves the highest F1-Score of 98.40% among all methods considered. This finding reinforces that our approach preserves high precision while also delivering strong recall, thereby achieving an effective trade-off and a well-balanced overall performance.
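For reference, the four metrics used in this comparison can all be derived from a confusion matrix. The sketch below uses the standard definitions and assumes that the average F1-Score is the unweighted (macro) mean of the per-class F1 values; the function name and the toy matrix in the usage note are illustrative only and are not data from this study.

```python
def classification_metrics(confusion):
    """Compute accuracy, per-class precision/recall, and macro F1.

    confusion[i][j] is the number of samples whose true label is i and
    whose predicted label is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[c][c] for c in range(n))
    precision, recall, f1 = [], [], []
    for c in range(n):
        tp = confusion[c][c]
        predicted_c = sum(confusion[i][c] for i in range(n))  # column sum
        actual_c = sum(confusion[c])                          # row sum
        p = tp / predicted_c if predicted_c else 0.0
        r = tp / actual_c if actual_c else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(2 * p * r / (p + r) if (p + r) else 0.0)
    return {
        "accuracy": correct / total,
        "precision": precision,
        "recall": recall,
        "macro_f1": sum(f1) / n,
    }
```

For a toy two-class matrix `[[8, 2], [1, 9]]`, the accuracy is 17/20, the precision for class 0 is 8/9, and its recall is 8/10.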
To evaluate the generalization ability of TFFN, data from a different composite plate that was excluded from the training process were selected as the test subject. Using label–color mapping, visual images of delamination defects were generated, where Label 0 to Label 4 correspond to normal data and Defect 1 to Defect 4, respectively. As shown in Figure 16, misclassifications occur to varying degrees across different models. Figure 16a–d exhibit relatively severe misidentification: in Figure 16a, the majority of normal samples are misidentified as Defect 3; in Figure 16b, there is a serious problem with normal samples being misidentified as Defect 4 within the Defect 4 region; in Figure 16c,d, the recognition performance for Defects 1–3 is poor, and the identification accuracy in the main defect areas is notably lower than with other methods. In contrast, the recognition performance of Figure 16e,f is significantly better than that of the other methods. Compared with Figure 16e, Figure 16f exhibits a relatively lower recognition rate in the main defect areas, especially for Defect 4, whose recognition rate is markedly lower than that in Figure 16e. In summary, TFFN delivers outstanding performance: it captures the majority of the primary delamination defect regions while keeping the rate of false detections in non-defect areas low.
4.1.2. QFRP Comparative Experiment
The superior performance of the proposed method on glass fiber composites was verified in Section 4.1.1. To further assess its ability to generalize to different composite materials, the method is applied to QFRP specimens in this section. A systematic evaluation is carried out using the same baseline methods and evaluation metrics as those used for the GFRP dataset.
As shown in Table 2 and Figure 14, our method also demonstrates strong performance on the quartz dataset: it attains the highest overall accuracy, the highest precision for Label 0, Label 1, and Label 3, and the highest recall for Label 0 and Label 1. For categories where it is not strictly optimal, the difference from the best-performing methods is very small: both precision and recall for Label 2 are only about 0.5% lower than those of the best model (AlexNet), and the recall for Label 3 is just 0.5% lower than that of AlexNet. As shown in Figure 17, the main diagonal of our method’s confusion matrix is the sharpest and most concentrated among all methods, mirroring its behavior on the glass fiber dataset. In addition, our method achieves the highest average F1-Score across all methods, confirming its superior overall performance on the quartz dataset.
To verify the generalization capability of the proposed approach, data from a composite panel excluded from training were selected for testing. By applying a label-to-color mapping strategy, a stratified visualization of the defects was obtained, in which Label 0 corresponds to the intact area and Labels 1–3 correspond to Defects 1–3, respectively. The visualization results are shown in
Figure 18. TFFN (Figure 18f) maintained robust recognition performance on the QFRP dataset, producing images with clear defect boundaries and complete category regions that closely align with the true defect distribution. AlexNet (Figure 18a) likewise yielded generally good recognition outcomes. Nonetheless, relative to AlexNet, TFFN achieved higher recognition accuracy in the local areas around Label 1 and Label 3, with fewer misclassification errors. The other methods display varying levels of recognition deficiencies: Figure 18b substantially misclassifies normal regions as defects and introduces considerable noise near Label 2 and Label 3; Figure 18c performs poorly on deeper defects such as Label 2 and Label 3, leading to extensive missed or erroneous detections; Figure 18e performs satisfactorily on Label 1 but still produces numerous misclassified regions around Label 2 and Label 3. Figure 18d presents clear defect contours but has difficulty distinguishing adjacent defects, with particularly obvious misidentifications near the lowest defect of Label 2. Collectively, these findings highlight the strong generalization capability of the proposed method on the QFRP dataset.
The robust performance of TFFN is attributed to its cross-domain deep feature fusion. In the three-branch architecture integrating the local time-frequency, frequency, and time-frequency domains, local details are preserved in the local time-frequency branch; channel–spatial–frequency attention is introduced in the frequency branch to enhance damage-sensitive frequency bands; a Transformer encoder and Manifold Mixup are utilized in the time-frequency fusion branch to obtain fused features across domains; and multi-branch information is finally integrated via the cross-branch attention mechanism. This design not only enhances the robustness of defect recognition but also reduces misclassification in non-defect regions through mutual verification across domains, resulting in superior generalization performance. The other comparative methods either do not fully integrate complementary time-frequency domain information during feature extraction or employ simplistic fusion strategies that lack deep cross-domain feature integration mechanisms, which limits their generalization capabilities in complex defect recognition scenarios.
The effectiveness of the proposed cross-domain deep fusion mechanism is intuitively verified in the feature space.
Figure 19 shows the t-SNE visualization of the features extracted by TFFN from the GFRP and QFRP datasets. The horizontal and vertical axes correspond to the first and second t-SNE embedding dimensions, respectively. The coordinate values are linearly rescaled to the range [0, 1] for visualization and are unitless; they do not represent the physical size of the sample. As shown, distinct and well-separated clusters emerge for each defect class in both datasets, characterized by small intra-class distances and large inter-class separations. This indicates that the features learned by TFFN via cross-domain fusion exhibit strong discriminative capability among different categories, thereby supporting its outstanding classification accuracy and generalization performance.
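The linear rescaling of the embedding coordinates mentioned above amounts to an independent min-max normalization of each axis. A minimal sketch follows; the function name is hypothetical, and the t-SNE embedding itself is assumed to have been computed beforehand by a separate tool.

```python
def rescale_unit_range(points):
    """Min-max rescale 2-D points so that each axis spans [0, 1].

    points is a list of (x, y) tuples, e.g. t-SNE embedding coordinates.
    A constant axis (zero span) is mapped to 0.0.
    """
    def scale(values):
        lo, hi = min(values), max(values)
        span = hi - lo
        return [(v - lo) / span if span else 0.0 for v in values]

    xs = scale([p[0] for p in points])
    ys = scale([p[1] for p in points])
    return list(zip(xs, ys))
```

Because each axis is rescaled independently, only the relative arrangement of the clusters is meaningful, which is consistent with the statement that the coordinates are unitless.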
4.2. Ablation Experiments
To validate the design rationale of the proposed three-branch network (TFFN) architecture, the frequency-domain enhancement module (CSFANet), and the cross-branch attention fusion mechanism, ablation studies were conducted on both the GFRP and QFRP datasets. The experiments were performed at two granularities: branch-level and module-level ablation. For branch-level ablation, the local time-frequency branch (TFFNd-ltf), the frequency-domain branch (TFFNd-f), and the time-frequency fusion branch (TFFNd-tf) were individually removed. For module-level ablation, the CSFANet module was removed (TFFNd-csfa), and the cross-branch attention fusion mechanism was replaced with simple concatenation (TFFNsim) and linear fusion (TFFNlinear), respectively, to evaluate the contribution of each component. In these ablation studies, accuracy, precision, recall, and the macro-averaged F1-Score were adopted as quantitative metrics to assess the importance of each branch and module design. Additionally, confusion matrices were used for visual analysis to complement the quantitative evaluation.
4.2.1. Branch Ablation Experiment
To validate the rationality of the branch design, a branch-level ablation study is presented in this subsection. The performance of three scenarios, each representing the removal of a different branch, is compared in Table 3, Figure 20, and Figure 21.
The removal of the local time-frequency branch (TFFNd-ltf) led to a marked degradation in defect detection performance across both datasets. For the GFRP dataset, the overall accuracy dropped by 3.4%, and the precision for deep defects (Label 3 and Label 4) fell by 4.87% and 3.57%, respectively, accompanied by a simultaneous reduction in recall. This demonstrates that, without local time-frequency features, the model’s ability to distinguish deep defects is severely weakened. In the QFRP dataset, the effect was even more substantial: overall accuracy declined by 8.25%, and precision, recall, and F1-Score all decreased markedly. The most pronounced performance losses were observed for defect classes Label 1 and Label 2. Analysis of the confusion matrix showed a large number of missed detections and incorrect classifications for these two defect categories.
Performance degradation was also observed on both datasets after removing the frequency-domain branch (TFFNd-f), though the specific patterns differed. For the GFRP dataset, the overall accuracy dropped by 1.2%, and the precision for Label 3 decreased from 98.01% to 95.59%. This suggests that frequency-domain features play a distinct role in boosting the accuracy of deep defect recognition. In the QFRP dataset, eliminating the frequency-domain branch led to a 5% reduction in accuracy, with Label 1 and Label 2 exhibiting the most pronounced performance declines. This reflects how crucial the frequency-domain branch is for extracting spectral features when identifying interlayer defects in quartz materials.
A relatively small but consistent effect was observed when the time–frequency fusion branch (TFFNd-tf) was removed. For the GFRP dataset, the overall accuracy declined by just 0.6%, but the precision for Label 4 decreased by 1.91%, indicating that time–frequency domain information still offers additional classification support for deeper defects. In the QFRP dataset, accuracy decreased by 3.38%, with Label 1 and Label 2 again being the most impacted classes. This further confirms that the time–frequency domain branch contributes complementary features, particularly for mid-layer defects.
The rationality of the three-branch architectural design is validated by the observations above. Its effectiveness stems from the targeted modeling of the physical interaction between terahertz signals and defects: the local time-frequency branch captures pulse delay and morphological changes, enabling precise depth localization and detailed characterization of defects; the frequency-domain branch focuses on the spectral composition of the signal, exhibiting high sensitivity to material discrimination and boundary extraction of mid-to-surface defects; the time-frequency fusion branch enhances feature robustness in complex scenarios through joint representation. By systematically integrating the features from all three branches, effective identification of defects across different materials and varying depths is achieved.
4.2.2. Module Ablation Experiment
A module-level ablation study is conducted in this subsection to validate the design effectiveness of CSFANet and the cross-branch attention fusion mechanism. The performance differences among three configurations are compared: (1) removing CSFANet, (2) replacing the cross-branch attention with simple concatenation, and (3) replacing the cross-branch attention with linear fusion [38]. The quantitative results are reported in Table 4, while the corresponding visual results are illustrated in Figure 20 and Figure 22. In addition, several fusion strategies within the time–frequency branch, including concatenation, linear fusion, cross-attention, and Manifold Mixup, were compared to assess their impact on the overall fusion performance.
To validate the necessity of the CSFANet module, ablation experiments were conducted by removing the module. On the GFRP dataset, the impact of excluding CSFANet was relatively small: overall accuracy declined by 0.7%. The precision for Label 0 and Labels 2–4 fell by 1–2%. Interestingly, without CSFANet, the precision for Label 1 increased to 100%, but this was accompanied by a 0.5% decrease in recall. This result does not indicate a flaw in the CSFANet design; instead, it demonstrates that CSFANet serves to balance precision and recall, leading to more consistent detection performance across labels. On the QFRP dataset, removing CSFANet led to a 2.63% reduction in accuracy, along with concurrent declines in precision, recall, and F1-Score. The most notable effects were observed for shallow defects (Label 1) and medium-depth defects (Label 2), where precision dropped by 3.00% and 3.95%, respectively. This indicates that CSFANet significantly improves the classification of shallow and medium-depth defects. Overall, these findings show that CSFANet strengthens defect feature representation and is critical for accurately classifying shallow and medium-depth defects in quartz materials.
To validate the advantages of the proposed cross-branch attention mechanism, experiments were conducted comparing it against alternative fusion strategies: simple concatenation (TFFNsim) and linear fusion (TFFNlinear). On the GFRP dataset, replacing attention with simple concatenation led to only a 0.30% reduction in both accuracy and average F1-Score, indicating minimal surface-level impact. In contrast, the precision for deep defects (Label 4) declined by 2.28%, highlighting that the attention mechanism is crucial for effectively fusing features associated with deep defects. When using linear fusion, the negative effect was slightly stronger: accuracy and average F1-Score decreased by 0.40%, and the precision for deep defects (Label 4) dropped by 0.97%. From these results, it is further confirmed that complex defect characteristics are better captured by the attention mechanism than by linear fusion. The contribution of the attention mechanism is even more evident on the QFRP dataset. A 1.87% decline in accuracy, along with reductions in key metrics including precision, was observed as a result of simple concatenation. In particular, the precision for shallow defects (Label 1) fell by 2.50%, and that for medium-depth defects (Label 2) decreased by 2.51%. This indicates that the attention mechanism substantially improves the discriminative performance for shallow and medium-depth defects. Similarly, the use of linear fusion resulted in a clear drop in performance, lowering both accuracy and the average F1-Score by 1.38%. The precision of Labels 1–3 declined to different degrees, whereas the normal class (Label 0) was not impacted. These findings provide additional evidence for the importance of the cross-branch attention mechanism in achieving effective multi-branch feature fusion.
The effectiveness of the CSFANet module and the cross-branch attention mechanism is confirmed by the results above. A three-dimensional attention mechanism is utilized by the CSFANet module to accurately emphasize defect-sensitive spectra and refine feature representations, thereby improving the model’s capability to detect defects in composite materials. In comparison, straightforward concatenation and linear fusion yield only static or linear combinations of features and cannot dynamically modulate the contribution of each branch according to the spatiotemporal-frequency characteristics of the input signal. By contrast, inter-branch correlations are computed by the cross-branch attention mechanism to adaptively highlight the feature dimensions most pertinent to the current defect, thus strengthening the model’s classification robustness and generalization performance under complex conditions.
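The contrast between static fusion and input-dependent weighting can be illustrated with a toy example. The sketch below is not the cross-branch attention module used in TFFN: it assumes that each branch has already been reduced to a feature vector of equal length, it scores each branch by its mean scaled dot-product similarity to all branches (a hypothetical scoring rule with no learned projections), and it fuses the branches with softmax weights.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def cross_branch_fusion(branches):
    """Fuse equal-length branch feature vectors with input-dependent weights.

    Hypothetical scoring rule: each branch is scored by its mean scaled
    dot-product similarity to every branch (itself included); the scores
    are turned into weights by a softmax.
    """
    d = len(branches[0])
    scale = math.sqrt(d)
    scores = []
    for query in branches:
        sims = [sum(a * b for a, b in zip(query, key)) / scale
                for key in branches]
        scores.append(sum(sims) / len(sims))
    weights = softmax(scores)
    fused = [sum(w * b[i] for w, b in zip(weights, branches))
             for i in range(d)]
    return fused, weights
```

Unlike concatenation or a fixed linear combination, the weights returned here change with the input features, which is the property the ablation results above attribute to attention-based fusion.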
In addition, we conducted further ablation experiments on the time-frequency branch to evaluate the impact of different fusion methods on model performance. We tested four strategies: simple concatenation (Concat), linear fusion (Linear), cross-attention (CA), and the Manifold Mixup method adopted in this paper. The experimental results are shown in Table 5. Concat and Linear achieved accuracies of 97.80% and 98.10%, respectively, on the GFRP dataset, and 97.25% and 97.50% on the QFRP dataset. This indicates that even with the most basic static fusion method, there remains a certain degree of complementarity between time-domain and frequency-domain features. In contrast, CA achieved lower accuracy (97.70% for GFRP and 97.12% for QFRP), while Manifold Mixup achieved the best results on both datasets. These performance differences arise because the time-domain and frequency-domain features of terahertz signals differ in their physical properties and statistical structures; they are not naturally aligned. The Concat and Linear methods only perform static combinations and cannot address distribution biases between cross-domain features, making it difficult to achieve higher fusion quality. Although CA possesses dynamic alignment capabilities, it requires learning complex matching mappings for cross-domain features and introduces additional parameters. With a limited amount of data, this can easily lead to unstable attention distributions and increase the risk of overfitting. In contrast, Manifold Mixup performs continuous interpolation of cross-domain features in the latent space, making the fusion process more consistent in terms of distribution and thereby effectively mitigating the instability caused by cross-domain differences.
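The interpolation step at the core of this fusion strategy can be written in a few lines. The sketch below assumes that the time-domain and frequency-domain features have already been projected to vectors of the same length; drawing the mixing coefficient from a Beta(α, α) distribution follows the original Manifold Mixup formulation and is an assumption here, since the sampling scheme is not specified in this section. The function names are hypothetical.

```python
import random


def mixup_features(h_time, h_freq, lam):
    """Linearly interpolate two equal-length feature vectors.

    Returns lam * h_time + (1 - lam) * h_freq, element by element.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(h_time) != len(h_freq):
        raise ValueError("feature vectors must have the same length")
    return [lam * t + (1.0 - lam) * f for t, f in zip(h_time, h_freq)]


def sample_lambda(alpha=1.0, rng=random):
    """Draw a mixing coefficient from Beta(alpha, alpha) (assumed scheme)."""
    return rng.betavariate(alpha, alpha)
```

For example, `mixup_features([1.0, 2.0], [3.0, 6.0], 0.25)` returns `[2.5, 5.0]`. Because the result always lies on the segment between the two feature vectors, no additional parameters are introduced, in contrast to cross-attention.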
The effectiveness of the proposed model was quantitatively demonstrated via ablation experiments: removing any individual branch caused a clear drop in performance, thereby verifying both the utility and the complementary roles of the three-branch architecture. In addition, the CSFANet module strengthens feature discriminability while the cross-branch attention mechanism enhances feature fusion, with both contributing distinct and complementary performance gains. Furthermore, a comparative analysis of fusion strategies indicates that Manifold Mixup offers a more stable and generalizable method for fusing time-frequency features.
4.3. Impact of Frequency-Band Weight Initialization
To evaluate the impact of different band weight initialization methods on model training and performance, this study compared two strategies: non-uniform initialization based on prior knowledge and standard uniform initialization. As shown in
Figure 23, there are significant differences in the convergence curves of the loss function and accuracy between the two strategies; the dots in the figure mark the time points at which each strategy entered the stable convergence phase.
The results show that prior initialization provides a more effective optimization direction in the early stages of training, allowing the model to reach a stable plateau after approximately 42 epochs; in contrast, under uniform initialization, the loss and accuracy curves do not begin to level off until around the 63rd epoch. Although there is a difference in convergence speed between the two, in the later stages of training, both initialization methods converge stably to nearly identical performance levels, with virtually the same final classification accuracy. Since band weights are learnable parameters, they are automatically adjusted during backpropagation to the ratio that best matches the task; even in the absence of prior differences, the model can gradually learn a band importance distribution similar to that of the prior initialization. Therefore, the advantage of prior-based initialization lies primarily in accelerating early convergence, rather than determining the final performance.
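The following toy example illustrates why both initializations can reach the same band weights while differing in convergence speed. It assumes that the band weights are a softmax over learnable logits and that training drives them toward a fixed target distribution through a cross-entropy loss; the target, the prior, the learning rate, and the tolerance are all hypothetical and unrelated to the actual TFFN training setup.

```python
import math


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def train_band_weights(init_weights, target, lr=0.5, tol=1e-3, max_steps=10_000):
    """Gradient descent on cross-entropy(target, softmax(logits)).

    Returns the final weights and the number of steps needed to get
    within `tol` of the target distribution.
    """
    logits = [math.log(w) for w in init_weights]
    for step in range(max_steps):
        weights = softmax(logits)
        if max(abs(w - t) for w, t in zip(weights, target)) < tol:
            return weights, step
        # d(cross-entropy)/d(logits) = softmax(logits) - target
        logits = [z - lr * (w - t) for z, w, t in zip(logits, weights, target)]
    return softmax(logits), max_steps


target = [0.2, 0.6, 0.2]  # hypothetical task-optimal low/mid/high weights
w_prior, steps_prior = train_band_weights([0.25, 0.5, 0.25], target)
w_uniform, steps_uniform = train_band_weights([1 / 3, 1 / 3, 1 / 3], target)
```

In this toy setting both runs end at the same weights and the prior-based start simply needs fewer steps, which mirrors the behavior reported above.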
In summary, the proposed band-weighted strategy exhibits good robustness during initialization. Both the prior-based and uniform initialization methods ultimately converge to comparable classification performance, further demonstrating the stability and reliability of this method in practical applications.
4.4. Computational Efficiency Analysis
To comprehensively evaluate the computational overhead of TFFN, we conducted standardized inference performance tests on an NVIDIA GeForce RTX 5070 GPU (batch size = 1) and compared the results with baseline models. To ensure fairness, the input dimensions of all baseline models were standardized to 1 × 32 × 32 or THz time-domain sequences of length 1300, thereby avoiding any bias in inference efficiency caused by input dimensions. A quantitative evaluation of the model’s overall operational efficiency was conducted in this study, measuring three key metrics: the number of trainable parameters, the average inference time per THz signal, and the resulting frames per second (FPS). These metrics were selected to comprehensively reflect the model’s performance in terms of computational overhead and real-time capabilities.
As shown in
Table 6, the experimental results reveal significant differences among the models in terms of the number of parameters, inference efficiency, and classification performance. TFFN has 9.887 million parameters. Although this is higher than DACNN (0.097 million), AlexNet (2.47 million), and ResNet-34 (0.46 million), it still falls within the category of lightweight networks, and the model size remains within deployable limits. Meanwhile, TFFN’s average inference time is 9.30 ms, corresponding to a frame rate of 107.5 FPS. Although its architecture includes three feature extraction branches, a cross-domain fusion module, and a sequence modeling unit, which results in higher overall computational complexity than other lightweight models, its inference speed remains above 100 frames per second, meeting the real-time requirements of composite material non-destructive testing tasks.
In terms of inference efficiency, AlexNet is the fastest (3.17 ms, 316.0 FPS), but its average classification accuracy is only 96.14%, significantly lower than that of TFFN; ResNet-34 has an inference time of 5.75 ms (173.9 FPS), with an average accuracy of 96.26%, which is also lower than that of TFFN. MCLDNN and Transformer have inference times of 6.20 ms and 7.10 ms, respectively, with frame rates slightly higher than that of TFFN; however, their average accuracy remains in the range of 95.86%–96.31%, making it difficult to effectively distinguish between multiple defect classes with subtle differences.
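The relation between the reported per-signal latency and FPS can be checked with a small sketch. The helper names `fps_from_latency_ms` and `mean_latency_ms` are hypothetical, and the timing loop is a generic CPU-side harness rather than the benchmarking procedure used in this study.

```python
import time


def fps_from_latency_ms(latency_ms):
    """Frames per second for batch size 1, given per-sample latency in ms."""
    return 1000.0 / latency_ms


def mean_latency_ms(infer, sample, warmup=10, runs=100):
    """Average wall-clock time of infer(sample) in milliseconds."""
    for _ in range(warmup):  # warm-up runs are excluded from timing
        infer(sample)
    start = time.perf_counter()
    for _ in range(runs):
        infer(sample)
    return (time.perf_counter() - start) * 1000.0 / runs


tffn_fps = fps_from_latency_ms(9.30)    # reported TFFN latency
resnet_fps = fps_from_latency_ms(5.75)  # reported ResNet-34 latency
```

On a GPU, kernels are typically launched asynchronously, so a real benchmark would also need to synchronize the device before reading the timer; that step is omitted here.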
A comprehensive comparison reveals that while TFFN is slightly slower in inference speed than some shallow or single-branch models, its average accuracy across the two datasets reaches 98.52%, significantly outperforming all other baseline models. Through three-branch heterogeneous feature extraction, cross-domain interaction and fusion, and temporal modeling mechanisms, TFFN is able to more fully capture the complementary information of THz TDS signals in the time, frequency, and time-frequency domains, thereby significantly improving classification robustness and recognition accuracy. Consequently, while achieving millisecond-level inference speeds, TFFN demonstrates a more pronounced advantage in accuracy, successfully balancing precision with real-time performance, making it more suitable for engineering-oriented non-destructive testing scenarios.
4.5. Analysis of the Characteristics of Complex Permittivity Variations and Detection Sensitivity
The propagation behavior of terahertz waves in composite materials is directly determined by their complex permittivity:
$$\varepsilon(\omega) = \varepsilon'(\omega) - j\,\varepsilon''(\omega)$$
where $\varepsilon'$ primarily influences the propagation speed of electromagnetic waves, phase delay, and interface reflection characteristics, while $\varepsilon''$ reflects absorption loss and spectral energy attenuation. Since laminated composites consist of a multiphase structure comprising fibers, resin, adhesive layers, and honeycomb cores, intrinsic dispersion and significant spatial inhomogeneities coexist within them. When delamination defects form, air gaps ($\varepsilon' \approx 1$) replace the resin layers ($\varepsilon' \approx$ 2.5–3.0), leading to a decrease in local polarization capacity, changes in the interface structure, and an increase in scattering paths. Consequently, the equivalent complex permittivity undergoes systematic shifts in its real and imaginary parts, as well as in its frequency-dependent behavior. Due to the structural complexity, this study employs the equivalent complex permittivity $\varepsilon_{\mathrm{eff}}$ as the analytical parameter to explain the differences in terahertz response between the normal and defect regions. To visually demonstrate this effect at the signal level, we analyze QFRP signals; the results are shown in
Figure 24.
In the real part, the defect region exhibits a lower effective permittivity due to the increased proportion of air, causing the overall value of $\varepsilon'$ to decrease. At the same time, since air exhibits almost no dispersion, the slope of $\varepsilon'$ as a function of frequency in the defect region is weaker than that in the normal region. This effect can be directly observed in the time-domain waveform shown in Figure 24a: the peak position of the main pulse in the defect region shifts forward, and the rising and falling edges are broadened, reflecting the group delay difference caused by the change in propagation phase velocity. Furthermore, the echo intensity and arrival time in the defect region differ from those in the normal region, further indicating that the phase accumulation difference caused by the change in refractive index is amplified during multi-interface propagation. The phase difference spectrum in Figure 24d provides more direct evidence: the defect region exhibits a continuous phase shift relative to the normal region across the entire frequency band, with a larger shift rate in the mid- and upper-mid-frequency bands. This indicates that the dispersion laws of the phase velocity in the two regions are markedly different, representing a typical macroscopic manifestation of the variation in $\varepsilon'$.
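The forward shift of the main pulse can be estimated with a simple worked example. It assumes a low-loss, non-magnetic medium, so that the refractive index is approximately the square root of the real permittivity, a single pass through the layer, a resin permittivity of 2.75 (the midpoint of the range quoted above), and a hypothetical 100 µm air gap; none of these values are measurements from this study.

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def refractive_index(eps_real):
    """Low-loss approximation: n = sqrt(eps')."""
    return math.sqrt(eps_real)


def one_way_delay_ps(thickness_m, eps_real):
    """Transit time through a layer of the given thickness, in picoseconds."""
    return thickness_m * refractive_index(eps_real) / C_M_PER_S * 1e12


gap_m = 100e-6  # hypothetical delamination gap thickness
advance_ps = one_way_delay_ps(gap_m, 2.75) - one_way_delay_ps(gap_m, 1.0)
```

Under these assumptions the pulse arrives roughly 0.22 ps earlier per pass, which is small but resolvable with a sufficiently fine time-domain sampling step.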
In terms of the imaginary part, the delamination defect simultaneously reduces intrinsic loss, enhances interfacial loss, and alters the scattering path, resulting in distinct directional and frequency-band characteristics in the variation of $\varepsilon''$. The amplitude spectrum in Figure 24b shows that the low-frequency region of the defect is nearly identical to that of the normal region, indicating that their effective losses are close at low frequencies, i.e., $\varepsilon''_{\mathrm{defect}} \approx \varepsilon''_{\mathrm{normal}}$. In the mid-frequency range, however, the amplitude spectrum begins to exhibit an overall shift and spectral distortion. This reflects the fact that the original polarization loss peak of the resin matrix is attenuated by the air gap, while the newly introduced interfacial reflection causes energy to be amplified at certain frequencies and attenuated at others. The amplitude ratio plot in Figure 24c reveals this frequency band structure more clearly: in the mid-frequency range, an alternating pattern of “positive peak–negative trough–positive peak” appears, indicating that $\varepsilon''$ exhibits a dispersion perturbation in this frequency region characterized by “first greater than normal, then less than normal, and then greater than normal,” reflecting the redistribution of the equivalent imaginary part under the combined effects of weakened absorption loss and enhanced interface reflection. In the high-frequency band, the defect spectrum decays more rapidly, with an amplitude-to-normal ratio significantly below one, indicating enhanced scattering loss that causes high-frequency energy to decay more quickly, corresponding to $\varepsilon''_{\mathrm{defect}} > \varepsilon''_{\mathrm{normal}}$. This three-segment pattern of “low-frequency approximation, mid-frequency fluctuation, and high-frequency enhanced attenuation” directly reflects the direction and magnitude of the change in the imaginary part of the complex permittivity.
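The amplitude ratio and phase difference spectra discussed above can be computed from a defect waveform and a normal waveform as sketched below. A naive DFT is used only to keep the example dependency-free (in practice an FFT routine such as `numpy.fft.rfft` would be the usual choice), and the synthetic pulses are hypothetical stand-ins for measured THz signals.

```python
import cmath
import math


def dft(x):
    """One-sided naive DFT (bins 0 .. n//2) of a real sequence."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
        for m in range(n // 2 + 1)
    ]


def compare_spectra(defect, normal, eps=1e-12):
    """Return (amplitude ratio, wrapped phase difference) per frequency bin."""
    spec_d, spec_n = dft(defect), dft(normal)
    ratio = [abs(d) / (abs(n) + eps) for d, n in zip(spec_d, spec_n)]
    # phase(D * conj(N)) = phase(D) - phase(N), wrapped to (-pi, pi]
    phase_diff = [cmath.phase(d * n.conjugate()) for d, n in zip(spec_d, spec_n)]
    return ratio, phase_diff


# Synthetic example: the "defect" pulse is half as strong and arrives
# two samples earlier than the "normal" pulse.
n = 64
normal = [math.exp(-0.5 * ((k - 32) / 3.0) ** 2) for k in range(n)]
defect = [0.5 * normal[(k + 2) % n] for k in range(n)]
ratio, phase_diff = compare_spectra(defect, normal)
```

In this synthetic case the ratio is flat and the phase difference grows linearly with frequency, the signature of a pure time shift; frequency-dependent structure in either curve, as in Figure 24c,d, therefore points to dispersion or loss changes rather than a simple delay.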
In summary, delamination defects cause an overall decrease in the real part of the equivalent permittivity and frequency-band-dependent perturbations in the imaginary part while altering the frequency-dependent behavior of both components. This results in a series of distinguishable differences in amplitude, phase, and energy structures of terahertz signals in the time domain, frequency domain, and local time-frequency domain. The features in Figure 24 collectively constitute the intuitive manifestation of the aforementioned changes in the complex permittivity at the macroscopic signal level. Based on this, the TFFN proposed in this study can simultaneously capture phase perturbations, spectral distortion, and energy migration, that is, the multiscale information resulting from variations in $\varepsilon'$ and $\varepsilon''$, within a joint time-frequency neighborhood. Furthermore, the model’s attention automatically focuses on the mid-frequency sensitive regions where differences are most pronounced in Figure 24, indicating that the key features extracted by the network align with the variation patterns of the complex permittivity. Ultimately, accuracy rates of 98.40% and 98.63% were achieved on the GFRP and QFRP datasets, respectively, demonstrating that the model maintains a highly sensitive response to complex permittivity perturbations, thereby exhibiting excellent cross-material applicability and overall detection sensitivity.
4.6. Noise Robustness Evaluation
THz time-domain signals acquired in the laboratory typically exhibit a high signal-to-noise ratio (SNR) and minimal external interference. However, in practical engineering applications, the detection environment is often affected by factors such as equipment vibration, electromagnetic noise, structural coupling, and fluctuations in ambient temperature, causing the signals to contain varying degrees of random noise. Therefore, it is necessary to construct test data under different noise conditions from the original samples to evaluate the model’s stability and adaptability in non-ideal environments. To this end, we generate three test sets with typical noise intensities by adding zero-mean Gaussian white noise to the original signals, without altering the structure or labels of the original samples. The SNRs are set to 20 dB, 10 dB, and 0 dB, respectively. Signals at each noise level are input into each model to compare how classification accuracy and F1 scores change as noise increases, thereby analyzing the models’ noise robustness. The results are shown in
Table 7.
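The construction of the noisy test sets can be sketched as follows. The sketch assumes that the SNR is defined from the average power of each clean signal and that the noise standard deviation is chosen per signal to meet the target; the sine-wave input and the function names `add_awgn` and `measured_snr_db` are hypothetical.

```python
import math
import random


def add_awgn(signal, snr_db, rng=None):
    """Add zero-mean Gaussian white noise so that the SNR is about snr_db."""
    rng = rng or random.Random()
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(p_noise)
    return [s + rng.gauss(0.0, sigma) for s in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB from a clean signal and its noisy version."""
    p_signal = sum(c * c for c in clean) / len(clean)
    p_noise = sum((y - c) ** 2 for y, c in zip(noisy, clean)) / len(clean)
    return 10.0 * math.log10(p_signal / p_noise)


# Stand-in for one THz time-domain sequence of length 1300.
clean = [math.sin(2 * math.pi * 5 * k / 1300) for k in range(1300)]
noisy_sets = {snr: add_awgn(clean, snr, random.Random(0)) for snr in (20, 10, 0)}
```

Because the noise is random, the measured SNR of any single signal only approximates the target; the labels and signal length are left untouched, as in the procedure described above.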
As can be seen from the table, the performance of all models declines as noise intensity increases, though the rate of decline varies. Traditional CNN architectures (AlexNet, DACNN) exhibit a more pronounced drop in accuracy when noise is increased. Transformer models perform relatively stably under mild and moderate noise but exhibit some degradation under strong noise (0 dB). In contrast, TFFN maintains the highest accuracy and F1 scores under all noise conditions, retaining an accuracy of over 97% even under strong noise (0 dB), significantly outperforming other models. The results show that TFFN maintains stable classification performance across different noise levels, demonstrating strong adaptability to varying environmental conditions. To evaluate the effectiveness of the method described in this paper in real-world scenarios, the following section presents performance tests conducted on multi-layer composite structures.
4.7. Robustness Analysis Under Equivalent Multi-Layer Conditions
In practical aerospace and industrial applications, composite components often consist of 20 or even 30 layers or more. However, it is challenging to fabricate ultra-multilayer composite specimens with internal defects. To comprehensively evaluate the performance of the TFFN model under conditions such as increased layer count and environmental interference, we employ a stress testing method based on a physical degradation model, which is widely recognized in the field of NDT. The input for this method is derived from the GFRP and QFRP composite material datasets constructed in this paper, with an SNR of 40 dB in the raw signals. We use real signals that the model has not previously encountered as input and construct an ultra-multilayer degradation test set by applying mathematical operators that follow the physical laws governing terahertz wave propagation in thick media. Subsequently, these degraded signals are directly fed into the pre-trained TFFN model to evaluate its performance under equivalent ultra-multi-layer conditions.
4.7.1. Construction of the Terahertz Signal Degradation Model
To ensure the validity and accuracy of the simulated signals, the degradation model developed in this study strictly adheres to the three core physical mechanisms governing the propagation of terahertz waves in multilayer media, and the key parameters are rigorously calibrated against reference standards:
The attenuation of terahertz waves in multilayer GFRP/QFRP does not scale uniformly across the entire frequency range. According to the theory of electromagnetic wave propagation, the resin matrix produces a fixed background absorption, while the interwoven fiber network within the composite causes intense Rayleigh scattering. Therefore, we apply a frequency-dependent attenuation filter to the spectrum $X(f)$ of the input test signal $x(t)$:
$$X_a(f) = X(f)\,\exp\!\left[-\left(\alpha_0 + \beta f^{2}\right) d\right]$$
where $d$ is the simulated equivalent thickness, $\alpha_0$ is the background absorption coefficient, and $\beta$ (in mm$^{-1}$/THz$^{2}$) is the coefficient representing the severe loss due to high-frequency Rayleigh scattering; both coefficients are strictly calibrated in this study. This physical process causes the signal energy to decrease exponentially and results in severe high-frequency cutoff.
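One plausible form of such a filter is sketched below. It assumes an attenuation coefficient of the form α₀ + β·f², which is suggested by the mm⁻¹/THz² unit of the scattering coefficient, and an exponential dependence on thickness. The coefficient values in the example are placeholders chosen only for illustration and are not the calibrated values of this study.

```python
import math


def attenuate_spectrum(freqs_thz, spectrum, thickness_mm, alpha0, beta):
    """Apply H(f) = exp(-(alpha0 + beta * f**2) * d) to each spectral bin.

    alpha0: background absorption coefficient (assumed unit: 1/mm)
    beta:   scattering coefficient (assumed unit: 1/(mm * THz**2))
    """
    return [
        x * math.exp(-(alpha0 + beta * f * f) * thickness_mm)
        for f, x in zip(freqs_thz, spectrum)
    ]


# 20 equivalent layers of 0.2 mm each give 4.0 mm of propagation.
thickness_mm = 20 * 0.2
freqs_thz = [0.0, 0.5, 1.0]
flat_spectrum = [1.0, 1.0, 1.0]
# Placeholder coefficients, NOT the calibrated values from the paper.
attenuated = attenuate_spectrum(freqs_thz, flat_spectrum, thickness_mm, alpha0=0.1, beta=0.5)
```

With any positive β, higher-frequency bins are suppressed more strongly than lower ones, which reproduces the high-frequency cutoff described above.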
As the number of layers increases, minute impedance mismatches between layers cause the forward-propagating waveform to be superimposed with countless faint reflections. This study introduces a physically meaningful delayed reflection term in the time domain:
$$x_r(t) = x_a(t) + r\,x_a(t-\tau)$$
where $x_a(t)$ is the attenuated signal, $r$ represents the interlayer weak reflection coefficient, and $\tau$ represents the time-of-flight delay corresponding to the interlayer thickness. Physically, this term reproduces the typical waveform distortion and tail observed at the trailing edge of the main pulse in ultra-multilayer materials.
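A discrete-time version of a single delayed reflection term might look like the sketch below. It assumes one weak echo with the delay expressed as a whole number of samples; the function name and the numerical values are hypothetical.

```python
def add_delayed_reflection(signal, r, delay_samples):
    """Return signal[k] + r * signal[k - delay_samples] (zero before the start)."""
    if delay_samples < 0:
        raise ValueError("delay_samples must be non-negative")
    out = list(signal)
    for k in range(delay_samples, len(signal)):
        out[k] += r * signal[k - delay_samples]
    return out


# A unit impulse makes the added echo easy to see.
impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
echoed = add_delayed_reflection(impulse, r=0.1, delay_samples=2)
# echoed == [1.0, 0.0, 0.1, 0.0, 0.0]
```

Applied to a real pulse, the scaled and delayed copy adds a small tail behind the main peak, which is the trailing-edge distortion described above.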
In real-world multi-layer non-destructive testing, the energy of defect echoes at deep interfaces typically decays rapidly, causing the SNR of the final received signal to approach or even fall below 0 dB. To address this, Gaussian white noise $n(t)$ is injected into the signal to simulate real-world industrial conditions:
$$y(t) = x_r(t) + n(t), \qquad \mathrm{SNR} = 10\log_{10}\!\left(\frac{P_s}{P_n}\right)$$
where $y(t)$ is the degraded observed signal, $x_r(t)$ is the valid echo signal obtained by considering only multiple reflections, and $n(t)$ denotes zero-mean Gaussian white noise; $P_s$ and $P_n$ denote the average powers of the valid signal and noise, respectively, within a given sampling window.
4.7.2. Model Validation
To verify whether the constructed degradation model accurately reflects the propagation degradation mechanism of terahertz waves in ultra-multilayer composite materials, we construct equivalent 15-, 20-, and 25-layer degradation signals based on the raw terahertz signal from a specific measurement point on the QFRP specimen and conduct a comparative analysis of the signal evolution characteristics in both the time and frequency domains. In this degradation model, the layer number denotes an equivalent propagation thickness rather than the actual ply thickness. Each equivalent layer is set to 0.2 mm to conveniently model cumulative attenuation. Thus, 15, 20, and 25 layers correspond to equivalent propagation distances of 3.0 mm, 4.0 mm, and 5.0 mm, respectively.
Figure 25 illustrates the resulting changes in spectral shape and time-domain pulse morphology.
Figure 25a compares the normalized original shallow-penetration terahertz echo with the degraded signal under conditions equivalent to 20 layers and 0 dB. In the original signal, the main pulse exhibits a high-amplitude, well-defined peak with low background noise; in contrast, in the degraded signal, the amplitude of the main peak is significantly reduced, and the waveform is largely overwhelmed by high-level noise. According to model calculations, the amplitude of the main peak in the 20-layer equivalent signal is attenuated to approximately 15.1% of that in the original shallow-layer signal. This phenomenon is consistent with empirical observations in the non-destructive testing of ultra-multilayer composites. In terms of time-domain morphology, the loss of high-frequency components not only causes a decrease in amplitude but also alters the pulse width and the steepness of the edges. To eliminate the influence of amplitude differences, the original signal and the equivalent 20-layer degraded signal were normalized separately and compared by aligning their main peaks. The results are shown in
Figure 25b. The degraded main pulse exhibits typical time-domain broadening compared with the original main pulse. The mid-to-high-frequency components in the original signal help maintain the sharp edges of the time-domain waveform, but these are attenuated during propagation through the ultra-multilayer structure, resulting in the broadening of the main pulse along the time axis.
Figure 25c shows the spectral amplitude envelopes of the original signal and of the degraded signals for equivalent layer counts of 15, 20, and 25. Within the main energy distribution band, the spectral amplitude decreases as the equivalent layer count increases, indicating that the degradation model introduces energy attenuation that accumulates with thickness. As the number of layers increases, the high-frequency end of the spectrum is gradually suppressed and approaches the noise level. This characteristic is consistent with the Beer–Lambert-type transmission law of terahertz waves in thick composite materials and the physical mechanism whereby high-frequency components are more easily absorbed and scattered. To provide a more intuitive comparison of changes in spectral shape, Figure 25d shows a normalized overlay of the original spectrum and the spectrum of the degraded signal under the equivalent 20-layer, 0 dB condition. It can be seen that within the frequency bands where the main energy is concentrated, the amplitude of the degraded spectrum is lower than that of the original spectrum, and the spectral components on the high-frequency side are more easily attenuated. Beyond approximately 1.0 THz, the amplitudes of both curves approach zero; however, due to noise and normalization effects, the effective bandwidth of the degraded signal narrows significantly, which is consistent with physical reality.
4.7.3. Experimental Results and Analysis
To generate terahertz time-domain waveform data with varying degrees of degradation, we map the original test set to three typical degradation scenarios by adjusting the equivalent propagation thickness and injecting noise. Specifically, condition a corresponds to an equivalent propagation thickness of 15 layers and an SNR of approximately 20 dB; condition b corresponds to an equivalent propagation thickness of 20 layers and an SNR of approximately 10 dB; condition c corresponds to an equivalent propagation thickness of 25 layers and an SNR of approximately 0 dB. These conditions are used to simulate varying degrees of signal attenuation and noise interference. The results are shown in
Table 8.
As the equivalent propagation thickness increased from 15 layers to 25 layers and the SNR decreased from approximately 20 dB to approximately 0 dB, both the accuracy and F1 scores of all models showed a downward trend, with the performance decline becoming more pronounced as the degradation worsened. This trend is consistent across both the GFRP and QFRP datasets, indicating that the constructed degradation model can reasonably reflect physical laws such as the energy attenuation of terahertz waves in multilayer media. Although overall performance declined as degradation intensified, TFFN consistently outperformed the comparison models under all conditions and demonstrated greater robustness: under the most severe condition c, it still maintained accuracy rates of 97.32% and 98.13%, whereas the comparison methods experienced a significant drop, indicating that conventional architectures are susceptible to feature blurring and noise interference under strongly degraded THz waveforms. Under these degraded operating conditions, TFFN still maintains relatively stable recognition performance, which is attributed to its internal feature modeling approach. On the one hand, traditional networks rely heavily on the absolute amplitude and sharp edges of time-domain waveforms; however, these features are degraded during propagation due to amplitude attenuation, pulse broadening, and significant loss of high-frequency components. In contrast, TFFN uses CSFANet to adaptively focus on the mid-frequency energy structure, which is less affected by thickness-induced attenuation, and employs a Transformer to extract more stable spatiotemporal joint features. On the other hand, the cross-branch attention mechanism within the network automatically adjusts branch weights under low-SNR conditions, thereby suppressing background noise at the feature level and highlighting structural information related to defects.
Overall, this design—which simultaneously utilizes multi-view features and adaptively adjusts branch contributions—enables TFFN to demonstrate superior robustness against deep-layer degradation effects such as pulse broadening, amplitude attenuation, and high-frequency component loss. Consequently, it maintains a high recognition accuracy close to that under normal operating conditions even under operating condition c.
4.8. Frequency Domain Attention Visualization and Analysis
To further verify that the CSFANet module can indeed focus on damage-sensitive frequency bands during frequency-domain feature extraction, we add a visual analysis of frequency-domain attention, comprising spectral attention distributions and sub-band attention matrices. Based on the frequency-domain energy distribution characteristics of terahertz time-domain signals, the 0–2.5 THz band is divided into three sub-bands: the low-frequency band (0–0.3 THz), the mid-frequency band (0.3–1.0 THz), and the high-frequency band (1.0–2.5 THz). Among these, the mid-frequency band is the key frequency range most strongly correlated with delamination defects. To further refine the attention distribution within this critical frequency range, this study builds upon the aforementioned three-segment division by subdividing the 0–2.5 THz band into seven sub-bands, thereby enabling a clearer representation of the network’s differentiated attention patterns within the mid-frequency band.
Representative samples were selected from each class in the GFRP and QFRP datasets, and the frequency-domain attention heatmap is shown in
Figure 26. The results show that CSFANet’s high attention is primarily concentrated in the mid-frequency band of the three-band division. The 0.4–0.7 THz band is the core attention peak, which naturally extends to the 0.7–1.0 THz band. This attention distribution indicates that CSFANet can actively focus on the physically sensitive frequency bands associated with the mechanisms of delamination defects. Specifically, the model maintains a consistently high level of attention in the 0.1–1.0 THz range, which covers the characteristic spectral responses of delaminated interfaces caused by effects such as dielectric constant discontinuities and enhanced interfacial reflection. Furthermore, the attention peak formed by the network near 0.4–0.7 THz aligns with the enhanced absorption and dispersion changes commonly observed in GFRP/QFRP materials within this frequency range, reflecting the model’s ability to actively identify key frequency information that is more sensitive to delamination damage. This result indicates that CSFANet does not focus on random noise points or training biases, but rather on frequency bands associated with the material’s physical mechanisms, thereby validating the model’s reliability and interpretability in frequency-domain feature extraction.
Furthermore, the frequency-domain attention matrix based on a division into seven fixed sub-bands is shown in
Figure 27. The results show that the highest attention for all defect categories is concentrated in the three sub-bands covering 0.3–0.9 THz, further validating the critical role of the mid-frequency band in delamination defect detection.
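A sub-band attention value of this kind can be obtained by averaging per-frequency attention weights within each band, as in the sketch below. The three band edges follow the low/mid/high division given in the text; the synthetic attention curve and the function name `band_means` are hypothetical, and the seven-band edges used for Figure 27 are not reproduced here.

```python
import math


def band_means(freqs_thz, attention, edges_thz):
    """Mean attention per band; band i covers [edges[i], edges[i + 1])."""
    n_bands = len(edges_thz) - 1
    sums = [0.0] * n_bands
    counts = [0] * n_bands
    for f, a in zip(freqs_thz, attention):
        for i in range(n_bands):
            is_last = i == n_bands - 1
            # The last band also includes its upper edge.
            below_upper = f < edges_thz[i + 1] or (is_last and f == edges_thz[-1])
            if edges_thz[i] <= f and below_upper:
                sums[i] += a
                counts[i] += 1
                break
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


# Synthetic attention curve peaking near 0.55 THz (illustrative only).
freqs_thz = [i / 20.0 for i in range(51)]  # 0 to 2.5 THz in 0.05 THz steps
attention = [math.exp(-0.5 * ((f - 0.55) / 0.15) ** 2) for f in freqs_thz]
means = band_means(freqs_thz, attention, edges_thz=[0.0, 0.3, 1.0, 2.5])
```

Passing a finer list of edges yields one row of a sub-band attention matrix; stacking such rows over defect categories gives a matrix of the kind visualized in Figure 27.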
The distribution patterns described above clearly demonstrate that the key frequency bands targeted by the model are not generated randomly, but rather have a clear physical correspondence with changes in dielectric properties caused by delamination defects in composite materials. Compared with the defect categories, the attention distribution of normal samples is more dispersed across all frequency bands, whereas defect samples exhibit a slightly more concentrated response in the mid-frequency range, indicating that the presence of interlayer discontinuities introduces relatively consistent spectral perturbations in this region. In summary, by employing a dual perspective of spectral attention heatmaps and sub-band attention matrices, we demonstrate that the CSFANet module can adaptively focus on damage-sensitive frequency bands with clear physical significance, thereby significantly enhancing the interpretability and reliability of frequency-domain feature extraction.
5. Conclusions
A composite material delamination defect identification network is proposed in this paper, with its core methodology based on three-branch feature extraction and fusion. By integrating local time-frequency, frequency domain, and time-frequency-domain features, the limitations of single-domain representations are overcome. In the local time-frequency branch, a lightweight ResNet is employed to extract pulse-shape features. In the frequency branch, the CSFANet module is introduced on top of the temporal features, where deformable convolution and channel-spatial-frequency band attention are utilized to achieve dynamic frequency band decomposition and feature enhancement. The time-frequency branch is equipped with Manifold Mixup to enable deep cross-domain feature fusion. Furthermore, a cross-branch attention fusion mechanism is designed to dynamically weight features from different branches, thereby promoting information complementarity and enhancing discriminability among heterogeneous features. The superior performance of the proposed method is demonstrated through comparative experiments on two self-built datasets, while the effectiveness of each module is further verified by ablation studies. Therefore, an accurate and robust intelligent detection approach for composite delamination defects is provided by the presented TFFN, which exhibits strong practical value and application potential.
Although the proposed method has demonstrated high accuracy under controlled laboratory conditions, several practical factors must still be considered in real-world industrial applications. Surface moisture strongly absorbs and scatters terahertz waves, thereby altering local dielectric properties and reducing the effective signal-to-noise ratio; in such cases, the model may need to be retrained using moisture-calibrated data or undergo domain adaptation. Furthermore, high-speed industrial scanning typically employs larger step sizes and fewer averaging passes, which further reduces the signal-to-noise ratio and spatial resolution. For online inspection, the proposed framework can be combined with terahertz sources operating at higher repetition rates and sparse/linear array scanning, thereby maintaining reliable defect characterization capabilities under high-throughput conditions. We will investigate this further in future work.