1. Introduction
Steel structures are extensively employed in aerospace, heavy machinery, oil pipelines, and automotive manufacturing owing to their high strength and seismic resilience [1, 2]. These systems often operate under continuous loading, during which fatigue, thermal cycling, and external impacts induce stress accumulation in critical components. Such stress accelerates the degradation of material properties and heightens the risk of structural failure [3]. Accurate stress assessment is therefore vital for maintaining structural integrity.
Common nondestructive testing (NDT) methods for stress evaluation include strain gauge techniques [4], X-ray diffraction [5], magnetic methods [6], and ultrasonic techniques [7]. Strain gauges are simple and cost-effective but measure only surface stress and are susceptible to environmental interference, limiting their long-term reliability. X-ray diffraction provides high accuracy but requires complex, expensive instrumentation and strict operational conditions. Magnetic methods apply only to ferromagnetic materials. By contrast, ultrasonic techniques enable nondestructive and accurate measurement of both surface and internal stress, and advancements in acoustic and ultrasonic technologies have significantly driven progress across various complex engineering domains [8, 9, 10]. Their lightweight, low-cost instrumentation and broad material applicability make ultrasonic techniques a particularly promising approach for stress evaluation [11].
Longitudinal critically refracted (LCR) waves exhibit well-defined propagation paths and high sensitivity to stress variations, making them a preferred wave mode for stress analysis. These advantages have attracted significant interest from researchers. Li et al. [12] proposed a stress evaluation method for steel components based on a cross-correlation algorithm applied to LCR waves. This method surpasses traditional peak-based methods in accuracy and eliminates the need for signal filtering, presenting an effective solution for practical stress measurement. Lu et al. [13] developed a stress characterization model based on LCR waves and measured the depth distribution of residual stress within aluminum alloys. Zhao et al. [14] employed LCR waves to measure stress in steel structures, demonstrating the feasibility of rapid, non-destructive detection of initial internal stress. Their study also revealed a linear correlation between stress magnitude and time of flight. Li et al. [15] proposed a one-transmit, two-receive transducer array technique incorporating temperature–stress coupling for absolute stress measurement in steel components. The method achieved measurement errors within 10 MPa. Ma et al. [16] investigated complex residual stress distributions in metal plates using LCR waves based on the acoustoelastic effect, demonstrating that LCR waves are feasible for stress measurement. Liu et al. [17] developed an ultrasonic absolute stress measurement system integrating an optimized wavelet transform with a cross-correlation algorithm. The system achieved an accuracy error of less than 23 MPa and a stability error of 4.94 MPa in stress measurement of steel components.
The above-mentioned research has demonstrated the feasibility of nondestructive stress measurement using LCR waves. However, existing methods often depend on physical models developed under idealized conditions and require manual feature extraction along with extensive data preprocessing. These limitations reduce detection efficiency and hinder the deployment of rapid, real-time stress measurement in engineering environments.
Deep learning (DL) algorithms provide effective solutions for accurate ultrasonic stress measurement by overcoming key limitations of traditional ultrasonic methods, including limited adaptability to complex conditions, reliance on manual intervention, and low detection efficiency. Pradhan et al. [18] proposed a model that integrates machine learning with ultrasonic testing to measure surface residual stress during the rolling process of lightweight alloys. The model achieved high determination coefficients (R²) of 0.973 for compressive stress and 0.926 for tensile stress, with relatively low root mean square errors. Lim et al. [19] integrated Lamb waves with convolutional neural networks to directly predict stress from ultrasonic signals, eliminating the need for acoustic feature extraction in traditional methods. The approach exhibited strong robustness under both static and dynamic loading conditions up to 5 Hz, validating the effectiveness of the model. Park et al. [20] applied machine learning to ultrasonic amplitude-scan signals propagated through an aluminum alloy to nondestructively estimate a full-range stress–strain curve, establishing a quantitative relationship between ultrasonic characteristics and mechanical performance. Deng et al. [21] proposed an absolute stress identification method that integrates a one-dimensional convolutional neural network (1D-CNN) with ultrasonic shear waves. The method achieved an average relative error of 3.83% and demonstrated strong adaptability to steel components of varying thicknesses.
These studies demonstrate that DL-based stress measurement methods can directly map ultrasonic signals to stress values, reducing dependence on manual acoustic feature extraction and eliminating the need for physical modeling in ultrasonic stress measurement. However, existing DL methods for stress measurement are constrained by limited accuracy and poor generalizability. Moreover, few studies have focused on LCR waves, and the advantages of their clear propagation paths and high sensitivity to stress variations have not been fully exploited.
To address this gap, this study progressed as follows. A nondestructive testing experimental platform utilizing LCR waves was established to acquire ultrasonic signals under varying stress conditions. An end-to-end DL model was then developed by integrating 1D-CNN, gated recurrent units (GRU), and an attention mechanism to directly map ultrasonic signals to stress values. Furthermore, data augmentation techniques were employed to expand both the size and diversity of the dataset. Based on this framework, a high-precision stress measurement model for LCR waves was developed. Generalization experiments confirmed the model’s strong robustness and stability under varying conditions.
4. Comparison and Analysis of Performance Among Different Models
A series of comparative experiments was conducted to assess the effectiveness of the proposed model and identify the optimal model for ultrasonic stress measurement. The evaluation included CNNs of varying depths, alongside hybrid architectures combining recurrent neural networks and attention mechanisms. All models were trained on the same dataset under identical training parameters. An early stopping criterion based on validation performance was applied, automatically terminating training if no improvement was observed for 150 consecutive epochs. This approach limited overfitting while allowing each model to reach its best validation performance.
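The early stopping rule described above can be sketched in a few lines. This is a minimal illustration, assuming that "validation performance" means validation loss (lower is better); the function name `early_stopping_epoch` and any loss values passed to it are hypothetical and not taken from the study.

```python
from typing import Iterable, Tuple


def early_stopping_epoch(val_losses: Iterable[float], patience: int = 150) -> Tuple[int, float]:
    """Return (stop_epoch, best_loss) for a stream of per-epoch validation losses.

    Training halts at the first epoch where `patience` consecutive epochs
    have passed without a new best validation loss (assumed criterion).
    If that never happens, the last epoch index is returned.
    """
    best = float("inf")
    stale = 0      # epochs since the last improvement
    epoch = -1     # stays -1 if no losses are supplied
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return epoch, best
```

With the default `patience=150`, as in the text, a model whose validation loss stops improving is cut off 150 epochs after its best epoch, so every compared architecture is judged at roughly its own best point rather than at a fixed epoch count.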
Model performance was assessed using three standard regression metrics: mean absolute error (MAE), root mean square error (RMSE), and the coefficient of determination (R²). MAE represents the average absolute difference between predicted and true values, reflecting overall prediction accuracy. RMSE measures the degree of fluctuation in prediction errors and is more sensitive to abnormal predictions. R² is used to assess the model’s goodness of fit, with values closer to 1 indicating a stronger ability of the model to capture the relationship between ultrasonic signals and stress.
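For reference, the three metrics can be written out directly from their standard definitions. The sketch below uses only the Python standard library; the function names are illustrative choices, and the study's own evaluation code is not shown in the text.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error: penalizes large errors more than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Because RMSE squares each error before averaging, a single large miss raises RMSE far more than MAE, which is why the two are reported together.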
The performance of each model is summarized in Table 1. To systematically identify the contribution of each architectural component, including network depth, receptive field size, temporal modeling strategies, and attention mechanisms, a step-by-step ablation analysis was conducted based on the progressive optimization of the model structure. Initially, the impact of the convolutional architecture was evaluated. The baseline 3-layer CNN achieved an MAE of 10.34 MPa. Increasing the depth to a 5-layer CNN reduced the MAE to 8.24 MPa, confirming that deeper networks are more effective at extracting abstract hierarchical features. Subsequently, enlarging the convolutional kernels of the 5-layer CNN, and thus its receptive field, further decreased the MAE to 5.14 MPa. This significant improvement indicates that a wider receptive field is essential for capturing the global structural characteristics of the LCR waveform rather than relying solely on local peaks.
To capture the sequential evolution of ultrasonic signals, we further investigated different temporal modeling strategies. The CNN combined with BiLSTM and attention achieved an MAE of 2.56 MPa. While effective, the bidirectional structure increased computational cost without surpassing the simpler GRU. Similarly, the model integrating LSTM and the transformer achieved an MAE of 2.98 MPa, likely because the self-attention mechanism of the transformer showed limited advantage in modeling the short-term dependencies of ultrasonic echoes given the dataset scale. Most notably, the integration of the gated recurrent unit yielded the best performance, with the 5-layer CNN and GRU model achieving an MAE of 1.42 MPa. This demonstrates that the GRU provides the best balance between capturing temporal dependencies and model simplicity for this specific task.
Finally, the proposed 5-layer CNN combined with GRU and attention model yielded an MAE of 1.94 MPa. Although this global error is slightly higher than the pure GRU model on the standard test set, the attention mechanism was explicitly retained to enhance the generalization capability of the model on unseen data. As detailed in the generalization tests in Section 5.2, the pure GRU model tends to overfit to the specific stress points in the training set. In contrast, the attention module effectively highlights intrinsic stress-sensitive regions, preventing performance degradation when predicting unseen stress states such as 35 MPa and 170 MPa, thereby ensuring the robustness of the model in practical applications.
In summary, the ablation analysis confirms that the hierarchical integration of deep convolutional features, GRU-based temporal modeling, and attention mechanisms is essential for accurate ultrasonic stress measurement. The progressive optimization of the network architecture not only improves the feature extraction capability but also ensures reliable generalization performance beyond the training data distribution.
5. Analysis of Model Visualization Results
5.1. Model Performance Analysis
To further assess the generalization ability and predictive stability of the proposed model under previously unseen stress conditions, a stress value removal experiment was conducted. Samples with stress values of 35 MPa, 80 MPa, 125 MPa, and 170 MPa were randomly removed from the training set and used exclusively for testing. The trained model’s performance was comprehensively assessed using scatter regression analysis, error distribution statistics, and residual cloud analysis.
Figure 14 shows the evolution of the loss function during training and validation. The loss decreases sharply in the initial epochs and converges rapidly at approximately 200 epochs. Thereafter, both curves gradually plateau and stabilize around 600 epochs. The close alignment of the training and validation loss trends, without significant oscillations or divergence, indicates the absence of overfitting. These results validate the effectiveness of the training strategy and confirm the soundness of the model optimization process.
Figure 15 shows the scatter regression plot of predicted versus actual stress values on the test set. The red ideal line denotes the theoretical perfect prediction where the predicted stress is exactly equal to the actual stress, y = x, while the black dashed line represents the regression fit of the predicted values. Blue data points cluster tightly along the ideal line, with the regression line nearly coinciding with it, demonstrating strong linearity across the 0–200 MPa stress range. No fitting imbalance is observed in either the low-stress or high-stress intervals, and no systematic offset or slope deviation is apparent. Further analysis shows that the 95% prediction interval, representing the confidence range of the model’s outputs, encompasses the vast majority of data points. The interval exhibits uniform width and smooth variation across the entire data range, without noticeable local fluctuations. These results indicate that the proposed model not only achieves high fitting accuracy but also maintains stable uncertainty control, ensuring precise and reliable stress predictions.
Figure 16 shows the histogram of prediction errors on the test set, illustrating the overall error distribution and concentration. The distribution closely follows a normal pattern, with its primary peak centered near zero, consistent with the ideal error pattern of a regression model. Notably, 90% of the data exhibit errors within ±5 MPa. Among 416 samples, 236 have absolute errors below 2.5 MPa, accounting for 56.7% of the total, indicating that most predictions are highly accurate with minimal deviations. Further analysis shows that the error distribution exhibits high kurtosis and near-zero skewness, suggesting a concentrated and symmetric pattern with a very low proportion of outliers. This implies that the model’s prediction errors primarily originate from inherent modeling uncertainty and environmental noise, rather than from systematic structural deficiencies or biases induced by overfitting. Additionally, the proportion of samples with errors exceeding ±10 MPa is minimal, confirming the model’s strong robustness in suppressing outliers and maintaining stable predictive accuracy.
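The distribution statistics quoted above (share of errors within a tolerance, skewness, kurtosis) can be computed as in the following sketch. It assumes population moments and reports excess kurtosis (zero for a normal distribution); the function names are illustrative, and the text does not state which estimator the study used.

```python
from typing import Sequence


def share_within(errors: Sequence[float], tol: float) -> float:
    """Fraction of samples whose absolute error is below `tol`."""
    return sum(1 for e in errors if abs(e) < tol) / len(errors)


def _central_moment(errors: Sequence[float], order: int) -> float:
    """Population central moment of the given order."""
    mean = sum(errors) / len(errors)
    return sum((e - mean) ** order for e in errors) / len(errors)


def skewness(errors: Sequence[float]) -> float:
    """Population skewness; near zero for a symmetric error distribution."""
    return _central_moment(errors, 3) / _central_moment(errors, 2) ** 1.5


def excess_kurtosis(errors: Sequence[float]) -> float:
    """Population excess kurtosis; positive values indicate a sharper peak than normal."""
    return _central_moment(errors, 4) / _central_moment(errors, 2) ** 2 - 3.0
```

As a check on the reported share, 236 of 416 samples corresponds to 236 / 416 ≈ 0.567, which matches the 56.7% stated in the text.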
Figure 17 shows the two-dimensional distribution of the model’s predicted residuals with respect to actual stress levels. The residuals are predominantly concentrated within ±2.5 MPa and are uniformly distributed across the entire stress value range, showing no evident bias. The model maintains consistent prediction accuracy across low, medium, and high stress levels, with no error amplification at high stress or instability at low stress, confirming the model’s stability and reliability. The density distribution peaks near zero residual, indicating that most predicted values exhibit minimal deviations with well-controlled error fluctuations. A small number of residual points fall outside the ±5–10 MPa range. However, they are sparse, randomly scattered, and show no clustering or regional anomalies, exerting negligible influence on overall prediction stability.
Comprehensive analyses show that the 5-layer CNN+GRU+Attention model achieves excellent fitting ability, error control ability and predictive performance. Overall prediction accuracy satisfies the requirements of a high-precision regression model.
5.2. Generalization Ability Evaluation of the Model
This section systematically assesses the generalization ability of the proposed model by comparing the predictive performance of the 5-layer CNN+GRU model with and without the attention mechanism. The assessment focuses on four stress levels, 35 MPa, 80 MPa, 120 MPa, and 170 MPa, which were intentionally excluded from the training set. These levels span the entire stress range, enabling a comprehensive evaluation of the model’s generalization ability. For each stress level, 10 independent test samples formed a separate test set, ensuring that predictions were entirely independent of the training data. As shown in Table 2, the model incorporating the attention mechanism yields average predictions that more closely match the actual stress values. Notably, at the boundary stress levels of 35 MPa and 170 MPa, the prediction errors decreased markedly from 7.53 MPa and 7.72 MPa for the model without attention to 3.18 MPa and 3.62 MPa, a reduction of over 50% in both cases. This improvement arises not from differences in training data volume but from the attention mechanism’s capacity to enhance feature extraction in boundary regions. Consequently, the model effectively captures stress-related information even where ultrasonic signal variations are subtle, thereby overcoming a common limitation of deep learning models in predicting at the boundary regions of the training range.
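The hold-out protocol described above can be illustrated with a short sketch: every sample at a held-out stress level is moved out of the training set, and errors are then averaged per level. The `(stress, signal)` tuple layout and the function names are assumptions made for illustration; the study's actual data structures are not described in the text.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Sample = Tuple[float, List[float]]  # (stress in MPa, ultrasonic signal); hypothetical layout


def split_by_held_out_levels(
    samples: Sequence[Sample], held_out: Iterable[float]
) -> Tuple[List[Sample], List[Sample]]:
    """Move every sample whose stress label is in `held_out` to the test set."""
    held = set(held_out)  # exact float match is fine for discrete load steps
    train = [s for s in samples if s[0] not in held]
    test = [s for s in samples if s[0] in held]
    return train, test


def mean_abs_error_per_level(
    y_true: Sequence[float], y_pred: Sequence[float]
) -> Dict[float, float]:
    """Average absolute prediction error, grouped by true stress level."""
    grouped: Dict[float, List[float]] = {}
    for t, p in zip(y_true, y_pred):
        grouped.setdefault(t, []).append(abs(t - p))
    return {level: sum(errs) / len(errs) for level, errs in grouped.items()}
```

Splitting by stress level rather than by random sample is what makes this a test of interpolation to unseen stress states: no signal recorded at a held-out level can reach the model during training.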
Further analysis of Figure 18 and Figure 19 provides a more intuitive understanding of the prediction performance of the two model configurations across different stress levels. As shown in Figure 18, the model without the attention mechanism produces relatively concentrated predictions with good linearity at mid-range stress levels. The red dashed line in the figure represents the ideal stress curve, which corresponds to y = x, indicating that the predicted values are equal to the true values. However, at 35 MPa and 170 MPa, the scatter points diverge, and several predictions deviate from the ideal stress curve, indicating unstable and highly fluctuating performance under boundary stress conditions. This may be because, under low-stress conditions, the amplitude of the ultrasonic signal changes little and is close to the noise level. At high stress levels, changes in material microstructure and acoustic propagation characteristics make it challenging for the model to effectively identify stress-dependent signal features. In contrast, as shown in Figure 19, the attention-based model exhibits densely clustered prediction points across all stress levels. Particularly at the boundary regions, the predicted values align closely with the ideal curve and remain uniformly distributed without significant deviations. This demonstrates that the attention mechanism enhances the model’s adaptability under low signal-to-noise ratio and non-ideal conditions. Furthermore, the fitted blue curve closely follows the ideal curve, confirming that the model achieves high prediction accuracy and strong generalization ability.
Comprehensive analysis reveals that the attention-based model maintains consistently high prediction accuracy across the central, low, and high stress regions of the training range. The average error remains below 3 MPa, satisfying the dual requirements of precision and robustness for industrial nondestructive testing. Conversely, while the model without attention achieves a slightly lower mean absolute error of 1.42 MPa on the standard test set, it exhibits larger errors exceeding 7.5 MPa and poorer fitting at low and high stress levels, limiting its applicability under complex stress distributions. From an engineering standpoint, steel structures in service are often subjected to highly variable stress levels. Traditional models tend to fail under these conditions, particularly during early low-load or extreme high-load phases. Sacrificing 0.52 MPa of average accuracy to prevent severe prediction failures at low and high stress levels is therefore a worthwhile trade-off. Ultimately, the attention-based model exhibits superior prediction stability, fulfilling the strict requirements of reliable stress measurement under diverse operating conditions.
5.3. Comparative Analysis with Conventional Acoustoelastic Theory
To comprehensively evaluate the advantages of the proposed deep learning model, its predictive performance was compared against the traditional analytical approach. To provide a rigorous benchmark for the experimental results in Table 3, the conventional stress evaluation method is detailed here. This approach relies on the acoustoelastic effect where the absolute stress σ is calculated from the propagation time variation as follows
In this expression, t₀ denotes the propagation time in the reference stress-free state, t represents the time under a specific load, and K is the acoustoelastic coefficient calibrated for the 20# steel specimens. The time-of-flight variation is precisely determined using a cross-correlation algorithm to identify the temporal shift between the reference and target signals. The cross-correlation function R_xy(τ) is defined by the following integral
The optimal time delay is obtained by finding the lag τ that maximizes the similarity between the two signals. This conventional method of calculating stress through cross-correlation-based time-of-flight measurement has been widely applied in the research of Li et al. [12] and Liu et al. [17]. These established methods serve as a rigorous baseline for the comparative analysis presented in this study.
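The conventional pipeline described above can be sketched as follows. Two assumptions are made and should be read as such: the cross-correlation is evaluated in its usual discrete form, R_xy(lag) = Σ x[n]·y[n + lag], and the stress is recovered from the relative time shift as σ = (t − t₀)/(K·t₀), a common way of writing the acoustoelastic relation when K carries units of MPa⁻¹. Neither expression is confirmed by the text, and the function names, sampling interval, and numerical values used with them are hypothetical.

```python
from typing import Sequence


def cross_correlation_lag(ref: Sequence[float], sig: Sequence[float], max_lag: int) -> int:
    """Return the lag (in samples) maximizing sum(ref[n] * sig[n + lag]).

    A positive result means `sig` arrives later than `ref`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, r in enumerate(ref):
            m = n + lag
            if 0 <= m < len(sig):  # ignore samples shifted outside the record
                score += r * sig[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def stress_from_delay(delta_t: float, t0: float, k: float) -> float:
    """Assumed relation (t - t0) / t0 = K * sigma, so sigma = delta_t / (K * t0)."""
    return delta_t / (k * t0)
```

In use, the integer lag would be multiplied by the sampling interval to obtain t − t₀ before calling `stress_from_delay`. Practical implementations typically interpolate around the correlation peak to resolve delays finer than one sample, which this sketch omits.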
While this analytical method provides a fundamental baseline, a critical limitation is evident in its practical application. To ensure a fair comparison, the conventional linear model was calibrated using the original unaugmented signals from the training dataset. As shown in Figure 20, five independent ultrasonic time-of-flight measurements were recorded at each stress increment within this training set and are represented by the blue dots. The red triangles denote the average of these five trials, which serves as the basis for the linear regression fit shown as the black solid line, corresponding to an acoustoelastic coefficient K of 3.4498 × MPa⁻¹. The comparative predictions in Table 3 were then generated entirely from an independent test set.
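The calibration step described above (average the five trials at each stress level, then fit a straight line) can be sketched with an ordinary least-squares fit. The conversion from the fitted line to K assumes the relation t = t₀·(1 + K·σ), so that the intercept is t₀ and the slope is K·t₀; this relation, the function names, and any numbers supplied to them are illustrative assumptions rather than the study's stated procedure.

```python
from typing import Dict, List, Sequence, Tuple


def average_trials(trials: Dict[float, Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Map {stress level: repeated time-of-flight readings} to (levels, mean readings)."""
    levels = sorted(trials)
    means = [sum(trials[s]) / len(trials[s]) for s in levels]
    return levels, means


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def acoustoelastic_coefficient(slope: float, intercept: float) -> float:
    """Under the assumed t = t0 * (1 + K * sigma): intercept = t0, slope = K * t0."""
    return slope / intercept
```

Because the fit uses only the per-level averages, the scatter of individual readings around each average does not enter the calibration, which is the limitation the surrounding discussion examines.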
Prior to the conventional calculations, every individual signal was hardware-averaged 128 times and digitally filtered to suppress noise. Even so, the average values indicated by the red triangles in Figure 20 deviate noticeably from the linear fitted line, particularly in the low- and high-stress regions, highlighting the limitations of a simple linear calibration. Furthermore, the individual raw measurements shown as blue dots exhibit distinct stochastic fluctuations around the average trend. Despite these processing steps, the conventional method still exhibited significant errors on the independent test set. This indicates that traditional cross-correlation methods relying on a rigid calibration coefficient are highly sensitive to microscopic variations in transducer coupling states and environmental noise. The rigid linear formula relies on the averaged trend and inherently treats these raw fluctuations as errors to be discarded. However, in single-shot field measurements where averaging is not feasible, these deviations lead to significant prediction errors. In contrast, the proposed deep learning model treats these raw signal variations not as noise but as high-dimensional features, implicitly compensating for the instability that the linear model fails to accommodate.
To quantify this precision gap, Table 3 presents a detailed comparison of the stress measurement results between the two methods across the range of 15 to 195 MPa. The conventional method, constrained by the fixed acoustoelastic coefficient, exhibits significant instability at specific stress points where signal fluctuations occur. For instance, at 115 MPa, the conventional calculation deviates drastically with an error of −16.51 MPa, corresponding to a relative error of 14.36%. Similarly, in the high-stress region at 195 MPa, the conventional method produces a large positive error of 15.36 MPa, with a relative error of 7.88%. These substantial errors clearly demonstrate the insufficient accuracy of the rigid linear model for precise stress evaluation.
In sharp contrast, the proposed deep learning model maintains consistently high precision across the entire loading spectrum. At the same challenging stress levels of 115 MPa and 195 MPa, the model restricts the prediction errors to −0.18 MPa and 1.67 MPa, respectively, corresponding to relative errors of only 0.16% and 0.86%. Overall, the conventional approach yields a mean absolute error of 9.88 MPa with an average relative error of 17.92%, whereas the proposed 1D-CNN-GRU-Attention model achieves a significantly lower MAE of 1.08 MPa and an average relative error of only 1.90%. This corresponds to a reduction of approximately 89% in MAE and approximately 89.4% in average relative error.
The 1.08 MPa error of our proposed architecture also represents an approximate 89% improvement over the 10 MPa measurement error reported in the research of Li et al. [12]. Furthermore, it is approximately 78% lower than the 4.94 MPa stability error reported in the study of Liu et al. [17]. In summary, the proposed method achieves a substantial gain in accuracy within the field of ultrasonic stress measurement.
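The percentage improvements quoted in this comparison follow from the standard relative-reduction formula, (baseline − improved) / baseline. The short sketch below reproduces the arithmetic using only the error values stated in the text; the function name is an illustrative choice.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline * 100.0


# Error values as stated in the text (MPa, or % for the relative error).
comparisons = {
    "MAE vs. conventional method": (9.88, 1.08),
    "average relative error vs. conventional method": (17.92, 1.90),
    "MAE vs. 10 MPa reported error": (10.0, 1.08),
    "MAE vs. 4.94 MPa reported error": (4.94, 1.08),
}
reductions = {name: relative_reduction(b, i) for name, (b, i) in comparisons.items()}
```

Rounded to one decimal place, the four reductions are 89.1%, 89.4%, 89.2%, and 78.1%, consistent with the approximate figures of 89%, 89.4%, 89%, and 78% given above.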
It is important to emphasize that the dataset used for model training and the prediction data presented in Table 3 were derived from independent loading experiments. This strict separation ensures that the evaluation reflects the model’s true generalization capability rather than memorization of specific experimental instances. Regarding the scope of application, the specific model parameters established in this study are valid for 20# steel under room temperature conditions using LCR waves. However, the proposed framework, which integrates 1D-CNN, GRU, and attention mechanisms, is a general deep learning approach to ultrasonic stress detection. The architecture is not restricted to a single material or wave mode. For applications involving different materials, such as aluminum alloys and composites, or alternative ultrasonic modes including shear waves and Lamb waves, the core network structure remains applicable. To extend the method to these new domains, the framework requires retraining with a dataset representative of the target material and environmental conditions.
6. Conclusions
To address the heavy reliance of ultrasonic stress measurement on manual calibration, this study proposes a stress measurement method based on deep neural networks. Based on LCR waves, an end-to-end stress measurement model was developed to directly map raw time-series ultrasonic data to stress values, eliminating the need for traditional calibration curves. This approach markedly enhances automation and intelligence in stress measurement while improving adaptability under practical engineering conditions. Data acquisition was performed using a piezoelectric ultrasonic experimental platform coupled with a hydraulic loading unit. Ultrasonic signals were collected from 420 standard 20# steel specimens across a stress range of 0–200 MPa. Through data augmentation, the dataset was expanded to 4200 samples and partitioned into training, validation, and test sets at a ratio of 8:1:1. After multiple training iterations, the proposed 5-layer CNN+GRU+Attention model achieved a mean absolute error of 1.94 MPa. For previously unseen stress values, prediction errors remained within 3 MPa. Overall, the model demonstrates high measurement accuracy, along with strong generalization and robustness. It provides an efficient and reliable technical solution for calibration-free ultrasonic stress measurement, with significant practical value and broad applicability for the safety assessment and structural health monitoring of steel structures.
Despite the high accuracy and promising generalization of the proposed model for piezoelectric ultrasonic stress measurement, several limitations remain. The experimental dataset primarily comprises 20# steel specimens, leaving the model’s adaptability to other materials and complex structures untested. While the current study demonstrates the effectiveness of the proposed model for standardized 20# steel components, the influence of material anisotropy in other materials remains an important factor. Future work will extend the proposed framework to anisotropic materials such as aluminum alloys and titanium alloys, further enhancing its capability for industrial applications. Specifically, future research will focus on enhancing the engineering applicability of the model through transfer learning techniques. By utilizing the current model as a pre-trained backbone, we aim to achieve rapid adaptation to different steel grades and complex acoustic environments with minimal additional data collection. Furthermore, the inherent black-box nature of deep learning limits its physical interpretability. Future research will therefore also investigate which stress-related physical features the network has extracted.