3.1. P-Wave Arrival Picking Results of Different Bandwidth Coefficients
To investigate the influence of the KDE bandwidth coefficient on P-wave arrival picking performance, five bandwidth coefficients of
β = 0, 25, 100, 200, and 500 μs were selected to generate the arrival probability labels during training. As presented in
Figure 9, the bandwidth coefficient
β directly controls the smoothness of the probability distribution. A larger bandwidth produces a broad, smooth transition around the arrival, reflecting higher uncertainty in the predicted arrival time. Conversely, as the bandwidth decreases, the probability distribution becomes sharper and more focused on the true arrival time. When
β = 0, the arrival probability collapses to a strict binary distribution.
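The effect of β on the label shape can be illustrated with a small sketch. It is not the authors' label generator: the Gaussian kernel, the rescaling of the peak to 1, the 1 μs sampling interval, and the names `arrival_label`, `pick_index`, and `dt_us` are all assumptions made for illustration.

```python
import numpy as np

def arrival_label(n_samples, pick_index, beta_us, dt_us=1.0):
    """Soft arrival label centred on a manual pick.

    Assumptions (not from the paper): Gaussian kernel of width beta_us,
    peak value 1, sampling interval dt_us in microseconds.
    """
    if beta_us <= 0:
        # beta = 0: strict binary label, a single '1' at the pick
        label = np.zeros(n_samples, dtype=float)
        label[pick_index] = 1.0
        return label
    # Time offset of every sample from the pick, in microseconds
    t = (np.arange(n_samples) - pick_index) * dt_us
    return np.exp(-0.5 * (t / beta_us) ** 2)

# Count how many samples carry a label value of at least 0.5
for beta in (0, 25, 100, 200, 500):
    label = arrival_label(4096, 2000, beta)
    width = int(np.sum(label >= 0.5))
    print(f"beta = {beta:>3} us: {width} samples at or above 0.5")
```

The count grows with β, which is the broadening described above; at β = 0 only the picked sample is non-zero.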
Figure 10 shows the convergence curves for model training and validation across the five bandwidth settings, with corresponding training and validation losses summarized in
Table 2. For the binary label case (
β = 0 μs), the training loss decreased rapidly from 8.64 at Epoch 1 to 0.01 at Epoch 100. The validation loss initially decreased and reached its minimum at approximately Epoch 17, triggering the early stopping criterion. To facilitate comparison with other bandwidth settings, training was continued to 100 epochs. After Epoch 17, the validation loss increased sharply, reaching 6.35 at Epoch 50 and 10.41 at Epoch 100, indicating overfitting and a substantial deterioration in model generalization performance.
For models with β > 0, the convergence behavior followed a different pattern. At the start of training (Epoch 1), models with larger bandwidths showed larger training losses: approximately 5.84, 7.47, 8.54, and 8.62 for β = 25, 100, 200, and 500 μs, respectively. As training progressed (Epochs 50 and 100), models with smaller bandwidths (e.g., β = 25 μs) converged more quickly and achieved lower final loss values. At Epoch 100, the training losses for β = 25, 100, 200, and 500 μs were 4.64, 6.02, 6.74, and 7.77, respectively. Validation losses followed a similar pattern: initial losses were higher for larger bandwidths, and as training continued, smaller bandwidths led to a faster decrease and a lower final validation loss. The validation losses at Epoch 1 for β = 25, 100, 200, and 500 μs were approximately 5.50, 6.32, 7.11, and 8.16, respectively. After 100 training epochs, the validation losses stabilized at approximately 5.06, 6.12, 6.76, and 7.78, respectively. These results indicate that, among the positive bandwidth settings, a smaller bandwidth facilitates faster convergence to a lower loss, whereas excessively large bandwidths produce broader probability distributions and consequently higher training and validation losses.
To further investigate the influence of the bandwidth coefficient on model performance,
Figure 11 and
Figure 12 present the arrival picking results of Attention PhaseNet for two example AE waveforms in the test dataset. As shown in
Figure 11b–f, the absolute arrival picking errors of example 1# under the five bandwidth conditions were 1212, 11, 18, 32, and 91 μs, respectively. As shown in
Figure 12b–f, the absolute arrival picking errors of example 2# under the five bandwidth conditions were 1016, 5, 12, 16, and 103 μs, respectively. With large bandwidths (e.g., 500 μs), the predicted probability is smooth but its peak deviates from the true arrival. As the bandwidth decreases from 500 μs to 100 μs, the probability peak gradually narrows and aligns more closely with the true P-wave arrival. At
β = 25 μs, the predicted peak matches the true arrival almost exactly, yielding the highest accuracy.
Notably, when β = 0, the predicted probability remains close to zero over almost the entire waveform. The maximum predicted probabilities of P-wave arrival for examples 1# and 2# are only 0.03 and 0.05, respectively. Under this condition, the model failed to identify the true arrivals, which is consistent with the overfitting observed during training. This occurs because the label distribution is extremely sparse: only one sample is labeled '1', and all others are labeled '0'. Under such an extremely imbalanced label distribution, the network minimizes the loss by outputting probabilities close to zero for the entire waveform, converging to a trivial solution [41] without learning wave arrival features. These results further indicate that the β = 0 setting is prone to overfitting, resulting in degraded generalization performance and reduced arrival picking accuracy on unseen waveforms.
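The pull toward the trivial solution can be made concrete with a toy calculation. The sketch below is only an illustration: the loss used in the paper is not stated in this section, so a mean binary cross-entropy is assumed, and the waveform length, the Gaussian soft label, and the constant prediction of 0.001 are hypothetical choices.

```python
import numpy as np

def bce(label, pred, eps=1e-12):
    """Mean binary cross-entropy between a label and a predicted probability sequence."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(label * np.log(pred) + (1.0 - label) * np.log(1.0 - pred)))

n, pick = 4096, 2000                      # hypothetical waveform length and pick index
t = np.arange(n) - pick

one_hot = np.zeros(n)
one_hot[pick] = 1.0                       # beta = 0: a single '1'
soft = np.exp(-0.5 * (t / 25.0) ** 2)     # beta = 25 (Gaussian kernel assumed, 1 us sampling)

flat = np.full(n, 1e-3)                   # prediction that is near zero everywhere

loss_one_hot = bce(one_hot, flat)
loss_soft = bce(soft, flat)
print(f"one-hot label: {loss_one_hot:.4f}, soft label: {loss_soft:.4f}")
```

Under these assumptions the near-zero prediction is penalized far less against the one-hot label than against the soft label, so a network trained on binary labels has little incentive to produce a peak at all.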
Table 3 summarizes the picking results of Attention PhaseNet under different bandwidth coefficients on the test dataset, which consisted of 884 AE waveforms. As discussed above, the model trained with β = 0 exhibited overfitting and poor generalization performance. Therefore, the β = 0 case was excluded from the quantitative accuracy comparison, and only models trained with positive bandwidth coefficients were considered. Among the four positive bandwidth settings, the model trained with
β = 25 μs achieved the best overall performance, with an MAE of 17.41 μs and an RMSE of 43.37 μs. When the bandwidth coefficient increased to 100 μs, the MAE and RMSE increased to 26.03 μs and 47.52 μs, respectively. Compared with the model trained with
β = 100 μs, the MAE and RMSE of the model trained with
β = 25 μs decreased by 33.12% and 8.73%, respectively. Compared with the models trained with
β = 500 μs, the MAE and RMSE of the model trained with
β = 25 μs decreased by 63.24% and 33.81%, respectively.
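The percentage reductions quoted here are relative differences with respect to the comparison model. As a quick check, the β = 25 μs versus β = 100 μs figures can be reproduced from the MAE and RMSE values given in the text; the helper name `relative_reduction` is introduced only for this illustration.

```python
def relative_reduction(baseline, value):
    """Percentage by which `value` is lower than `baseline`."""
    return 100.0 * (baseline - value) / baseline

mae_25, rmse_25 = 17.41, 43.37      # beta = 25 us, values from the text
mae_100, rmse_100 = 26.03, 47.52    # beta = 100 us, values from the text

mae_drop = round(relative_reduction(mae_100, mae_25), 2)
rmse_drop = round(relative_reduction(rmse_100, rmse_25), 2)
print(mae_drop, rmse_drop)  # 33.12 8.73
```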
Furthermore, the model with β = 25 μs achieved the highest hit rates (HRs) under all error thresholds. Its HRs at the error thresholds of 5, 10, 15, 20, and 25 μs were 31.90%, 52.83%, 65.38%, 75.57%, and 82.69%, respectively. Relative to the model trained with β = 100 μs, the HRs at the five thresholds increased by 71.97%, 61.61%, 41.67%, 32.02%, and 22.85%, respectively. Relative to the model trained with β = 200 μs, the corresponding increases were 84.56%, 62.88%, 48.95%, 36.70%, and 25.26%, respectively. The increases were largest relative to the model trained with β = 500 μs, reaching 291.89%, 228.95%, 179.16%, 156.95%, and 129.12%, respectively. A small positive bandwidth (β = 25 μs) introduces limited temporal uncertainty around the manually picked arrival while avoiding excessive label smoothing, resulting in improved arrival picking performance. In contrast, excessively large bandwidth coefficients (e.g., β = 500 μs) produce overly smooth probability distributions and increase the uncertainty in P-wave arrival times, thereby reducing the temporal resolution of the predicted probability distribution and decreasing arrival picking accuracy.
In summary, the KDE bandwidth coefficient directly influences the probability distribution and the accuracy of arrival picking. When β = 0, the highly imbalanced label distribution promotes severe overfitting, resulting in poor generalization performance and unreliable arrival picking results on AE waveforms. In contrast, excessively large bandwidth coefficients significantly smooth the arrival probability distribution and increase the uncertainty of arrival picking. Compared with the other bandwidth conditions, the model trained with β = 25 μs achieves the optimal balance between label smoothness and picking accuracy. Therefore, β = 25 μs was selected as the optimal KDE bandwidth coefficient for subsequent analyses of model performance.
3.2. P-Wave Arrival Picking Results of Different Methods
To evaluate the picking performance of the proposed Attention PhaseNet model, comparative analyses were conducted against two representative traditional methods, namely the STA/LTA method and the AR-AIC method, as well as the classical deep learning model PhaseNet. P-wave arrival picking experiments were performed for all methods using the same test dataset. The MAE, RMSE, and HR were computed across different error thresholds to evaluate the picking accuracy, stability, and robustness of the methods.
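For reference, the three metrics can be computed from picked and manually labeled arrival times roughly as follows. This is a sketch rather than the authors' evaluation code: the HR is assumed here to be the percentage of waveforms whose absolute error is at or below the threshold (its defining equation lies outside this section), and the function name and toy arrival times are hypothetical.

```python
import numpy as np

def picking_metrics(picked_us, true_us, thresholds_us=(5, 10, 15, 20, 25)):
    """Return MAE, RMSE (both in us) and hit rates (%) per error threshold."""
    err = np.abs(np.asarray(picked_us, dtype=float) - np.asarray(true_us, dtype=float))
    mae = float(err.mean())
    rmse = float(np.sqrt(np.mean(err ** 2)))
    # Assumed HR definition: share of picks with |error| <= threshold
    hr = {thr: 100.0 * float(np.mean(err <= thr)) for thr in thresholds_us}
    return mae, rmse, hr

# Toy arrival times in microseconds (not data from the paper)
picked = [100, 212, 305, 430]
true = [103, 200, 300, 400]
mae, rmse, hr = picking_metrics(picked, true)
print(mae, round(rmse, 2), hr)
```

Because RMSE squares the errors, a few large outliers raise it much more than they raise MAE, which is why the two metrics are reported together.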
The STA/LTA method is formulated as follows [42]:

$$\mathrm{Thr}(t) = \frac{\dfrac{1}{t_{\mathrm{STA}}} \displaystyle\int_{t - t_{\mathrm{STA}}}^{t} \Psi(\tau)\, \mathrm{d}\tau}{\dfrac{1}{t_{\mathrm{LTA}}} \displaystyle\int_{t - t_{\mathrm{LTA}}}^{t} \Psi(\tau)\, \mathrm{d}\tau}$$

where Thr denotes the STA/LTA ratio (dimensionless), tLTA and tSTA denote the lengths of the long-term and short-term windows (s), respectively, and Ψ(·) denotes the absolute amplitude of the AE waveform (V). The STA/LTA ratio is calculated by continuously sliding the short-term and long-term windows along the waveform time axis. The arrival of the P-wave causes a rapid increase in the average amplitude within the short-term window, which leads to a rise in Thr. A seismic arrival is identified when Thr surpasses a predefined trigger threshold. Previous studies have suggested that the LTA window should be several times longer than the STA window [43, 44]. Considering the duration and sampling frequency of AE waveforms in this study, tLTA and tSTA were set to 1000 μs and 100 μs, respectively, and the trigger threshold was set to 2.
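A minimal discrete sketch of this picker is given below, using the window lengths and trigger threshold stated above. Several details are assumptions rather than the authors' implementation: both windows are taken as trailing windows ending at the current sample, the sampling interval `dt_us` defaults to 1 μs, and the first sample whose ratio exceeds the trigger is returned as the pick.

```python
import numpy as np

def sta_lta_pick(waveform, dt_us=1.0, t_sta_us=100.0, t_lta_us=1000.0, trigger=2.0):
    """Return the STA/LTA ratio trace and the first sample index above the trigger."""
    psi = np.abs(np.asarray(waveform, dtype=float))    # absolute amplitude
    n_sta = max(1, int(round(t_sta_us / dt_us)))       # window lengths in samples
    n_lta = max(n_sta, int(round(t_lta_us / dt_us)))
    csum = np.concatenate(([0.0], np.cumsum(psi)))     # prefix sums for fast window means
    ratio = np.zeros(psi.size)
    end = np.arange(n_lta, psi.size + 1)               # exclusive end index of each window
    sta = (csum[end] - csum[end - n_sta]) / n_sta
    lta = (csum[end] - csum[end - n_lta]) / n_lta
    # Ratio stays 0 where the long-term average is zero
    ratio[end - 1] = np.divide(sta, lta, out=np.zeros_like(sta), where=lta > 0)
    above = np.flatnonzero(ratio > trigger)
    pick = int(above[0]) if above.size else None
    return ratio, pick

# Synthetic trace: weak noise followed by a much stronger signal from sample 2000
rng = np.random.default_rng(0)
wave = np.concatenate((0.01 * rng.standard_normal(2000), rng.standard_normal(2000)))
ratio, pick = sta_lta_pick(wave)
print(pick)
```

The ratio is left at zero until a full long-term window is available, so no pick can occur within the first tLTA of the record.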
The AR-AIC method is defined as follows [45]:

$$\mathrm{AIC}(k) = k \log\bigl(\mathrm{var}(x[1, k])\bigr) + (n - k - 1) \log\bigl(\mathrm{var}(x[k + 1, n])\bigr)$$

where var(x[1, k]) and var(x[k + 1, n]) denote the variances of the waveform amplitudes before and after the candidate arrival point k, respectively, and n denotes the number of sampling points in the waveform. By searching all sampling points along the time axis, the minimum value of the VAR-AIC function can be identified. The time corresponding to this minimum is then regarded as the seismic arrival time.
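The search for the AIC minimum can be sketched as below. The sketch assumes the widely used variance-based form AIC(k) = k·log(var(x[1, k])) + (n − k − 1)·log(var(x[k + 1, n])), which matches the variance terms defined above but may differ in detail from the authors' implementation; the function name and the handling of the first and last few samples are illustrative choices.

```python
import numpy as np

def aic_pick(waveform):
    """Return the sample index that minimizes the variance-based AIC, and the AIC trace."""
    x = np.asarray(waveform, dtype=float)
    n = x.size
    aic = np.full(n, np.inf)           # edges stay at +inf and are never picked
    for k in range(2, n - 2):
        v_before = np.var(x[:k])       # variance of samples 1..k
        v_after = np.var(x[k:])        # variance of samples k+1..n
        if v_before > 0 and v_after > 0:
            aic[k] = k * np.log(v_before) + (n - k - 1) * np.log(v_after)
    return int(np.argmin(aic)), aic

# Synthetic trace: weak noise, then a stronger signal starting at sample 1000
rng = np.random.default_rng(0)
wave = np.concatenate((0.01 * rng.standard_normal(1000), rng.standard_normal(1000)))
pick, _ = aic_pick(wave)
print(pick)
```

Because the minimum is taken over the whole record, the function always returns a pick, even for a trace that contains only noise.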
Figure 13 and Figure 14 present the arrival picking results of the different methods on examples 1# and 2#. For example 1#, the absolute picking errors of Attention PhaseNet, PhaseNet, STA/LTA, and AR-AIC were 11, 18, 35, and 813 μs, respectively, whereas the corresponding errors for example 2# were 5, 8, 207, and 906 μs. The AR-AIC method exhibited noticeable deviations due to background noise, and the STA/LTA method was slightly affected by noise and was sensitive to the window parameters. In contrast, Attention PhaseNet effectively suppressed noise interference and concentrated the picked arrival region near the true arrival position. In the two examples, the absolute picking error of Attention PhaseNet was reduced by an average of 38.20%, 83.08%, and 99.05% compared with PhaseNet, STA/LTA, and AR-AIC, respectively. These results demonstrate that Attention PhaseNet can more accurately capture abrupt arrival features and achieve the highest arrival picking accuracy.
Figure 15 presents the distributions of the absolute picking errors of the different methods on the test dataset. The traditional methods exhibited significantly larger picking errors and broader error distributions than the deep learning methods. The maximum absolute picking errors of the STA/LTA and AR-AIC methods were 1301 μs and 4390 μs, respectively. In contrast, the deep learning methods can learn temporal waveform features through model training and achieve more accurate arrival picking. In this study, the absolute picking errors of Attention PhaseNet were mainly concentrated within the range of 0–524 μs, whereas those of PhaseNet were distributed within the range of 0–549 μs. Furthermore, the maximum absolute picking error of Attention PhaseNet was reduced by 4.55% compared with PhaseNet. These results indicate that the proposed attention gates effectively suppress the transmission of noise features in the skip connections. Consequently, the Attention PhaseNet focuses on critical features associated with the P-wave arrival during the decoding process and improves the stability and accuracy of arrival picking.
Table 4 summarizes the arrival picking performances of all methods on the test dataset. Attention PhaseNet achieved the best performance in P-wave arrival picking, with an MAE and an RMSE of 17.41 μs and 43.37 μs, respectively, representing reductions of 10.58% and 18.00% compared to PhaseNet. Compared to STA/LTA, the MAE and RMSE of Attention PhaseNet were reduced by 92.92% and 90.21%, respectively, and compared to AR-AIC, the corresponding reductions were 98.25% and 96.41%. These results indicate that traditional methods that rely on fixed time windows or manually defined thresholds struggle to achieve stable arrival picking due to amplitude variations in AE data. In contrast, Attention PhaseNet achieves superior accuracy and error distribution stability by leveraging the attention mechanism and KDE-based label smoothing.
The HR metric further highlights the superiority of Attention PhaseNet. At an error threshold of 5 μs, the HR of Attention PhaseNet reached 31.90%, which was significantly higher than those of PhaseNet (28.39%), STA/LTA (5.09%), and AR-AIC (0.57%). As the error thresholds increased to 10 μs and 15 μs, the HR further increased to 52.83% and 65.38%, respectively, and remained significantly higher than those of the other methods. At the thresholds of 20 μs and 25 μs, the HR of Attention PhaseNet reached 75.57% and 82.69%, respectively. These results indicate that most AE arrivals can be accurately picked within a relatively small picking error, demonstrating the high reliability of Attention PhaseNet for AE arrival picking. In contrast, the other three methods consistently produced lower HRs across all thresholds.
3.3. P-Wave Arrival Picking Results of Different Methods Under Different SNR Conditions
To investigate the influence of noise levels on arrival picking performance, the test dataset was divided into three subsets according to the SNR of the AE waveforms: low SNR (SNR ∈ [1, 3)), medium SNR (SNR ∈ (3, 6]), and high SNR (SNR > 6). The low-, medium-, and high-SNR subsets contained 133, 552, and 199 AE waveforms, respectively. All arrival picking methods were evaluated on the same waveform subset within each SNR category. The MAE, RMSE, and HR of the different methods under the three SNR conditions were subsequently compared. The HR values were calculated according to Equation (11) using the corresponding SNR subset as the evaluation dataset.
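The subset split can be expressed as simple boolean masks over a per-waveform SNR array. In the sketch below, the SNR values are toy numbers and the function name is hypothetical. One boundary choice is an assumption: the stated intervals do not say where a waveform with SNR exactly equal to 3 belongs, and it is placed in the medium subset here.

```python
import numpy as np

def split_by_snr(snr):
    """Return waveform indices for the low-, medium-, and high-SNR subsets."""
    snr = np.asarray(snr, dtype=float)
    return {
        "low": np.flatnonzero((snr >= 1) & (snr < 3)),
        "medium": np.flatnonzero((snr >= 3) & (snr <= 6)),  # SNR = 3 assigned here (assumption)
        "high": np.flatnonzero(snr > 6),
    }

# Toy SNR values, one per waveform (not data from the paper)
subsets = split_by_snr([1.5, 3.0, 6.0, 6.1, 2.9])
for name, idx in subsets.items():
    print(name, idx.tolist())
```

Each method's picks are then scored on the same index set, so differences within a subset reflect the method rather than the selection of waveforms.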
Table 5,
Table 6 and
Table 7 summarize the arrival picking results of the different methods under the three SNR conditions, respectively. In general, the arrival picking accuracies of all methods improved as SNR increased. However, significant differences in arrival picking accuracy were observed across methods under varying SNR conditions. The traditional methods exhibited relatively large picking errors, whereas the deep learning methods maintained lower picking errors and higher HRs across different SNR conditions, demonstrating better stability and adaptability.
Under the three SNR conditions, Attention PhaseNet consistently exhibited lower picking errors than the other methods. In the low-SNR condition, the MAE of Attention PhaseNet was reduced by 11.25%, 92.14%, and 97.89% relative to PhaseNet, STA/LTA, and AR-AIC, respectively, while its RMSE was reduced by 35.34%, 91.07%, and 97.16%, respectively. In the medium-SNR condition, the MAE of Attention PhaseNet was 19.51 μs, which was reduced by 5.70% compared with PhaseNet and by 92.62% and 98.01% compared with STA/LTA and AR-AIC, respectively. The RMSE of Attention PhaseNet was 49.89 μs, corresponding to reductions of 14.23%, 89.01%, and 95.44% compared with PhaseNet, STA/LTA, and AR-AIC, respectively. In the high-SNR condition, the MAE of Attention PhaseNet further decreased to 13.13 μs, corresponding to reductions of 2.52%, 94.58%, and 98.98% compared with PhaseNet, STA/LTA, and AR-AIC, respectively. The RMSE further decreased to 18.17 μs, representing reductions of 4.57%, 95.46%, and 98.63% relative to PhaseNet, STA/LTA, and AR-AIC, respectively. These results indicate that Attention PhaseNet achieves lower picking errors across different noise conditions and exhibits strong noise resistance.
The HR metric further illustrates the arrival picking capability of different methods across different error thresholds. Across all error thresholds and SNR conditions, Attention PhaseNet consistently achieved higher HR values than the other picking methods. Compared with PhaseNet, the average HR improvement of Attention PhaseNet across the five error thresholds reached 9.13%, 2.68%, and 10.50% under low-, medium-, and high-SNR conditions, respectively. These results indicate that the proposed method provides more accurate arrival picking and maintains stable performance under different noise levels. The advantage of Attention PhaseNet is particularly evident under low-SNR conditions. At the strictest error threshold of 5 μs, the HR of Attention PhaseNet reached 27.07%, representing a 12.51% improvement over PhaseNet. This result suggests that the introduced attention gates effectively suppress the transmission of noise features through skip connections and enhance the extraction of arrival-related features, thereby improving picking accuracy in noisy AE waveforms. In contrast, the AR-AIC method achieved an HR of 0% across all error thresholds under low-SNR conditions, indicating that it was unable to reliably distinguish P-wave arrivals from background noise.
In summary, Attention PhaseNet consistently achieved lower picking errors and higher hit rates than the other evaluated methods across all SNR conditions. The results demonstrate that incorporating an attention mechanism into the PhaseNet framework improves both arrival picking accuracy and robustness, particularly when processing low-SNR AE signals.