An Attention-Based Deep Learning Method for Acoustic Emission Arrival Picking in True Triaxial Hydraulic Fracturing Experiments

Lu, Ji; Lin, Botao

doi:10.3390/pr14122004


by Ji Lu ¹ and Botao Lin ¹,²,*

¹ State Key Laboratory of Petroleum Resources and Engineering, China University of Petroleum (Beijing), Beijing 102249, China

² College of Artificial Intelligence, China University of Petroleum (Beijing), Beijing 102249, China

* Author to whom correspondence should be addressed.

Processes 2026, 14(12), 2004; https://doi.org/10.3390/pr14122004 (registering DOI)

Submission received: 22 May 2026 / Revised: 16 June 2026 / Accepted: 18 June 2026 / Published: 20 June 2026

(This article belongs to the Special Issue Applications of Intelligent Models in the Petroleum Industry, 2nd Edition)


Abstract

Accurate arrival picking of acoustic emission (AE) data is essential for AE event localization and hydraulic fracture characterization in true triaxial hydraulic fracturing experiments. However, conventional arrival picking methods are highly sensitive to manually defined thresholds, whereas existing deep learning models are constrained by low signal-to-noise ratios (SNRs) and limited AE dataset sizes. To address these challenges, this study proposes an attention-based deep learning method for AE arrival picking. The proposed method introduces an attention mechanism into the PhaseNet framework to suppress noise feature transmission in the skip connections. In addition, a kernel density estimation (KDE)-based label smoothing strategy was adopted to alleviate label imbalance and account for arrival-time uncertainty. The results demonstrate that the proposed method reduced the mean absolute error (MAE) by 10.58%, 92.92%, and 98.25% compared with PhaseNet, STA/LTA, and AR-AIC, respectively. The proposed method exhibited superior picking accuracy, robustness, and computational efficiency relative to the other methods, providing a reliable foundation for AE event localization and high-precision AE monitoring in hydraulic fracturing experiments.

Keywords: attention mechanism; arrival-time picking; deep learning; acoustic emission; hydraulic fracturing

1. Introduction

During rock failure, internal strain energy is rapidly released as transient elastic waves, which are recorded as acoustic emission (AE) signals. Natural earthquakes are generally regarded as failure responses of geological strata and therefore resemble AE in their source mechanisms. Manthie et al. [1] systematically investigated rock failure phenomena across multiple spatial scales and summarized the scaling relationships among signal frequency, magnitude, and source radius based on earthquakes, microseismic monitoring, field-scale AE observations, and laboratory AE monitoring. Compared with the other signal types, laboratory-scale AE signals are characterized by the highest frequencies (kHz–MHz), the lowest moment magnitudes, and the smallest fracture scales (source radius < 10⁻² m) [2]. However, in true triaxial hydraulic fracturing experiments, owing to the limited specimen size and stress-loading conditions, the dominant frequency of AE events is typically below 100 kHz [3]. Moreover, AE signals usually exhibit strong attenuation and low signal-to-noise ratios (SNRs).

The accuracy of arrival-time picking directly determines the precision of AE event localization. Therefore, reliable arrival-time picking methods are a critical component of AE data processing. Due to signal attenuation, S-wave arrivals are often masked by noise in AE recordings, and event localization is performed primarily using P-wave arrivals [4,5]. At present, P-wave arrival picking in AE monitoring largely relies on classical seismological methods, mainly the Short-Term Average to Long-Term Average (STA/LTA) and Autoregressive–Akaike Information Criterion (AR-AIC) methods. Before arrival-time picking, the envelope thresholding method is commonly employed as a preprocessing step to identify potential P-wave arrivals in the raw waveform data and to narrow the parameter tuning range for arrival picking algorithms. This method identifies potential P-wave arrival intervals by tracking abrupt increases in waveform amplitude, which generally correspond to wave arrival regions.

The STA/LTA method is a typical time-domain feature analysis method that identifies P-wave arrivals based on abrupt changes in waveform amplitude and energy before and after wave arrival. Specifically, the P-wave arrival time is automatically detected using the ratio of short-term to long-term signal energy [6]. The AR-AIC method, developed after Hirotugu Akaike introduced the Akaike Information Criterion (AIC) for autoregressive models, offers greater sensitivity for detecting wave arrivals within localized time windows [7]. Because AR-AIC relies on the precise definition of time windows, it is usually paired with preprocessing methods like envelope thresholding to ensure the interval contains real arrival signals [8]. Although the STA/LTA and AR-AIC methods can automatically pick AE arrivals, both rely heavily on manual selection of thresholds and window lengths. Consequently, their robustness under varying noise conditions is limited, and the picking accuracy is strongly influenced by empirical parameter selection.
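The energy-ratio idea behind STA/LTA can be illustrated with a minimal sketch. The window lengths (`n_sta`, `n_lta`), the trigger `threshold`, and the function name are illustrative assumptions, not the settings used in this study; both windows are taken to end at the current sample, which is one common convention.

```python
def sta_lta_pick(x, n_sta=20, n_lta=200, threshold=3.0):
    """Return the first sample index where STA/LTA >= threshold, else None.

    x is a sequence of waveform amplitudes. n_sta, n_lta and threshold are
    hypothetical values chosen for illustration only (n_sta <= n_lta).
    """
    # Prefix sums of signal energy give each window mean in O(1).
    csum = [0.0]
    for v in x:
        csum.append(csum[-1] + v * v)

    for i in range(n_lta - 1, len(x)):
        # Both windows end at sample i (inclusive).
        sta = (csum[i + 1] - csum[i + 1 - n_sta]) / n_sta
        lta = (csum[i + 1] - csum[i + 1 - n_lta]) / n_lta
        if lta > 0.0 and sta / lta >= threshold:
            return i
    return None


# Synthetic example: low-amplitude noise followed by a strong onset at sample 500.
noise = [0.01 * (-1) ** k for k in range(500)]
signal = [1.0 * (-1) ** k for k in range(300)]
pick = sta_lta_pick(noise + signal)
```

The dependence on `n_sta`, `n_lta`, and `threshold` is visible directly in the signature: changing any of them moves or suppresses the trigger, which is the parameter sensitivity discussed above.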

In recent years, deep learning techniques have been widely applied to seismic arrival picking research. Existing methods can generally be classified into edge-detection-based approaches and semantic-segmentation-based approaches [9,10,11,12]. Edge-detection-based methods typically employ convolutional neural networks to identify abrupt waveform variations and determine arrival signals. Wang et al. [13] proposed a decision-tree-based ensemble machine learning framework that integrates multiple conventional pickers to improve the robustness of AE first-arrival detection under variable SNR conditions. However, edge-detection-based methods are highly sensitive to noise and prone to false detections when the arrival signal and noise have similar amplitudes or frequencies [14,15]. In contrast, semantic-segmentation-based methods partition the entire waveform into noise and arrival-signal regions, with the transition boundary between them defined as the seismic arrival. Compared with the former, semantic-segmentation-based methods perform point-wise waveform classification and are less sensitive to abnormal signal fluctuations. In such methods, the predicted wave arrival is directly associated with the signal–noise transition boundary, thereby providing higher interpretability [16,17]. However, noise fluctuations may also be classified as arrival-signal regions under low-SNR conditions, resulting in multiple signal–noise transition boundaries within a single waveform. To ensure a unique picking result, semantic-segmentation-based methods generally predict the arrival probability at each sampling point, and the time point corresponding to the maximum probability is identified as the wave arrival [18]. Representative semantic-segmentation-based methods include PhaseNet [19], EQTransformer [20], and PickBlue [21]. More recently, Yang et al. [22] developed a Transformer-based picking method, in which waveform sequences were directly processed by a Transformer network to identify arrival times, demonstrating superior performance compared with several traditional arrival picking methods. Furthermore, Li et al. [23] proposed an ensemble deep learning model (AEbagging) that combines multiple neural network feature extractors and an ensemble-learning decoder for laboratory AE phase picking.

Among the deep-learning-based arrival picking methods, PhaseNet has become a representative network for seismic arrival identification due to its high picking accuracy and robustness [19]. PhaseNet adopts a U-Net architecture and utilizes skip connections to fuse features from shallow and deep layers of the network, thereby improving the picking accuracy of seismic arrivals. In recent years, several studies have further improved U-Net-based architectures for microseismic and AE arrival picking. For example, Zhang et al. proposed a residual-link nested U-Net (RLU-Net) to enhance feature extraction through residual learning and improve first-arrival picking performance [24]. Liu et al. introduced a feature pyramid network (FPN) to integrate multi-scale information for microseismic first arrival picking [25]. More recently, Shen et al. [26] improved U-Net3+ by employing full-scale skip connections and feature aggregation from different network depths, thereby enhancing the robustness of arrival picking in noisy seismic recordings. However, existing improvements mainly focus on residual feature learning and multi-scale information fusion. For low-SNR AE signals, the skip connections may directly transmit noise features extracted in shallow layers into deeper network layers, thereby interfering with the identification of critical arrival features and increasing picking errors. Consequently, suppressing noise feature transmission during feature fusion remains an important challenge for accurate AE arrival picking.

To date, deep learning methods still face substantial challenges when applied to laboratory hydraulic fracturing AE data. On the one hand, AE signals often experience rapid attenuation, short propagation distances, and low SNRs [27,28,29], which can cause neural networks to misclassify noise segments as valid arrival signals, thereby reducing arrival picking accuracy. On the other hand, seismological deep learning models are trained on millions of labeled datasets from global seismic networks. In contrast, laboratory AE experiments typically contain only several hundred events. The significant discrepancy in training data size limits the generalizability of existing models.

To address these challenges, this study proposes an improved PhaseNet architecture for hydraulic fracturing AE signals. The proposed method adopts PhaseNet’s output strategy and reformulates AE arrival identification as a point-wise signal–noise segmentation problem. To enhance the network’s ability to identify critical signal features while suppressing noise transmission, an attention gate was incorporated into the original network architecture, yielding the Attention PhaseNet model. Furthermore, given the small size and imbalanced nature of AE datasets, a label smoothing strategy based on kernel density estimation (KDE) was adopted. Specifically, the manually labeled AE arrivals were transformed into continuous probability distributions to alleviate label sparsity and account for arrival-time uncertainty using a Gaussian kernel density function, following the labeling strategy proposed by Zhu and Beroza [19]. In addition, the influence of the bandwidth coefficient on label smoothness and arrival picking performance was systematically investigated. Finally, comparisons among different seismic arrival picking methods were conducted to validate the accuracy and robustness of the proposed method for AE signal arrival picking.

2. Materials and Methods

2.1. AE Data Acquisition

The AE dataset used in this study was acquired from a true triaxial hydraulic fracturing experiment reported by Zhang et al. [30], which was conducted using a sand–shale outcrop sample collected from the Chang 7 Member in Yichuan County, Shaanxi Province, China. In the present study, the AE waveform data generated during the experiment were adopted for the development and evaluation of the proposed model.

The Chang 7 reservoir is a typical continental interbedded shale oil reservoir characterized by multiple sets of thick sandstone layers interbedded with thin shale formations. Distinct lithological boundaries and mutation structures are developed between the sandstone and shale layers. High-angle structural shear fractures are well developed within the thick sandstone intervals, whereas abundant laminations are observed in the thin shale layers. The reservoir porosity ranges from 3% to 10%, with an average porosity of 7.69%. Under a confining pressure of 20 MPa and a pore pressure of 5 MPa, the permeability ranges from 0.001 × 10⁻³ to 6.69 × 10⁻³ μm², with an average permeability of 0.12 × 10⁻³ μm².

The hydraulic fracturing experiment was carried out using a multi-cluster injection control and acquisition system [30], as shown in Figure 1. The system was designed to achieve real-time monitoring and dynamic regulation of the injection flow rate and injection pressure for each cluster during fracturing. It was equipped with pressure gauges, customized Coriolis flow meters, and automated control units to simultaneously monitor the injection parameters of the main pipeline and individual cluster pipelines during hydraulic fracturing experiments. The allowable injection flow rate ranged from 5 to 100 mL/min, with a control accuracy of ±1 mL/min, while the system’s pressure-bearing capacity ranged from 0 to 45 MPa, with a control accuracy of ±0.05 MPa.

The AE monitoring system utilized in this study consists of three main components: (1) a microseismic/AE analysis system developed by Itasca, with a data acquisition frequency of 10 MHz; (2) an RS-2A sensor, with a frequency range of 50 kHz–400 kHz and a central frequency of 125 kHz. This frequency band allows the sensors to effectively receive acoustic emission signals from rock samples with dimensions of 300 mm × 300 mm × 300 mm; and (3) a preamplifier configured with a gain of 40 dB. As shown in Figure 2, when the released elastic waves reach the surface of the cubic rock specimen, small displacement disturbances are generated and subsequently detected by the AE sensors. The sensors convert the mechanical vibration signals into electrical signals, which are then amplified, filtered, and recorded as AE waveform data through the processing module.

Figure 3 illustrates the installation of AE sensors on the square loading platen. The platen was a square metal plate with a side length of 300 mm, pre-fabricated with sensor mounting holes and wiring channels. Each sensor hole was positioned 50 mm from the adjacent boundaries of the platen. Each loading platen was equipped with four AE sensors. Since four loading platens were used in the true triaxial hydraulic fracturing system, a total of 16 AE sensors were deployed during the experiments. During installation, the inner end of each mounting hole was sealed with a plug, while the outer end was fitted with the piezoelectric ceramic element of the AE sensor. The square loading platen and confining plate were connected using bolts to ensure structural stability during hydraulic fracturing experiments. Lubricating oil was applied to the sensor surface as a coupling agent to ensure effective acoustic coupling between the AE sensors and the rock specimen. This procedure reduced interfacial acoustic impedance and improved signal transmission quality.

Table 1 summarizes the hydraulic fracturing parameters. The laboratory experiments used injection flow rates ranging from 20 to 120 mL/min. The in situ stress conditions were set at S_V = 20 MPa, S_H = 15 MPa, and S_h = 10 MPa, corresponding to a horizontal stress difference of 5 MPa. Field-scale single horizontal fracturing stages typically contain 6–7 perforation clusters, whereas the laboratory experiment employed two injection clusters with a spacing of 0.04 m.

As shown in Figure 4, the outcrop sample was processed into a cuboid specimen with dimensions of 250 mm × 250 mm × 200 mm. A U-shaped optical fiber cable was arranged on the outcrop specimen surface to provide auxiliary monitoring during fracture propagation. The optical fiber signal was used to identify whether the propagating fracture had reached the boundary of the outcrop specimen. Once the fracture propagation reached the specimen boundary, the hydraulic fracturing experiment was terminated to ensure that fracture evolution remained within the interior of the outcrop specimen. To satisfy the geometric size requirements and boundary loading conditions of the true triaxial loading apparatus, a layer of C80 concrete was uniformly cast around the outcrop specimen, increasing the overall specimen dimensions to 300 mm × 300 mm × 300 mm. The concrete layer was primarily used to satisfy the boundary loading requirements of the experimental apparatus while reducing the boundary effects associated with optical fiber monitoring. Hou et al. [31] demonstrated that this concrete encapsulation method exerts minimal influence on the internal stress field and fracture propagation behavior of the rock specimen and can effectively approximate true triaxial stress loading conditions.

Each AE waveform was recorded over an 8192 μs window with 8192 sampling points. A total of 5254 AE waveforms were collected during the hydraulic fracturing experiments.

2.2. Attention PhaseNet Model

To further illustrate the model design of the proposed method, this section first introduces the basic architecture of PhaseNet and then presents the proposed Attention PhaseNet model. PhaseNet is a deep neural network designed for seismic arrival picking tasks. As illustrated in Figure 5, its core architecture is based on the encoder–decoder framework of U-Net. The encoder progressively extracts temporal features associated with seismic arrivals from the input waveform via multiple one-dimensional (1D) convolutions and downsampling operations. During the downsampling process, the length of the feature sequence gradually decreases while the number of feature channels increases, enabling the extraction of semantic information.

The decoder restores the temporal resolution of the feature sequence through upsampling and convolution operations, reconstructing high-resolution features linked to seismic arrivals. At each upsampling stage, PhaseNet uses skip connections to merge features from the encoder and decoder at matching resolutions. This approach allows the network to leverage both high-level semantic information from deeper layers and fine-resolution details from shallower layers. For seismic arrival picking tasks, semantic features classify the waveform as arrival or noise, while resolution features pinpoint the exact arrival time.

Unlike traditional convolutional networks, PhaseNet omits fully connected layers and outputs a label sequence of the same length as the input waveform, in which each time point is assigned a semantic class. This output strategy avoids additional feature engineering and enables automatic learning of multi-scale temporal features, enhancing arrival picking accuracy.

In the original PhaseNet, skip connections transfer temporal features from the encoder to the decoder. However, these connections can also introduce noise from shallow layers, especially in low-SNR AE data, leading to the potential misclassification of arrivals. To address this issue, the attention mechanism originally proposed in Attention U-Net by Oktay et al. [32] was introduced into the PhaseNet architecture. Specifically, attention gates (AGs) were embedded into the skip connections to selectively suppress irrelevant features and enhance arrival-related information. By adapting the attention gate from 2D image segmentation to 1D AE waveform processing, an Attention PhaseNet architecture was developed for AE arrival picking.

The architecture of the proposed Attention PhaseNet model is illustrated in Figure 6. The network processes a single-channel AE waveform of 8192 points and outputs a probability distribution of the same length, indicating the presence of seismic arrivals. The encoder comprises four downsampling units, each containing a 1D convolutional layer, batch normalization, and ReLU activation. For instance, the first encoder unit applies two 3 × 1 convolutions with 8 channels, converting the input from 8192 × 1 to 8192 × 8. A max-pooling layer with a stride of 2 then reduces the dimension to 4096 × 8. The remaining encoder units follow a similar pattern, with output channels equal to the number of convolutional kernels. After the fourth downsampling, features are passed to the decoder for upsampling.
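The change in feature dimensions through the encoder can be traced with a short sketch. The input length (8192), the 8 channels of the first unit, and the stride-2 pooling come from the description above; the doubling of channels in the deeper units is an assumption made only for illustration, since the text states the channel count for the first unit alone.

```python
def encoder_shapes(length=8192, first_channels=8, n_levels=4):
    """Trace (stage, length, channels) through the encoder.

    ASSUMPTION: channels double after every pooling step. Only the first
    unit (8 channels) is specified in the text.
    """
    shapes = []
    channels = first_channels
    for _ in range(n_levels):
        # Convolutions keep the sequence length and set the channel count.
        shapes.append(("conv", length, channels))
        # Max pooling with stride 2 halves the sequence length.
        length //= 2
        shapes.append(("pool", length, channels))
        channels *= 2  # hypothetical channel growth
    return shapes


for stage, n_points, n_channels in encoder_shapes():
    print(f"{stage}: {n_points} x {n_channels}")
```

Under these assumptions, four stride-2 pooling steps reduce the 8192-point sequence to 512 points before the features are passed to the decoder.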

The decoder also contains four upsampling units, each comprising an upsampling layer, an attention gate, a skip connection, and a convolutional layer. In the deepest decoder unit, temporal resolution is restored using a 3 × 1 transposed convolution with a stride of 2. The attention gate then weights the encoder features of the same dimension so that those associated with seismic arrivals are retained. The attention-weighted encoder features are concatenated with the upsampled decoder features and further processed by a 3 × 1 convolution. This sequence is repeated across all decoder stages, with convolution blocks refining the fused features. Finally, a 1 × 1 convolution layer produces a single-channel probability sequence of 8192 points. The time point with the highest probability is identified as the seismic arrival, while others are classified as noise.

In skip connections, features from shallow layers carry detailed temporal information but also considerable background noise. To address this, Attention PhaseNet places an attention gate (AG) in each skip connection to filter out noise. As shown in Figure 7, the attention gate enables the model to retain features most relevant for seismic arrival detection. The AG receives two inputs: the encoder feature map F_en with dimensions of L_en × C_en, where L_en denotes the feature length and C_en denotes the number of feature channels, and a gating signal F_gs with dimensions of L_gs × C_gs from deeper layers. The gating signal provides contextual information to help the AG identify potential arrival regions. Both inputs are first mapped to the same dimension C_int via 1 × 1 1D convolutions and combined to generate an intermediate feature:

F_{int} = f_{R}(W_{en} * F_{en} + W_{gs} * F_{gs} + b)

(1)

where W_en and W_gs denote 1 × 1 convolution kernels, F_int denotes the intermediate feature, * denotes convolution, and f_R denotes the ReLU activation function. The intermediate response is subsequently compressed into a single-channel feature and transformed into an attention coefficient vector through a Sigmoid activation function:

α = f_{S}(ψ * F_{int} + b_{ψ})

(2)

where ψ denotes a 1 × 1 convolution kernel and f_S denotes the Sigmoid activation function. When the lengths of the attention coefficient vector and encoder feature map differ, linear interpolation is applied to resample the attention coefficients:

α^{'} = f_{LI}(α)

(3)

where α denotes the original attention coefficient, α′ denotes the resampled attention coefficient, and f_LI denotes the linear interpolation operation. For each encoder feature F_{en}^{i} at the i-th time sampling point, the AG performs element-wise weighting according to the resampled attention coefficient:

\hat{F}_{en}^{i} = F_{en}^{i} \cdot α_{i}^{'}

(4)

where \hat{F}_{en}^{i} denotes the attention-weighted feature, F_{en}^{i} represents the encoder feature map, and α_{i}^{'} denotes the attention coefficient at the i-th time sampling point. The weighted features are subsequently transmitted to the decoder via skip connections and jointly used in the upsampling process with deep layer features. This process suppresses noise transmission and enhances the reliability of AE arrival picking.
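Equations (1)–(4) can be followed step by step in a dependency-free sketch. It is not the authors' implementation: features are plain lists of per-time-step channel vectors, the weights are supplied by the caller, and the encoder features are subsampled by a fixed stride so that they can be added to the shorter gating signal. That subsampling step, like all function names here, is an assumption introduced for illustration.

```python
import math


def conv1x1(feats, weight):
    """1 x 1 convolution: the same linear map applied at every time step.
    feats: L vectors of C_in values; weight: C_out rows of C_in values."""
    return [[sum(w * v for w, v in zip(row, vec)) for row in weight]
            for vec in feats]


def linear_resample(values, new_len):
    """Linearly interpolate a 1D sequence to new_len points (Equation (3))."""
    if len(values) == new_len:
        return list(values)
    if len(values) == 1 or new_len == 1:
        return [values[0]] * new_len
    scale = (len(values) - 1) / (new_len - 1)
    out = []
    for i in range(new_len):
        pos = i * scale
        lo = min(int(pos), len(values) - 2)
        frac = pos - lo
        out.append(values[lo] * (1.0 - frac) + values[lo + 1] * frac)
    return out


def attention_gate(f_en, f_gs, w_en, w_gs, b, psi, b_psi):
    """Return (attention-weighted encoder features, resampled coefficients)."""
    # ASSUMPTION: subsample the encoder features to the gating-signal length.
    stride = max(1, len(f_en) // len(f_gs))
    en_proj = conv1x1(f_en[::stride], w_en)
    gs_proj = conv1x1(f_gs, w_gs)
    # Equation (1): ReLU of the summed projections plus bias.
    f_int = [[max(0.0, e + g + bi) for e, g, bi in zip(ev, gv, b)]
             for ev, gv in zip(en_proj, gs_proj)]
    # Equation (2): compress to one channel, then apply the sigmoid.
    alpha = [1.0 / (1.0 + math.exp(-(sum(p * v for p, v in zip(psi, vec)) + b_psi)))
             for vec in f_int]
    # Equation (3): resample the coefficients to the encoder length.
    alpha_rs = linear_resample(alpha, len(f_en))
    # Equation (4): element-wise weighting of every encoder channel.
    weighted = [[v * a for v in vec] for vec, a in zip(f_en, alpha_rs)]
    return weighted, alpha_rs


# Toy example: the gating signal is active only in the second half.
f_en = [[1.0, 1.0] for _ in range(8)]
f_gs = [[0.0], [0.0], [10.0], [10.0]]
weighted, alpha_rs = attention_gate(
    f_en, f_gs, w_en=[[0.0, 0.0]], w_gs=[[1.0]], b=[0.0], psi=[1.0], b_psi=-5.0)
```

In the toy example the coefficients are close to 0 where the gating signal is inactive and close to 1 where it is active, so the encoder features in the first half are suppressed before they reach the decoder.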

2.3. Label Smoothing Based on Kernel Density Estimation

In semantic-segmentation-based seismic arrival picking, each sampling point in the waveform is assigned an arrival probability prior to model training. Ideally, only the sampling point corresponding to the seismic arrival is assigned a probability of 1, whereas all other sampling points are assigned a probability of 0. Therefore, the majority of samples are labeled as noise and the resulting labels are highly imbalanced. This imbalance causes the network to preferentially learn noise features during training and neglect the scarce arrival samples, leading to reduced picking accuracy and poor generalization across varying SNR levels. Moreover, factors such as propagation medium and sensor placement introduce uncertainty in the arrival time [25,26]. Therefore, labeling the arrival as a single point may fail to capture the true physical variability and could misrepresent the underlying process.

To alleviate the label imbalance problem and account for arrival-time uncertainty, probabilistic labels were adopted in this study. Following the labeling strategy proposed by Zhu and Beroza [19], the manually labeled arrival time was transformed into a continuous probability distribution centered on the arrival position. This label-smoothing strategy converts the original binary labels into probabilistic labels and enables the network to learn contextual information surrounding the arrival, thereby improving learning stability and sensitivity to seismic arrivals.

In this study, kernel density estimation (KDE) with a Gaussian kernel was employed to generate the arrival probability distribution [19], which can be expressed as:

P_{arr} = \exp\left(-\frac{(t - t_{arr})^{2}}{2β^{2}}\right)

(5)

where P_arr denotes the seismic arrival probability, t denotes the time variable (s), t_arr denotes the true seismic arrival time (s), and β is the bandwidth coefficient (s), which controls the distribution smoothness and the arrival uncertainty. As illustrated in Figure 8, the single-point discrete label is transformed into a continuous probability distribution centered on the seismic arrival. The KDE method produces a smooth transition region near the arrival position, converting the original binary 0/1 labels into probability-based labels, thereby improving physical interpretability. A larger bandwidth produces a smoother, more tolerant distribution, while a smaller bandwidth results in a sharper, more precise label. As the bandwidth coefficient approaches zero, the probability distribution gradually converges to a Dirac delta function:

\lim_{β \to 0} P_{arr} = δ(t - t_{arr})

(6)

where δ denotes the Dirac delta function, used here in its discrete unit-impulse sense because Equation (5) is not normalized to unit area: only the arrival sampling point is assigned a value of 1 and all other sampling points are assigned a value of 0, which is the ideal binary-label distribution. Therefore, the bandwidth coefficient directly controls the smoothness of the probability distribution. These probability-based labels allow the network to learn contextual features around the arrival time more effectively and reduce training instability caused by label sparsity. Accordingly, the influence of different bandwidth coefficients on arrival picking performance was systematically investigated in Section 3.1.
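As a minimal sketch of Equation (5), the function below builds a smoothed label for one waveform. The 1 μs sample interval follows from the 8192 μs, 8192-point recording window described in Section 2.1; the function name, the arrival index, and the handling of β = 0 as an explicit binary label are illustrative assumptions.

```python
import math


def gaussian_label(n_samples, arrival_index, beta, dt=1.0):
    """Arrival-probability label per Equation (5).

    beta and dt share the same time unit (microseconds here).
    beta == 0 is treated as the binary single-point label.
    """
    if beta == 0:
        return [1.0 if i == arrival_index else 0.0 for i in range(n_samples)]
    return [
        math.exp(-(((i - arrival_index) * dt) ** 2) / (2.0 * beta ** 2))
        for i in range(n_samples)
    ]


# Hypothetical arrival at sample 4000 with a 25 microsecond bandwidth.
label = gaussian_label(8192, arrival_index=4000, beta=25.0)
binary = gaussian_label(8192, arrival_index=4000, beta=0)
```

The label peaks at 1 on the arrival sample and falls to exp(−0.5) ≈ 0.61 at a distance of one bandwidth, so a larger β spreads non-zero targets over more samples.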

2.4. Model Training and Evaluation

Since S-wave arrivals are often masked by noise in AE recordings, this study focuses on P-wave arrival picking. Prior to model training and arrival picking, all AE waveforms were filtered using a second-order Butterworth band-pass filter with cutoff frequencies of 50 kHz and 400 kHz. The selected frequency range corresponds to the operating frequency band of the RS-2A sensors used in the hydraulic fracturing experiment. The waveform amplitudes were then normalized to the range [−1, 1] to remove amplitude differences among signals caused by different sensor locations.

To quantitatively evaluate the quality of AE data, the SNR was calculated using the method proposed by Leach et al. [33]. The segment of the waveform within a specified time window after the P-wave arrival was treated as the effective signal, while the remainder represented noise. The SNR, expressed in decibels, was computed from the ratio of the root mean square (RMS) amplitude within the signal window to that within the noise window:

SNR = 10 \lg \sqrt{\frac{\frac{1}{n_{w}} \sum_{t_{arr}}^{t_{arr} + t_{w}} x_{w}^{2}}{\frac{1}{n_{noi}} \left(\sum_{t_{start}}^{t_{arr}} x_{noi}^{2} + \sum_{t_{arr} + t_{w}}^{t_{end}} x_{noi}^{2}\right)}}

(7)

where n_w denotes the number of sampling points within the effective signal window, n_noi denotes the number of sampling points within the noise window, and t_w denotes the duration of the effective signal window (s). t_start and t_end denote the start and end times of the recorded waveform (s), respectively. x_w represents the waveform data within the effective signal window, whereas x_noi represents the waveform data within the noise window.
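Equation (7) can be written out as a short function. This is a sketch under stated assumptions: the waveform is a Python list, the arrival is given as a sample index, the signal window is the half-open range starting at the arrival, and the function name is hypothetical.

```python
import math


def snr_db(x, arrival_index, window_len):
    """SNR per Equation (7): 10 * lg of the signal-to-noise RMS ratio.

    x: list of amplitudes; the signal window is
    x[arrival_index : arrival_index + window_len], the rest is noise.
    """
    signal = x[arrival_index:arrival_index + window_len]
    noise = x[:arrival_index] + x[arrival_index + window_len:]
    if not signal or not noise:
        raise ValueError("signal and noise windows must both be non-empty")

    mean_sq_signal = sum(v * v for v in signal) / len(signal)
    mean_sq_noise = sum(v * v for v in noise) / len(noise)
    if mean_sq_noise == 0.0:
        raise ValueError("noise window has zero energy")

    # The square root of the mean-square ratio is the RMS amplitude ratio.
    return 10.0 * math.log10(math.sqrt(mean_sq_signal / mean_sq_noise))


# Toy waveform: noise amplitude 0.1, signal amplitude 1.0 (RMS ratio of 10).
wave = [0.1] * 100 + [1.0] * 50 + [0.1] * 50
value = snr_db(wave, arrival_index=100, window_len=50)
```

With an RMS amplitude ratio of 10, the formula as written gives 10 dB.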

In this study, the effective signal window length was set to 2000 μs, and only waveforms with SNR ≥ 1 dB were retained for analysis. Waveforms with SNR < 1 dB were excluded because the signal amplitude was lower than the background-noise amplitude, causing the P-wave arrivals to be severely obscured by noise and making reliable manual arrival labeling difficult. All retained P-wave arrivals were manually labeled. After removing duplicated AE waveforms and retaining only unique records, 4420 records were selected for model training and evaluation. The dataset was split randomly into training (60%), validation (20%), and test (20%) sets.
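A random 60/20/20 split of the retained records can be sketched as follows. The fixed seed and the rounding of subset sizes are assumptions made for reproducibility of the example; the text does not state how the split was seeded or rounded.

```python
import random


def split_indices(n_records, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle record indices and split them into train/validation/test.

    seed is a hypothetical choice; the test set takes whatever remains
    after the rounded training and validation sizes.
    """
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    n_train = round(n_records * fractions[0])
    n_val = round(n_records * fractions[1])
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(4420)
```

For 4420 records these proportions would correspond to 2652 training, 884 validation, and 884 test waveforms.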

Model development and training were performed using the PyTorch 2.10.0 framework [34]. All experiments were run on a Windows 10 Professional workstation with 32 GB RAM, an NVIDIA GeForce RTX 3080 Ti GPU, and an AMD Ryzen 7 5800X processor. The Adam optimizer [35] was used for model training. The loss function of the Attention PhaseNet is the cross-entropy function used by Zhu and Beroza [19]:

L = -\sum_{i=1}^{n} P_{arr} \log(\hat{P}_{arr})

(8)

where P_arr denotes the true probability distribution, \hat{P}_{arr} denotes the predicted probability distribution, and n denotes the number of sampling points in the waveform. The model’s hyperparameters were configured as follows: learning rate 0.001, batch size 16, number of training epochs 100, and weight decay coefficient 0.0005. The learning rate and batch size were selected according to commonly adopted settings in previous deep learning studies for AE signal analysis, where a learning rate of 0.001 and batch sizes ranging from 16 to 32 have been widely used [36,37,38]. The maximum number of training epochs was set to 100 because both the training and validation losses had reached stable convergence before 100 epochs, as demonstrated in the Results section. To mitigate overfitting, a weight decay coefficient of 0.0005 was applied, which has been reported to provide a favorable balance between promoting model convergence and maintaining sufficient flexibility to fit the training data [39,40]. Additionally, we employed the early stopping method, terminating training when the validation loss failed to improve for three consecutive epochs. The improvement is defined as the absolute change in the validation loss between two adjacent epochs being larger than 0.0001. The model outputs the seismic arrival probability for each AE waveform, designating the time point with the highest probability as the picked arrival.
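The loss in Equation (8) and the early stopping rule can be sketched without any deep learning framework. Two points are assumptions of this sketch rather than statements from the text: an improvement is read as a decrease of more than 0.0001 relative to the previous epoch, and predicted probabilities are clipped at a small `eps` to avoid taking the logarithm of zero. All names are hypothetical.

```python
import math


def cross_entropy(p_true, p_pred, eps=1e-12):
    """Equation (8): cross-entropy summed over the sampling points."""
    return -sum(p * math.log(max(q, eps)) for p, q in zip(p_true, p_pred))


class EarlyStopper:
    """Stop after `patience` consecutive epochs without improvement.

    ASSUMPTION: improvement means the validation loss dropped by more
    than `min_delta` compared with the previous epoch.
    """

    def __init__(self, patience=3, min_delta=1e-4):
        self.patience = patience
        self.min_delta = min_delta
        self.previous = None
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch; return True when training should stop."""
        if self.previous is None or self.previous - val_loss > self.min_delta:
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        self.previous = val_loss
        return self.bad_epochs >= self.patience


def stopping_epoch(val_losses, patience=3, min_delta=1e-4):
    """Return the 1-based epoch at which training stops, or None."""
    stopper = EarlyStopper(patience, min_delta)
    for epoch, loss in enumerate(val_losses, start=1):
        if stopper.step(loss):
            return epoch
    return None


# Hypothetical validation losses: minimum at epoch 3, then three worse epochs.
stop_at = stopping_epoch([1.0, 0.5, 0.4, 0.45, 0.5, 0.6])
```

In this example the criterion is met at epoch 6, three epochs after the last improvement.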

Model performance for P-wave arrival picking was evaluated using three metrics: mean absolute error (MAE), root mean squared error (RMSE), and hit rate (HR). MAE measures the average deviation between predicted and true arrival times, while RMSE is more sensitive to larger errors due to the squaring operation. Lower MAE and RMSE values indicate more accurate predictions. HR measures the proportion of samples with absolute picking error below a specified threshold. The formulas for these metrics are:

MAE = \frac{\sum_{j=1}^{n_{sample}} |\hat{X}_{j} - X_{j}|}{n_{sample}}

(9)

RMSE = \sqrt{\frac{\sum_{j=1}^{n_{sample}} (\hat{X}_{j} - X_{j})^{2}}{n_{sample}}}

(10)

HR = \frac{1}{n_{sample}} \sum_{j=1}^{n_{sample}} H(ε - |\hat{X}_{j} - X_{j}|) \times 100\%

(11)

where n_sample denotes the number of samples, \hat{X}_j denotes the predicted arrival time (s) of the j-th sample, X_j denotes the true arrival time (s) of the j-th sample, |\hat{X}_j - X_j| represents the absolute picking error (s), and ε denotes the error threshold (s). In this study, five error thresholds of 5 μs, 10 μs, 15 μs, 20 μs, and 25 μs were adopted to evaluate the arrival picking performance. H(·) denotes the Heaviside step function, which is defined as:

H(ζ) = \begin{cases} 1, & ζ \geq 0 \\ 0, & ζ < 0 \end{cases}

(12)

where ζ denotes the independent variable of the step function.
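Equations (9) to (12) translate directly into code. The sketch below uses only the Python standard library; the function names and the four example picks (in μs) are hypothetical and serve only to show how the three metrics behave.

```python
import math


def heaviside(z):
    """Heaviside step function of Equation (12): 1 for z >= 0, else 0."""
    return 1 if z >= 0 else 0


def mae(pred, true):
    """Mean absolute error, Equation (9)."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def rmse(pred, true):
    """Root mean squared error, Equation (10)."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def hit_rate(pred, true, eps):
    """Hit rate in percent, Equation (11)."""
    hits = sum(heaviside(eps - abs(p - t)) for p, t in zip(pred, true))
    return 100.0 * hits / len(true)


# Hypothetical predicted and true arrival times in microseconds.
predicted = [100, 110, 130, 200]
actual = [102, 110, 120, 160]

print(mae(predicted, actual))
print(rmse(predicted, actual))
for eps in (5, 10, 15, 20, 25):
    print(eps, hit_rate(predicted, actual, eps))
```

Because H(0) = 1, a pick whose absolute error equals ε exactly is counted as a hit. The single 40 μs outlier in the example raises the RMSE well above the MAE, which is the sensitivity to large errors noted above.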

3. Results

3.1. P-Wave Arrival Picking Results of Different Bandwidth Coefficients

To investigate the influence of the KDE bandwidth coefficient on P-wave arrival picking performance, five bandwidth coefficients of β = 0, 25, 100, 200, and 500 μs were selected to generate the arrival probability labels during training. As presented in Figure 9, the bandwidth coefficient β directly controls the smoothness of the probability distribution. A larger bandwidth produces a broad, smooth transition around the arrival, reflecting higher uncertainty in the predicted arrival time. Conversely, as the bandwidth decreases, the probability distribution becomes sharper and more focused on the true arrival time. When β = 0, the arrival probability collapses to a strict binary distribution.
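The effect of β on the label shape can be illustrated with a short sketch. The kernel used by the method is defined in the Methods section; the code below assumes, purely for illustration, a Gaussian kernel centered on the manual pick and scaled to a peak of 1, with β = 0 treated as the binary (one-hot) label. The function name `arrival_label` and the sampling interval `dt_us` are hypothetical.

```python
import math


def arrival_label(n_samples, arrival_idx, beta_us, dt_us):
    """Build a per-sample arrival probability label.

    Assumptions: Gaussian kernel with standard deviation beta_us, peak
    normalized to 1; beta_us == 0 collapses to a single labeled sample.
    """
    if beta_us == 0:
        return [1.0 if i == arrival_idx else 0.0 for i in range(n_samples)]
    return [
        math.exp(-0.5 * ((i - arrival_idx) * dt_us / beta_us) ** 2)
        for i in range(n_samples)
    ]


# Compare label widths for three bandwidths on a 201-sample trace (dt = 1 us).
for beta in (0, 25, 100):
    label = arrival_label(201, 100, beta, 1.0)
    width = sum(v > 0.5 for v in label)  # samples with label above 0.5
    print(beta, width)
```

The count of samples above 0.5 grows with β, which is the broadening described above, while β = 0 leaves a single nonzero sample.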

Figure 10 shows the convergence curves for model training and validation across the five bandwidth settings, with corresponding training and validation losses summarized in Table 2. For the binary label case (β = 0 μs), the training loss decreased rapidly from 8.64 at Epoch 1 to 0.01 at Epoch 100. The validation loss initially decreased and reached its minimum at approximately Epoch 17, triggering the early stopping criterion. To facilitate comparison with other bandwidth settings, training was continued to 100 epochs. After Epoch 17, the validation loss increased sharply, reaching 6.35 at Epoch 50 and 10.41 at Epoch 100, indicating overfitting and a substantial deterioration in model generalization performance.

For models with β > 0, the convergence behavior followed a different pattern. At the start of training (Epoch 1), models with larger bandwidths showed larger initial training losses: approximately 5.84, 7.47, 8.54, and 8.62 for β = 25, 100, 200, and 500 μs, respectively. As training progressed (Epochs 50 and 100), models with smaller bandwidths (e.g., β = 25 μs) converged more quickly and achieved lower final loss values. At Epoch 100, the training losses for β = 25, 100, 200, and 500 μs were 4.64, 6.02, 6.74, and 7.77, respectively. Validation losses followed a similar pattern: initial losses were higher for larger bandwidths, and as training continued, smaller bandwidths led to a faster loss decrease and a lower final validation loss. The validation losses at Epoch 1 for β = 25, 100, 200, and 500 μs were approximately 5.50, 6.32, 7.11, and 8.16, respectively. After 100 training epochs, the validation losses stabilized at approximately 5.06, 6.12, 6.76, and 7.78, respectively. These results indicate that, among the positive bandwidth settings, a smaller bandwidth facilitates more effective and efficient model convergence, whereas excessively large bandwidths produce broader probability distributions and consequently higher training and validation losses.

To further investigate the influence of the bandwidth coefficient on model performance, Figure 11 and Figure 12 present the arrival picking results of Attention PhaseNet for two example AE waveforms in the test dataset. As shown in Figure 11b–f, the absolute arrival picking errors of example 1# under the five bandwidth conditions were 1212, 11, 18, 32, and 91 μs, respectively. As shown in Figure 12b–f, the absolute arrival picking errors of example 2# under the five bandwidth conditions were 1016, 5, 12, 16, and 103 μs, respectively. With large bandwidths (e.g., 500 μs), the predicted probability is smooth but deviates from the true arrival. As the bandwidth decreases from 500 μs to 100 μs, the width of the probability peak gradually decreases and aligns more closely with the true P-wave arrival. At β = 25 μs, the predicted peak matches the true arrival almost exactly, yielding the highest accuracy.

Notably, when β = 0, the predicted probability remains close to zero over almost the entire waveform. The maximum predicted probabilities of P-wave arrival for examples 1# and 2# are only 0.03 and 0.05, respectively. Under this condition, the model completely failed to reliably identify true seismic arrivals, which is consistent with the overfitting observed during training. This occurs because the label distribution is extremely sparse—only one point is labeled as ‘1’, with the rest as ‘0’. Under such an extremely imbalanced label distribution, the network minimizes loss by outputting probability sequences close to zero for the entire waveform, converging to a trivial solution [41] without learning wave arrival features. These results further indicate that the β = 0 setting is prone to overfitting, resulting in degraded generalization performance and reduced arrival picking accuracy on unseen waveforms.

Table 3 summarizes the picking results of the Attention PhaseNet under different bandwidth coefficients on the test dataset. The test dataset consisted of 884 AE waveforms. As discussed above, the model trained with β = 0 exhibited overfitting and poor generalization performance. Therefore, the β = 0 case was excluded from the quantitative accuracy comparison, and only models trained with positive bandwidth coefficients were considered. Among the four positive bandwidth settings, the model trained with β = 25 μs achieved the best overall performance, with an MAE of 17.41 μs and an RMSE of 43.37 μs. When the bandwidth coefficient increased to 100 μs, the MAE and RMSE increased to 26.03 μs and 47.52 μs, respectively. Compared with the model trained with β = 100 μs, the MAE and RMSE of the model trained with β = 25 μs decreased by 33.12% and 8.73%, respectively. Compared with the model trained with β = 500 μs, the MAE and RMSE of the model trained with β = 25 μs decreased by 63.24% and 33.81%, respectively.
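The percentage reductions quoted for β = 25 μs relative to β = 100 μs follow from the usual relative-change formula, as the short check below shows. The helper name `relative_reduction` is hypothetical; the four input values are the MAE and RMSE figures stated in the text.

```python
def relative_reduction(reference, value):
    """Percentage by which `value` is lower than `reference`."""
    return 100.0 * (reference - value) / reference


# MAE and RMSE (in microseconds) for beta = 100 us and beta = 25 us.
print(round(relative_reduction(26.03, 17.41), 2))  # MAE reduction
print(round(relative_reduction(47.52, 43.37), 2))  # RMSE reduction
```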

Furthermore, the model with β = 25 μs achieved the highest hit rates under all error thresholds. Its HRs at the five error thresholds from 5 to 25 μs were 31.90%, 52.83%, 65.38%, 75.57%, and 82.69%, respectively. Compared with the model trained with β = 100 μs, the HR under the five thresholds increased by 71.97%, 61.61%, 41.67%, 32.02%, and 22.85%, respectively. Compared with the model trained with β = 200 μs, the corresponding improvements were 84.56%, 62.88%, 48.95%, 36.70%, and 25.26%, respectively. The improvements were larger still when compared with the model trained with β = 500 μs, reaching 291.89%, 228.95%, 179.16%, 156.95%, and 129.12%, respectively. A small positive bandwidth (β = 25 μs) introduces limited temporal uncertainty around the manually picked arrival while avoiding excessive label smoothing, resulting in improved arrival picking performance. In contrast, excessively large bandwidth coefficients (e.g., β = 500 μs) produce overly smooth probability distributions and increase the uncertainty in P-wave arrival times, thereby reducing the temporal resolution of the predicted probability distribution and decreasing arrival picking accuracy.

In summary, the KDE bandwidth coefficient directly influences the probability distribution and the accuracy of arrival picking. When β = 0, the highly imbalanced label distribution promotes severe overfitting, resulting in poor generalization performance and unreliable arrival picking results on AE waveforms. In contrast, excessively large bandwidth coefficients significantly smooth the arrival probability distribution and increase the uncertainty of arrival picking. Compared with the other bandwidth conditions, the model trained with β = 25 μs achieves the optimal balance between label smoothness and picking accuracy. Therefore, β = 25 μs was selected as the optimal KDE bandwidth coefficient for subsequent analyses of model performance.

3.2. P-Wave Arrival Picking Results of Different Methods

To evaluate the picking performance of the proposed Attention PhaseNet model, comparative analyses were conducted against two representative traditional methods, namely the STA/LTA method and the AR-AIC method, as well as the classical deep learning model PhaseNet. P-wave arrival picking experiments were performed for all methods using the same test dataset. The MAE, RMSE, and HR were computed across different error thresholds to evaluate the picking accuracy, stability, and robustness of the methods.

The STA/LTA method is formulated as follows [42]:

\mathit{Thr} = \frac{\mathrm{STA}(t)}{\mathrm{LTA}(t)} = \frac{t_{\mathrm{LTA}} \sum_{t=1}^{t_{\mathrm{STA}}} Ψ(t)}{t_{\mathrm{STA}} \sum_{t=1}^{t_{\mathrm{LTA}}} Ψ(t)}

(13)

where Thr denotes the STA/LTA ratio (dimensionless), t_LTA and t_STA denote the lengths of the long-term and short-term windows (s), respectively, and Ψ(·) denotes the absolute amplitude of the AE waveform (V). The STA/LTA ratio is calculated by continuously sliding the short-term and long-term windows along the waveform time axis. The arrival of the P-wave causes a rapid increase in the average amplitude within the short-term window, which leads to a rise in Thr. A seismic arrival is identified when Thr surpasses a predefined trigger threshold. Previous studies have suggested that the LTA window should be several times longer than the STA window [43,44]. Considering the duration and sampling frequency of AE waveforms in this study, the t_LTA and t_STA were set to 1000 μs and 100 μs, respectively, and the trigger threshold was set to 2.
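A minimal sketch of an STA/LTA picker in the spirit of Equation (13) is given below. It is not the implementation used in the study: the window lengths are expressed in samples (converting 1000 μs and 100 μs to samples depends on the sampling frequency), both windows are assumed to end at the same sample, and the function name `sta_lta_pick` is hypothetical. Prefix sums keep each window average at constant cost per sample.

```python
def sta_lta_pick(waveform, n_sta, n_lta, threshold=2.0):
    """Return the first sample index where STA/LTA exceeds threshold, else None.

    Assumption: STA and LTA are trailing windows of n_sta and n_lta samples
    that end at the same sample.
    """
    amp = [abs(v) for v in waveform]  # absolute amplitude, Psi(t)
    prefix = [0.0]
    for a in amp:
        prefix.append(prefix[-1] + a)  # prefix[i] = sum of amp[0:i]
    for i in range(n_lta, len(amp) + 1):
        sta = (prefix[i] - prefix[i - n_sta]) / n_sta
        lta = (prefix[i] - prefix[i - n_lta]) / n_lta
        if lta > 0 and sta / lta > threshold:
            return i - 1  # index of the last sample inside both windows
    return None


# Synthetic trace: 200 low-amplitude samples followed by a strong signal.
trace = [0.01] * 200 + [1.0] * 50
print(sta_lta_pick(trace, n_sta=10, n_lta=100, threshold=2.0))
```

On this noise-free step the ratio jumps above the threshold at the first high-amplitude sample. With real AE noise, the pick shifts with the chosen window lengths and threshold, which is the parameter sensitivity discussed in this section.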

The AR-AIC method is defined as follows [45]:

\mathrm{AIC} = k \log(\mathrm{var}(x[1, k])) + (n - k - 1) \log(\mathrm{var}(x[k + 1, n]))

(14)

where var(x[1, k]) and var(x[k + 1, n]) denote the variances of the waveform amplitudes before and after the candidate arrival point k, respectively, and n denotes the number of sampling points in the waveform. The AIC function in Equation (14) is evaluated at every candidate sampling point along the time axis, and the time corresponding to its minimum value is regarded as the seismic arrival time.
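The search for the minimum of Equation (14) can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code used in the study: the population variance is used, candidate points with zero variance on either side are skipped because the logarithm is undefined there, and the function name `aic_pick` is hypothetical.

```python
import math


def aic_pick(x):
    """Return the k that minimizes Equation (14), or None if none is valid.

    k counts the samples before the candidate arrival, so the returned value
    is also the 0-based index of the first post-arrival sample.
    """
    n = len(x)
    s1, s2 = [0.0], [0.0]  # prefix sums of x and x**2
    for v in x:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def var(a, b):
        """Population variance of x[a:b] from the prefix sums."""
        m = b - a
        mean = (s1[b] - s1[a]) / m
        return (s2[b] - s2[a]) / m - mean * mean

    best_k, best_aic = None, math.inf
    for k in range(2, n - 1):  # keep at least two samples on each side
        v_before, v_after = var(0, k), var(k, n)
        if v_before <= 0 or v_after <= 0:
            continue  # log of a non-positive variance is undefined
        aic = k * math.log(v_before) + (n - k - 1) * math.log(v_after)
        if aic < best_aic:
            best_k, best_aic = k, aic
    return best_k


# Synthetic trace: weak alternating noise, then a strong alternating signal.
trace = [0.01 * (-1) ** i for i in range(100)] + [1.0 * (-1) ** i for i in range(100)]
print(aic_pick(trace))
```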

Figure 13 and Figure 14 present the arrival picking results of the different methods on examples 1# and 2#. For example 1#, the absolute picking errors of Attention PhaseNet, PhaseNet, STA/LTA, and AR-AIC were 11, 18, 35, and 813 μs, respectively, whereas the corresponding errors for example 2# were 5, 8, 207, and 906 μs, respectively. The AR-AIC method exhibited noticeable deviations due to background noise, and the STA/LTA method was affected, to a lesser extent, by noise and by its sensitivity to the window parameters. In contrast, Attention PhaseNet effectively suppressed noise interference and concentrated the picked arrival region near the true arrival position. In the two examples, the absolute picking error of Attention PhaseNet was reduced by an average of 38.20%, 83.08%, and 99.05% compared with PhaseNet, STA/LTA, and AR-AIC, respectively. These results demonstrate that Attention PhaseNet can more accurately capture abrupt arrival features and achieve the highest arrival picking accuracy.

Figure 15 presents the distributions of the absolute picking errors of the different methods on the test dataset. The traditional methods exhibited significantly larger picking errors and broader error distributions than the deep learning methods. The maximum absolute picking errors of the STA/LTA and AR-AIC methods were 1301 μs and 4390 μs, respectively. In contrast, the deep learning methods can learn temporal waveform features through model training and achieve more accurate arrival picking. In this study, the absolute picking errors of Attention PhaseNet were mainly concentrated within the range of 0–524 μs, whereas those of PhaseNet were distributed within the range of 0–549 μs. Furthermore, the maximum absolute picking error of Attention PhaseNet was reduced by 4.55% compared with PhaseNet. These results indicate that the proposed attention gates effectively suppress the transmission of noise features in the skip connections. Consequently, the Attention PhaseNet focuses on critical features associated with the P-wave arrival during the decoding process and improves the stability and accuracy of arrival picking.

Table 4 summarizes the arrival picking performances of all methods on the test dataset. Attention PhaseNet achieved the best performance in P-wave arrival picking, with an MAE and an RMSE of 17.41 μs and 43.37 μs, respectively, representing reductions of 10.58% and 18.00% compared to PhaseNet. Compared to STA/LTA, the MAE and RMSE of Attention PhaseNet were reduced by 92.92% and 90.21%, respectively, and compared to AR-AIC, the corresponding reductions were 98.25% and 96.41%. These results indicate that traditional methods that rely on fixed time windows or manually defined thresholds struggle to achieve stable arrival picking due to amplitude variations in AE data. In contrast, Attention PhaseNet achieves superior accuracy and error distribution stability by leveraging the attention mechanism and KDE-based label smoothing.

The HR metric further highlights the superiority of Attention PhaseNet. At an error threshold of 5 μs, the HR of Attention PhaseNet reached 31.90%, which was significantly higher than those of PhaseNet (28.39%), STA/LTA (5.09%), and AR-AIC (0.57%). As the error thresholds increased to 10 μs and 15 μs, the HR further increased to 52.83% and 65.38%, respectively, and remained significantly higher than those of the other methods. At the thresholds of 20 μs and 25 μs, the HR of Attention PhaseNet reached 75.57% and 82.69%, respectively. These results indicate that most AE arrivals can be accurately picked within a relatively small picking error, demonstrating the high reliability of Attention PhaseNet for AE arrival picking. In contrast, the other three methods consistently produced lower HRs across all thresholds.

3.3. P-Wave Arrival Picking Results of Different Methods Under Different SNR Conditions

To investigate the influence of noise levels on arrival picking performance, the test dataset was divided into three subsets according to the SNR of the AE waveforms: low SNR (SNR ∈ [1, 3)), medium SNR (SNR ∈ (3, 6]), and high SNR (SNR > 6). The low-, medium-, and high-SNR subsets contained 133, 552, and 199 AE waveforms, respectively. All arrival picking methods were evaluated on the same waveform subset within each SNR category. The MAE, RMSE, and HR of different methods under different SNR conditions were subsequently compared. The HR values were calculated according to Equation (11) using the corresponding SNR subset as the evaluation dataset. Table 5, Table 6 and Table 7 summarize the arrival picking results of the different methods under the three SNR conditions, respectively. In general, the arrival picking accuracies of all methods improved as SNR increased. However, significant differences in arrival picking accuracy were observed across methods under varying SNR conditions. The traditional methods exhibited relatively large picking errors, whereas the deep learning methods maintained lower picking errors and higher HRs across different SNR conditions, demonstrating better stability and adaptability.
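The subset split can be expressed as a small binning routine. The sketch below is illustrative only: the SNR values are taken as already computed, the function names are hypothetical, and because the stated intervals leave SNR = 3 itself unassigned, the code assumes that value belongs to the low-SNR subset.

```python
def snr_category(snr):
    """Map an SNR value to 'low', 'medium', 'high', or None (below 1)."""
    if snr > 6:
        return "high"
    if snr > 3:
        return "medium"
    if snr >= 1:
        return "low"  # assumption: SNR == 3 is grouped with the low-SNR subset
    return None


def split_by_snr(snr_values):
    """Return waveform indices grouped by SNR category."""
    subsets = {"low": [], "medium": [], "high": []}
    for idx, snr in enumerate(snr_values):
        category = snr_category(snr)
        if category is not None:
            subsets[category].append(idx)
    return subsets


# Hypothetical SNR values for five waveforms.
print(split_by_snr([2.0, 4.5, 7.2, 3.0, 0.4]))
```

Each metric is then computed on the indices of one subset at a time, so the three subset sizes reported above (133, 552, and 199) sum to the 884 waveforms of the full test dataset.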

Under the three SNR conditions, Attention PhaseNet consistently exhibited lower picking errors than the other methods. In the low-SNR condition, the MAE of Attention PhaseNet was reduced by 11.25%, 92.14%, and 97.89% relative to PhaseNet, STA/LTA, and AR-AIC, respectively, while its RMSE was reduced by 35.34%, 91.07%, and 97.16%, respectively. In the medium-SNR condition, the MAE of Attention PhaseNet was 19.51 μs, which was reduced by 5.70% compared with PhaseNet and by 92.62% and 98.01% compared with STA/LTA and AR-AIC, respectively. The RMSE of Attention PhaseNet was 49.89 μs, corresponding to reductions of 14.23%, 89.01%, and 95.44% compared with PhaseNet, STA/LTA, and AR-AIC, respectively. In the high-SNR condition, the MAE of Attention PhaseNet further decreased to 13.13 μs, corresponding to reductions of 2.52%, 94.58%, and 98.98% compared with PhaseNet, STA/LTA, and AR-AIC, respectively. The RMSE further decreased to 18.17 μs, representing reductions of 4.57%, 95.46%, and 98.63% relative to PhaseNet, STA/LTA, and AR-AIC, respectively. These results indicate that Attention PhaseNet achieves lower picking errors across different noise conditions and exhibits strong noise resistance.

The HR metric further illustrates the arrival picking capability of different methods across different error thresholds. Across all error thresholds and SNR conditions, Attention PhaseNet consistently achieved higher HR values than the other picking methods. Compared with PhaseNet, the average HR improvement of Attention PhaseNet across the five error thresholds reached 9.13%, 2.68%, and 10.50% under low-, medium-, and high-SNR conditions, respectively. These results indicate that the proposed method provides more accurate arrival picking and maintains stable performance under different noise levels. The advantage of Attention PhaseNet is particularly evident under low-SNR conditions. At the strictest error threshold of 5 μs, the HR of Attention PhaseNet reached 27.07%, representing a 12.51% improvement over PhaseNet. This result suggests that the introduced attention gates effectively suppress the transmission of noise features through skip connections and enhance the extraction of arrival-related features, thereby improving picking accuracy in noisy AE waveforms. In contrast, the AR-AIC method achieved an HR of 0% across all error thresholds under low-SNR conditions, indicating that it was unable to reliably distinguish P-wave arrivals from background noise.

In summary, Attention PhaseNet consistently achieved lower picking errors and higher hit rates than the other evaluated methods across all SNR conditions. The results demonstrate that incorporating an attention mechanism into the PhaseNet framework improves both arrival-picking accuracy and robustness, particularly when processing low-SNR AE signals.

4. Discussion

To further evaluate the practical applicability of the different arrival picking methods, the four methods were comparatively analyzed across five aspects: mean picking error, inference time, training time, picking stability, and update capability. The comparison results are summarized in Table 8. The picking accuracy was evaluated using the MAE, whereas the picking stability was quantified by the standard deviation of the absolute picking errors, which characterizes the model’s error-control capability. The results indicate that Attention PhaseNet achieved the lowest MAE of 17.41 μs among the four methods. In addition, the standard deviation of its absolute picking errors was 6.06 μs, which was significantly lower than those of PhaseNet (49.21 μs), STA/LTA (358.79 μs), and AR-AIC (587.42 μs). These results demonstrate that Attention PhaseNet exhibits superior picking accuracy and stability across AE data with different SNR conditions.

The inference times listed in Table 8 represent the total processing time required for the entire test dataset. All computational times were measured on the same hardware platform. For a fair comparison, both the deep learning models and the conventional methods were implemented using PyTorch and executed on the GPU. The inference times of Attention PhaseNet and PhaseNet were 2.33 s and 2.34 s, respectively, both of which were significantly shorter than those of STA/LTA (3.67 s) and AR-AIC (3.42 s). This is because the deep learning methods perform direct arrival prediction through feedforward inference without iterative optimization, resulting in higher computational efficiency when processing large numbers of AE waveforms. In contrast, the STA/LTA and AR-AIC methods require sliding windows along the waveform time axis to calculate the STA/LTA ratio and evaluate the AIC function at candidate arrival positions. These iterative calculations increase the computational cost when processing large numbers of waveforms. Although deep learning models require longer training time than inference time, their parameters remain updatable according to variations in experimental conditions and data distributions. Therefore, deep learning methods are more suitable for AE monitoring under varying SNR conditions.

The comparative results in Table 8 indicate that Attention PhaseNet achieved superior performance in picking accuracy, stability, and computational efficiency compared with the other methods. Although the traditional methods are simple to implement and do not require model training, they are highly sensitive to threshold parameters and prone to false detections under low-SNR conditions, resulting in substantially lower stability than the deep learning methods. In contrast, the proposed Attention PhaseNet enhances the identification capability of critical P-wave arrival features through the introduced attention mechanism and achieves more reliable AE arrival picking under complex SNR conditions. These advantages provide a reliable data foundation for subsequent AE event localization.

5. Conclusions

Accurate seismic arrival picking of AE data is limited by the manually defined thresholds used in traditional methods. In addition, the low-SNR characteristics and limited dataset size of AE waveforms restrict the applicability of existing deep learning models developed for seismic data. The proposed model introduces attention gates into the PhaseNet framework to suppress noise-feature transmission during feature fusion. In addition, a KDE-based label-smoothing strategy was adopted to alleviate label imbalance and account for arrival-time uncertainty, thereby improving the accuracy and robustness of P-wave arrival picking.

The results demonstrate that the KDE bandwidth coefficient directly influences the arrival probability distribution and the accuracy of arrival picking. When the bandwidth coefficient was zero (β = 0), the highly imbalanced label distribution caused the network to converge to a trivial solution, resulting in an extremely low training loss but completely degraded picking performance. In contrast, large bandwidth coefficients produced excessively smooth probability distributions and increased the uncertainty of arrival picking. Among the investigated bandwidth conditions, β = 25 μs achieved the optimal balance between probability smoothness and picking accuracy.

Under different SNR conditions, Attention PhaseNet consistently achieved lower picking errors and a higher HR than the other methods, demonstrating better picking accuracy and robustness for P-wave arrival picking. This improvement is mainly attributed to the introduced attention gates, which effectively suppress the transmission of noise features through the skip connections and enable the network to focus on critical arrival features during the decoding process.

Furthermore, Attention PhaseNet exhibited superior performance in terms of picking accuracy, stability, and inference efficiency compared to the other methods. The introduced attention mechanism enhanced the extraction capability of critical temporal features associated with P-wave arrivals and improved the reliability of AE arrival picking under complex noise conditions. These advantages provide a reliable foundation for subsequent AE event localization and contribute to the development of high-precision AE monitoring methods for hydraulic fracturing experiments.

Author Contributions

Conceptualization, J.L. and B.L.; methodology, J.L.; validation, J.L.; formal analysis, J.L.; investigation, J.L.; resources, B.L.; data curation, J.L.; writing—original draft preparation, J.L.; writing—review and editing, J.L. and B.L.; visualization, J.L.; supervision, B.L.; funding acquisition, B.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant No. 42277122).

Data Availability Statement

Data will be made available on request. The data are not publicly available due to restrictions related to ongoing research and laboratory data management.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Manthei, G.; Eisenblätter, J.; Dahm, T. Moment tensor evaluation of acoustic emission sources in salt rock. Constr. Build. Mater. 2001, 15, 297–309.
2. Jiang, M.; Zhao, J.; Fan, C. Stress-structure controlled time-dependent fracture mechanism of deep jointed granite: Acoustic emission moment tensor method. Eng. Fract. Mech. 2025, 318, 110953.
3. Wu, S.; Ge, H.; Wang, X.; Meng, F. Shale failure processes and spatial distribution of fractures obtained by AE monitoring. J. Nat. Gas Sci. Eng. 2017, 41, 82–92.
4. Zhou, H.; Zhang, Y.; Yao, X.; Liang, P.; Cao, Y.; Lu, Y.; Song, H.; Zhao, J. Research on Acoustic Emission Localization Algorithm Without Wave Velocity Based on High-Precision Time Delay Estimation. Soc. Sci. Res. Netw. 2025, 5870565.
5. Wang, Y.; Yang, T.; Shang, Z.; Liu, D.; Xu, J.; Liu, F. Characterization and source localization of unknown deep-ocean acoustic emissions with ocean bottom seismographs. Appl. Acoust. 2026, 250, 111343.
6. Zhou, L.; Peng, P.; Wang, L.; Meng, H.; Wu, Z. Automated P-wave arrival picking in microseismic monitoring: Integrating multi-feature clustering and enhanced AIC-STA/LTA. Measurement 2025, 256, 118143.
7. Akaike, H. Akaike’s information criterion. In International Encyclopedia of Statistical Science; Springer: Berlin/Heidelberg, Germany, 2025; pp. 41–42.
8. Wu, L.; Yan, J.; Zhang, Y.; Hong, M.; Zhang, H.; Liao, S. MFU-Net: A multi-scale fusion U-Net for seismic phase picking. Sci. Rep. 2025, 16, 2282.
9. Anikiev, D.; Birnie, C.; bin Waheed, U.; Alkhalifah, T.; Gu, C.; Verschuur, D.J.; Eisner, L. Machine learning in microseismic monitoring. Earth-Sci. Rev. 2023, 239, 104371.
10. Li, P.; Fang, Z.; Wang, H. Edge-detection-driven first-arrival picking method for borehole radial velocity imaging. J. Appl. Geophys. 2025, 242, 105919.
11. Katoh, S.; Iio, Y.; Nagao, H.; Katao, H.; Sawada, M.; Tomisaka, K. SegPhase: Development of arrival time picking models for Japan’s seismic network using the hierarchical vision transformer. Earth Planets Space 2025, 77, 118.
12. Wang, Q.; Sheng, G.; Tang, X.; Xie, K. Semi-Picking: A semi-supervised arrival time picking for microseismic monitoring based on the TransUGA network combined with SimMatch. Geophys. J. Int. 2025, 240, 502–534.
13. Wang, X.; Yue, Q.; Liu, X. Reliable arrival time picking of acoustic emission using ensemble machine learning models. Mech. Syst. Signal Process. 2024, 215, 111442.
14. Perol, T.; Gharbi, M.; Denolle, M. Convolutional neural network for earthquake detection and location. Sci. Adv. 2018, 4, e1700578.
15. Mousavi, S.M.; Zhu, W.; Sheng, Y.; Beroza, G.C. CRED: A deep residual network of convolutional and recurrent units for earthquake signal detection. Sci. Rep. 2019, 9, 10267.
16. Yu, Y.; Lin, J.; Zhang, L.; Liu, G.; Hu, J.; Tan, Y.; Zhang, H. Identification of seismic wave first arrivals from earthquake records via deep learning. In Proceedings of the International Conference on Knowledge Science, Engineering and Management, Changchun, China, 17–19 August 2018; pp. 274–282.
17. Edigbue, P.; Al-Shuhail, A.; Muhammad, A.; Hanafy, S. Seismic event detection and first arrival picking using continuous wavelet transform and machine learning techniques. Arab. J. Sci. Eng. 2026, 51, 1913–1926.
18. Wang, H.; Zhang, J.; Wei, X.; Zhang, C.; Long, L.; Guo, Z. MSSPN: Automatic first-arrival picking using a multistage segmentation picking network. Geophysics 2024, 89, U53–U70.
19. Zhu, W.; Beroza, G.C. PhaseNet: A deep-neural-network-based seismic arrival-time picking method. Geophys. J. Int. 2019, 216, 261–273.
20. Mousavi, S.M.; Ellsworth, W.L.; Zhu, W.; Chuang, L.Y.; Beroza, G.C. Earthquake transformer—An attentive deep-learning model for simultaneous earthquake detection and phase picking. Nat. Commun. 2020, 11, 3952.
21. Bornstein, T.; Lange, D.; Münchmeyer, J.; Woollam, J.; Rietbrock, A.; Barcheck, G.; Grevemeyer, I.; Tilmann, F. PickBlue: Seismic phase picking for ocean bottom seismometers with deep learning. Earth Space Sci. 2024, 11, e2023EA003332.
22. Yang, Z.; Li, H.; Chen, R. An acoustic emission onset time determination method based on Transformer. Struct. Health Monit. 2024, 23, 3174–3194.
23. Li, D.; Xie, F.; Wang, Q.Y.; Milanese, E.; Xie, J.; Li, L. An ensemble deep learning-based acoustic emission picking model reveals migratory foreshocks on large-scale laboratory fault. J. Geophys. Res. Solid Earth 2025, 130, e2024JB029934.
24. Zhang, J.; Sheng, G. First arrival picking of microseismic signals based on nested U-Net and Wasserstein Generative Adversarial Network. J. Pet. Sci. Eng. 2020, 195, 107527.
25. Liu, N.; Chen, J.; Wu, H.; Li, F.; Gao, J. Microseismic first-arrival picking using fine-tuning feature pyramid networks. IEEE Geosci. Remote Sens. Lett. 2021, 19, 7505105.
26. Shen, T.; Jiang, X.; Wang, S.; Peng, G. Improved U-Net3+ network for first arrival picking of noisy earthquake recordings. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5912511.
27. Niemz, P.; Cesca, S.; Heimann, S.; Grigoli, F.; von Specht, S.; Hammer, C.; Zang, A.; Dahm, T. Full-waveform-based characterization of acoustic emission activity in a mine-scale experiment: A comparison of conventional and advanced hydraulic fracturing schemes. Geophys. J. Int. 2020, 222, 189–206.
28. Zhang, Y.; Zhao, Y.; Zang, A.; Long, A. Acoustic emission evolution and hydraulic fracture morphology of changning shale stressed to failure at different injection rates in the laboratory. Rock Mech. Rock Eng. 2024, 57, 1287–1308.
29. Sui, J.; Li, L.; Mu, W.; Lu, J.; Liu, L.; Gu, C. Study on acoustic emission-resistivity response of hydraulic fracturing in rock-like samples. Rock Mech. Rock Eng. 2025, 58, 5937–5959.
30. Zhang, Q.; Hou, B.; Wu, A.; Zhang, B.; Chen, G.; Yang, G.; Sun, T. Feature identification in multi-cluster true triaxial hydraulic fracturing using integrated fiber optic and acoustic emission monitoring. Measurement 2025, 258, 119363.
31. Hou, B.; Zhang, Q.; Lv, J. Distributed fiber optic monitoring of asymmetric fracture swarm propagation in laminated continental shale oil reservoirs. Rock Mech. Rock Eng. 2024, 57, 5067–5087.
32. Oktay, O.; Schlemper, J.; Folgoc, L.L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; McDonagh, S.; Hammerla, N.Y.; Kainz, B. Attention u-net: Learning where to look for the pancreas. arXiv 2018, arXiv:1804.03999.
33. Leach, R.R., Jr.; Dowla, F.U.; Schultz, C.A. Optimal filter parameters for low SNR seismograms as a function of station and event location. Phys. Earth Planet. Inter. 1999, 113, 213–226.
34. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. Pytorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 2019, 32, 8024–8035.
35. Chen, X.; Ma, W.; Li, D.; Zhu, F.; Routray, S.; Guduri, M.; Margala, M. Kalman-based adaptive moment estimation optimisation algorithm to enhance GPT in LLMs for medical sentiment analysis of patient health-related feedback. IEEE J. Biomed. Health Inform. 2025, 30, 3744–3753.
36. Cheng, L.; Nokhbatolfoghahai, A.; Groves, R.M.; Veljkovic, M. Data level fusion of acoustic emission sensors using deep learning. J. Intell. Mater. Syst. Struct. 2025, 36, 77–96.
37. Inderyas, O.; Alver, N.; Tayfur, S.; Shimamoto, Y.; Suzuki, T. Deep learning-based acoustic emission signal filtration model in reinforced concrete. Arab. J. Sci. Eng. 2025, 50, 1885–1903.
38. Zheng, W.; Yu, X.; Peng, X.; Yang, C.; Wang, S.; Chen, H.; Bu, Z.; Zhang, Y.; Zhang, Y.; Lin, L. A Deep Learning Model for Detecting the Arrival Time of Weak Underwater Signals in Fluvial Acoustic Tomography Systems. Sensors 2025, 25, 922.
39. Son, G.-J.; Kwak, D.-H.; Park, M.-K.; Kim, Y.-D.; Jung, H.-C. U-Net-based foreign object detection method using effective image acquisition system: A case of almond and green onion flake food process. Sustainability 2021, 13, 13834.
40. Yang, G.; Wang, M.; Zhou, Q.; Li, J. YUNet: Improved YOLOv11 Network for Skyline Detection. arXiv 2025, arXiv:2502.12449.
41. Daw, A.; Bu, J.; Wang, S.; Perdikaris, P.; Karpatne, A. Mitigating Propagation Failures in Physics-informed Neural Networks using Retain-Resample-Release (R3) Sampling. In Proceedings of the 40th International Conference on Machine Learning; Proceedings of Machine Learning Research: Norfolk, MA, USA, 2023; pp. 7264–7302.
42. Qiu, L.; Li, C. STA/LTA method for picking up the first arrival of natural seismic waves and its improvement analysis. Prog. Geophys. 2023, 38, 1497–1506.
43. Di Benedetto, A.; Figlioli, A.; D’Alessandro, A.; Lo Bosco, G. Grid-search method for short-term over long-term average parameter tuning: An application to Stromboli explosion quakes. Front. Earth Sci. 2024, 12, 1440967.
44. Wang, K.; Ma, K.; Tang, C.a.; Liang, Z. Study on automatic first arrival picking of mine microseismic waveforms based on the improved U-Net. Tunn. Undergr. Space Technol. 2025, 166, 106995.
Maeda, N. A method for reading and checking phase times in autoprocessing system of seismic wave data. Zisin 1985, 38, 365–379. [Google Scholar] [CrossRef] [PubMed]

Figure 1. True triaxial multi-cluster hydraulic fracturing system with an acoustic emission monitoring system [30].

Figure 2. Schematic diagram of acoustic emission monitoring in hydraulic fracturing.

Figure 3. Experimental setup: (a) Installation of acoustic emission sensors on the loading platen; (b) Installation of the loading platens on the hydraulic fracturing device.

Figure 4. Experimental specimens: (a) Outcrop rock specimen; (b) Hydraulic fracturing specimen.

Figure 5. Architecture of PhaseNet.

Figure 6. Architecture of the proposed Attention PhaseNet.

Figure 7. Architecture of Attention Gate.
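To make the role of the gate concrete, the following is a minimal sketch of an additive attention gate in the style of Attention U-Net (Oktay et al.), written for one-dimensional feature maps. It is not taken from the authors' implementation: the function names, the assumption that the gating signal has already been resampled to the length of the skip features, and all weight values are hypothetical and chosen only for illustration. In a trained network the weights would be learned.

```python
import math

def conv1x1(features, weights, bias):
    """Mix channels at every time step (a 1x1 convolution).

    features: list of C_in channels, each a list of T samples.
    weights:  C_out rows of C_in coefficients; bias: C_out values.
    """
    n_steps = len(features[0])
    return [
        [b + sum(w * channel[t] for w, channel in zip(row, features))
         for t in range(n_steps)]
        for row, b in zip(weights, bias)
    ]

def attention_gate(skip, gate, w_x, b_x, w_g, b_g, w_psi, b_psi):
    """Additive attention gate: scale skip features by a coefficient in (0, 1).

    Assumes skip and gate share the same number of time steps.
    """
    theta = conv1x1(skip, w_x, b_x)   # project skip-connection features
    phi = conv1x1(gate, w_g, b_g)     # project gating signal from the decoder
    hidden = [[max(0.0, a + b) for a, b in zip(row_a, row_b)]  # ReLU of the sum
              for row_a, row_b in zip(theta, phi)]
    score = conv1x1(hidden, w_psi, b_psi)[0]                   # one score per step
    alpha = [1.0 / (1.0 + math.exp(-s)) for s in score]        # sigmoid
    gated = [[a * v for a, v in zip(alpha, channel)] for channel in skip]
    return gated, alpha

# Hypothetical inputs: two channels, four time steps; the gating signal is
# active only in the second half, where the "arrival" is assumed to be.
skip = [[0.1, -0.1, 2.0, 1.5], [0.0, 0.1, 1.0, -1.0]]
gate = [[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]]
gated, alpha = attention_gate(
    skip, gate,
    w_x=[[1.0, 0.0]], b_x=[0.0],
    w_g=[[2.0, 2.0]], b_g=[0.0],
    w_psi=[[1.0]], b_psi=[-2.0],
)
print([round(a, 3) for a in alpha])
```

With these illustrative weights the coefficients are small where the gating signal is inactive and close to one where it is active, so skip features in the noise-only part of the window are attenuated before they reach the decoder.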

Figure 8. Probability of seismic arrival generated by kernel density estimation: (a) Raw waveform data; (b) Probability of seismic arrival.

Figure 9. Probability of P-wave arrival at different bandwidths: (a) Raw waveform data; (b) Probability at β = 0; (c) Probability at β = 25; (d) Probability at β = 100; (e) Probability at β = 200; (f) Probability at β = 500.
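The effect of the bandwidth coefficient on the training label can be illustrated with a short sketch. It assumes a single Gaussian kernel centred on the manually picked arrival, with β used as the kernel standard deviation and the peak scaled to 1; the exact kernel and normalization used in the article are not restated here, so these choices, the function name, and the sampling interval are assumptions.

```python
import math

def arrival_probability(n_samples, pick_index, beta_us, dt_us):
    """Return a smoothed arrival label of length n_samples.

    Assumptions (illustrative only): Gaussian kernel, beta_us used as the
    kernel standard deviation in microseconds, peak value scaled to 1.
    """
    if beta_us <= 0:
        # Degenerate case: a one-hot label at the picked sample.
        return [1.0 if i == pick_index else 0.0 for i in range(n_samples)]
    label = []
    for i in range(n_samples):
        offset_us = (i - pick_index) * dt_us
        label.append(math.exp(-0.5 * (offset_us / beta_us) ** 2))
    return label

# Hypothetical trace: 1000 samples at 1 μs per sample, pick at sample 400.
hard = arrival_probability(1000, 400, beta_us=0, dt_us=1.0)
soft = arrival_probability(1000, 400, beta_us=25, dt_us=1.0)

# Number of samples whose label exceeds 0.5 under each setting.
print(sum(1 for p in hard if p > 0.5))
print(sum(1 for p in soft if p > 0.5))
```

Under these assumptions, β = 0 marks a single positive sample in the whole trace, which is the label imbalance the smoothing is meant to relieve, while a larger β spreads the positive label over more samples at the cost of a less sharply defined arrival.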

Figure 10. Loss convergence curve of Attention PhaseNet under different bandwidths: (a) Training loss; (b) Validation loss.

Figure 11. Arrival picking results of example 1# under different bandwidths: (a) Raw waveform data; (b) Model trained with β = 0; (c) Model trained with β = 25; (d) Model trained with β = 100; (e) Model trained with β = 200; (f) Model trained with β = 500.

Figure 12. Arrival picking results of example 2# under different bandwidths: (a) Raw waveform data; (b) Model trained with β = 0; (c) Model trained with β = 25; (d) Model trained with β = 100; (e) Model trained with β = 200; (f) Model trained with β = 500.

Figure 13. Arrival picking results of different methods on example 1#: (a) Raw waveform data; (b) Attention PhaseNet; (c) PhaseNet; (d) STA/LTA; (e) AR-AIC.

Figure 14. Arrival picking results of different methods on example 2#: (a) Raw waveform data; (b) Attention PhaseNet; (c) PhaseNet; (d) STA/LTA; (e) AR-AIC.
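For readers unfamiliar with the threshold-based baseline, the sketch below shows a basic STA/LTA picker. It is a generic textbook form, not the configuration used in the comparison: the squared-amplitude characteristic function, the trailing windows, the window lengths, and the threshold are all assumptions made for illustration.

```python
def sta_lta_pick(trace, n_sta, n_lta, threshold):
    """Return the first sample index where STA/LTA exceeds threshold, or None.

    STA and LTA are means of the squared amplitude over the last n_sta and
    n_lta samples, both windows ending at the current sample (n_sta <= n_lta).
    """
    prefix = [0.0]                      # prefix sums of the signal energy
    for x in trace:
        prefix.append(prefix[-1] + x * x)
    for i in range(n_lta, len(trace) + 1):
        sta = (prefix[i] - prefix[i - n_sta]) / n_sta
        lta = (prefix[i] - prefix[i - n_lta]) / n_lta
        if lta > 0 and sta / lta > threshold:
            return i - 1                # index of the newest sample in the windows
    return None

# Synthetic trace: 200 samples of low-amplitude noise, then a strong signal.
noise = [0.01 if i % 2 == 0 else -0.01 for i in range(200)]
signal = [1.0 if i % 2 == 0 else -1.0 for i in range(100)]
trace = noise + signal

pick = sta_lta_pick(trace, n_sta=5, n_lta=50, threshold=3.0)
missed = sta_lta_pick(trace, n_sta=5, n_lta=50, threshold=20.0)
print(pick, missed)
```

The second call illustrates the sensitivity to the manually chosen threshold: with trailing windows of these lengths the ratio cannot exceed n_lta / n_sta = 10, so a threshold of 20 yields no pick at all on the same trace.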

Figure 15. Distribution of absolute picking errors of different methods.

Table 1. Experimental parameters used in the hydraulic fracturing experiments.

Parameters	Values
Horizontal maximum principal stress S_H/MPa	15
Horizontal minimum principal stress S_h/MPa	10
Vertical stress S_V/MPa	20
Viscosity of fracturing fluid/Pa·s	1.0 × 10⁻³
Number of clusters	2
Cluster spacing/m	0.04

Table 2. Comparison of training and validation losses of Attention PhaseNet under different bandwidths.

Bandwidth Coefficient β/μs	Training Loss (Epoch = 1)	Training Loss (Epoch = 50)	Training Loss (Epoch = 100)	Validation Loss (Epoch = 1)	Validation Loss (Epoch = 50)	Validation Loss (Epoch = 100)
0	8.64	0.35	0.01	2.03	6.35	10.41
25	5.84	4.65	4.64	5.50	5.07	5.06
100	7.47	6.03	6.02	6.32	6.13	6.12
200	8.54	6.75	6.74	7.11	6.77	6.76
500	8.62	7.78	7.77	8.16	7.79	7.78

Table 3. Comparison of arrival picking results of Attention PhaseNet under different bandwidths.

Bandwidth Coefficient β/μs	MAE/μs	RMSE/μs	HR (5 μs)/%	HR (10 μs)/%	HR (15 μs)/%	HR (20 μs)/%	HR (25 μs)/%
25	17.41	43.37	31.90	52.83	65.38	75.57	82.69
100	26.03	47.52	18.55	32.69	46.15	57.24	67.31
200	27.39	55.97	15.38	29.75	43.10	54.75	65.27
500	47.36	65.52	8.14	16.06	23.42	29.41	36.09
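The three metrics reported in the table can be computed as sketched below. MAE and RMSE follow their standard definitions. HR is read here as a hit rate, that is, the percentage of picks whose absolute error does not exceed a given tolerance; this reading, the function name, and the example picks are assumptions for illustration and are not data from the experiments.

```python
import math

def picking_metrics(predicted_us, reference_us, tolerances_us=(5, 10, 15, 20, 25)):
    """Return (MAE, RMSE, hit rates) for arrival picks given in microseconds.

    The hit rate for a tolerance is the percentage of picks whose absolute
    error is less than or equal to that tolerance (assumed definition).
    """
    errors = [abs(p - r) for p, r in zip(predicted_us, reference_us)]
    n = len(errors)
    mae = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    hit_rate = {tol: 100.0 * sum(e <= tol for e in errors) / n
                for tol in tolerances_us}
    return mae, rmse, hit_rate

# Hypothetical picks and manual reference arrivals (μs).
predicted = [102.0, 250.0, 405.0, 610.0]
reference = [100.0, 262.0, 400.0, 580.0]

mae, rmse, hit_rate = picking_metrics(predicted, reference)
print(mae, rmse, hit_rate)
```

Because RMSE squares the errors before averaging, a few large mispicks raise it well above the MAE, which is one way to read the gap between the MAE and RMSE columns.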

Table 4. Comparison of arrival picking results of different methods.

Methods	MAE/μs	RMSE/μs	HR (5 μs)/%	HR (10 μs)/%	HR (15 μs)/%	HR (20 μs)/%	HR (25 μs)/%
Attention PhaseNet	17.41	43.37	31.90	52.83	65.38	75.57	82.69
PhaseNet	19.47	52.89	28.39	48.64	64.37	75.00	81.90
STA/LTA	260.21	443.05	5.09	10.52	15.05	18.67	22.06
AR-AIC	1054.71	1207.10	0.57	0.90	1.36	1.70	2.04
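A relative MAE reduction with respect to each baseline can be derived directly from the MAE column of Table 4. The sketch below uses the usual definition, (baseline − proposed) / baseline; the helper name is hypothetical. The results depend only on the values as printed in this table, and percentages quoted elsewhere may rest on a different rounding or evaluation basis.

```python
def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline

# MAE values (μs) as printed in Table 4.
mae_us = {
    "Attention PhaseNet": 17.41,
    "PhaseNet": 19.47,
    "STA/LTA": 260.21,
    "AR-AIC": 1054.71,
}

proposed = mae_us["Attention PhaseNet"]
reductions = {
    name: relative_reduction(value, proposed)
    for name, value in mae_us.items()
    if name != "Attention PhaseNet"
}
for name, value in reductions.items():
    print(f"{name}: {value:.2f}%")
```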

Table 5. Comparison of arrival picking results of different methods under low-SNR conditions.

Methods	MAE/μs	RMSE/μs	HR (5 μs)/%	HR (10 μs)/%	HR (15 μs)/%	HR (20 μs)/%	HR (25 μs)/%
Attention PhaseNet	21.23	40.94	27.07	47.37	60.15	70.68	77.44
PhaseNet	23.92	63.32	24.06	41.35	56.39	66.17	73.68
STA/LTA	269.97	458.32	2.26	10.53	13.53	18.80	24.81
AR-AIC	1006.63	1439.42	0.00	0.00	0.00	0.00	0.00

Table 6. Comparison of arrival picking results of different methods under medium-SNR conditions.

Methods	MAE/μs	RMSE/μs	HR (5 μs)/%	HR (10 μs)/%	HR (15 μs)/%	HR (20 μs)/%	HR (25 μs)/%
Attention PhaseNet	19.51	49.89	31.52	52.72	65.58	75.36	83.61
PhaseNet	20.69	58.17	30.43	49.09	64.49	75.32	82.07
STA/LTA	264.27	453.89	5.80	9.78	14.13	17.39	20.11
AR-AIC	980.48	1094.35	0.54	1.09	1.09	1.09	1.09

Table 7. Comparison of arrival picking results of different methods under high-SNR conditions.

Methods	MAE/μs	RMSE/μs	HR (5 μs)/%	HR (10 μs)/%	HR (15 μs)/%	HR (20 μs)/%	HR (25 μs)/%
Attention PhaseNet	13.13	18.17	36.18	56.78	69.35	79.90	86.93
PhaseNet	13.47	19.04	25.63	52.26	68.34	79.40	86.43
STA/LTA	242.45	400.15	5.03	12.56	18.59	22.11	25.63
AR-AIC	1292.74	1328.87	1.01	1.01	3.02	4.52	6.03

Table 8. Comparison of the performance of different methods.

Methods	MAE/μs	Inference Time/s	Training Time/s	Picking Stability/μs	Update Capability
Attention PhaseNet	17.41	2.33	184.34	39.29	Yes
PhaseNet	19.47	2.34	170.62	49.21	Yes
STA/LTA	260.21	3.67	–	358.79	No
AR-AIC	1054.71	3.42	–	587.42	No
