1. Introduction
Predicting earthquake magnitude is a highly challenging yet crucial issue in seismology [
1]. The diverse properties of seismic waves—such as low-frequency and high-frequency energy, surface waves, and body waves—are closely related to magnitude measurement, reflecting distinct source characteristics, earthquake sizes, and ranges of epicentral distances. The challenges in earthquake magnitude prediction include the limited number of monitoring stations [
2], short epicentral distances, and scarcity of training samples. In seismological research, the reliability of labeled data varies significantly, and there is a lack of high-quality labeled datasets to serve as ground truth, as well as a shortage of standardized benchmarks. These issues can leave current seismic datasets incomplete and keep magnitude prediction accuracy low.
The continuous development and application of machine learning in seismology have enabled the processing of large volumes of data, covering seismic phase picking [
3], earthquake recognition [
4], the determination of earthquake source mechanisms [
5], and earthquake magnitude prediction. As a result, machine learning has been widely adopted in earthquake data processing, model selection and development, and result analysis. With the advancement of seismology and machine learning, researchers have made the following progress:
In China, Liu Tao [
6] used a convolutional neural network (CNN) with seismic acceleration information as the model input for training and testing. This method achieved an accuracy of 92.3%; however, overfitting occurred during training. Additionally, earthquake acceleration time records contain only limited magnitude information, and there is a problem of missing original label data.
To address overfitting caused by large datasets, Lin Binhua [
7] constructed a CNN magnitude prediction model with 3 s waveform input. The prediction problem was recast as a classification problem, and testing on new samples from 2019 showed satisfactory accuracy within a magnitude error range of ±0.3. Nevertheless, this method did not analyze the effects of noise or of waveform data beyond the 3 s window.
Zhu Jingbao [
8] also utilized waveforms for magnitude prediction, selecting wave characteristic parameters such as amplitude and period as inputs to construct a Support Vector Machine (SVM)-based magnitude prediction model. Compared with traditional methods, this approach yielded a smaller prediction error (approximately 0.295), improved the prediction of microearthquakes, reduced the influence of epicentral distance, and demonstrated reliability across different events.
Chen-hui Wang [
9] employed a Generalized Regression Neural Network (GRNN) for earthquake magnitude prediction, using seven parameters including the cumulative earthquake frequency and released energy as inputs. Through principal component analysis and a particle swarm algorithm, an optimized model was developed. This model effectively reduced the dimensionality, enhancing the prediction accuracy and computational efficiency, with an average error of approximately 5.17% between the predicted and actual values—outperforming both backpropagation (BP) neural networks and standard GRNNs.
Mousavi [
10] designed a regressor combining a convolutional neural network (CNN) and a Recurrent Neural Network (RNN) to predict earthquake magnitudes using waveform front-end and amplitude information. This regressor is insensitive to data normalization, enabling it to leverage waveform amplitude information during training. The model can predict local magnitudes with an average error close to zero and a standard deviation of approximately 0.2. Lomax [
11] used a CNN model to predict three-component 20 Hz broadband waveforms within 50 s, incorporating the event distance, azimuth, depth, and amplitude. The final convolutional layer of this model has fewer nodes than the output classifications, effectively compressing and transmitting relevant input information, which helps reduce output noise. However, due to the use of long waveforms and limited data, real-time prediction is challenging, and the model is prone to overfitting during CNN training. Chen Wanghao [
12] et al. combined a CNN with a Long Short-Term Memory network (LSTM) to frame earthquake magnitude prediction as a classification problem. By introducing new magnitude prediction information as supplementary content and leveraging the strengths of both convolutional and recurrent architectures, their CNN-LSTM model fully exploits the LSTM’s ability to learn and extract temporal features, thereby improving the magnitude prediction accuracy.
The attention mechanism selectively filters crucial information from large datasets, focusing computational resources on salient features while ignoring irrelevant data [
13]. The Convolutional Block Attention Module (CBAM) [
14,
15] is a lightweight, general-purpose attention module that can be seamlessly integrated into any CNN architecture and trained end to end with the underlying network. To address issues such as missing training samples, overfitting in traditional neural networks, and limitations in feature extraction by single-model approaches, this study proposes a transfer learning framework that transfers knowledge from related domains to target classification tasks. Specifically, we adopt a residual shrinkage network transfer learning method, pretraining a ResNet18 model [
16] on the ImageNet dataset and augmenting it with an attention mechanism and soft thresholding modules to enhance the capture of effective feature information. The extracted feature vectors are then fed into a classifier for training. By transforming the prediction problem into a classification task and using seismic waveforms as input, this approach achieves improved prediction accuracy.
2. Data Preprocessing
2.1. Sample Selection
This study draws on data from the Stanford Earthquake Dataset (STEAD), a globally recognized labeled seismic dataset. From this dataset, local seismic waveforms were selected, each containing 35 attributes (labels). These primarily include the receiver network code, receiver code, receiver type and location, source time function, epicenter location, magnitude type, and arrival times of P-waves and S-waves. Attributes with no direct bearing on magnitude, such as the seismic station location, were excluded, and the correlation of the remaining attributes with magnitude was analyzed. Through this feature selection, the data dimensionality was reduced, lowering computational cost and improving the performance of the classification model. In this experiment, Kendall rank correlation coefficients were used to determine the final experimental parameters, as shown in
Table 1, which consisted of four attributes: the magnitude, P-wave arrival time, S-wave arrival time, and end time. The absolute values of the correlation coefficients between the latter three attributes and magnitude were 0.67, 0.75, and 0.86, respectively.
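To make this feature-selection step concrete, the short sketch below computes Kendall rank correlation coefficients between candidate attributes and the catalog magnitude using scipy.stats.kendalltau. The file name and column names are illustrative stand-ins for the STEAD metadata fields discussed above, not an exact reproduction of the authors' pipeline.

```python
import pandas as pd
from scipy.stats import kendalltau

# Hypothetical metadata table; the column names below are illustrative
# stand-ins for the STEAD attributes discussed in the text.
meta = pd.read_csv("stead_local_events.csv")
candidates = ["p_arrival_sample", "s_arrival_sample", "coda_end_sample"]

# Rank each candidate attribute by the absolute Kendall tau against magnitude.
for col in candidates:
    tau, p_value = kendalltau(meta[col], meta["source_magnitude"])
    print(f"{col}: tau = {tau:.2f} (p = {p_value:.3g})")

# Attributes whose |tau| exceeds a chosen cutoff are retained as model inputs.
```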
Earthquakes with a magnitude of 3.0 or lower are generally categorized as microearthquakes or weak earthquakes. Given that the accuracy of magnitude classification decreases significantly when the magnitude difference exceeds one unit, this study focuses on predicting microearthquake magnitudes to better capture seismic events with subtle waveform differences. A total of 9000 microseismic signal waveforms with a magnitude of 3.0 or lower were selected. These samples were derived from associated waveforms in continuous time series archived by the Data Management Center of the Seismological Cooperative Research Society [
16]. Each waveform starts 5 to 10 s before the P-wave arrival and ends at least 5 s after the S-wave arrival.
In the waveform images, the arrival times of P-waves and S-waves, as well as the end time of the waveform, were marked to facilitate subsequent magnitude classification based on the images. The ratio of the training set to the test set was 8:2.
2.2. Magnitude Label
Earthquake magnitude predictions generally allow for an error margin of ±0.3 [
17]. Therefore, earthquake magnitude prediction can be treated as a classification task to categorize earthquakes by magnitude. In this study, magnitudes are classified at intervals of 0.1, resulting in 30 magnitude categories, with labeled data available for each category. This classification strategy was specifically designed to align with the granularity of the dataset: the original microseismic data are annotated with a precision of 0.1 magnitude, so the 0.1-interval classification directly corresponds to the annotation intervals. The specific classification is presented in
Table 2.
In the actual prediction process, if the model identifies a sample as label 10, it corresponds to the seismic magnitude range (0.9, 1.0] as shown in
Table 2. In this study, the final predicted value is specified as the median of this range, which is 0.95. The potential magnitude error is ±0.05, falling within the acceptable range.
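A minimal sketch of this label-to-magnitude mapping is given below; the helper function name is hypothetical, and it assumes label k covers the interval ((k − 1)/10, k/10] with the bin midpoint reported as the prediction.

```python
def label_to_magnitude(label: int) -> float:
    """Map a class label (1-30) to the midpoint of its 0.1-wide magnitude bin."""
    # Label k covers the interval ((k - 1) / 10, k / 10]; report the midpoint.
    return (label - 0.5) / 10.0

print(label_to_magnitude(10))  # 0.95 -> interval (0.9, 1.0]
print(label_to_magnitude(1))   # 0.05 -> interval (0.0, 0.1]
```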
2.3. Data Preprocessing
To address overfitting in the training process, this study employs data augmentation techniques tailored to waveform line plots generated from microseismic time-series data. The specific procedures are as follows:
First, images like
Figure 1 are subjected to random scaling and cropping. Each 224 × 224 waveform line plot—derived from time-series data where the
x-axis represents a 0–2000 ms duration (sampling rate: 1000 Hz) and the
y-axis amplitude is normalized to the range [−1, 1] via min–max scaling—is resized to 256 × 256, followed by cropping a 224 × 224 region from the center and converting it into a tensor. Scaling is restricted to ±10% of the original dimensions to preserve fine details such as the abrupt amplitude changes of P-waves and S-waves.
Next, waveform-specific image enhancements are applied, including horizontal flipping, which simulates symmetric variations in a time-series distribution while maintaining the proportional mapping of x-axis time intervals; rotation within a ±5° range, to avoid distorting the temporal sequence integrity of waveform phases; and Gaussian noise injection (with a signal-to-noise ratio ≥ 30 dB), to mimic residual high-frequency noise that the soft thresholding preprocessing does not eliminate.
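As a sketch of this augmentation pipeline, assuming a PyTorch/torchvision implementation (the paper does not specify its tooling), the transforms below mirror the steps described above; the flip probability and noise amplitude are illustrative values.

```python
import torch
import torchvision.transforms as T

# Training-time augmentation for the 224 x 224 waveform line plots;
# the numeric settings mirror the text (resize to 256, center-crop 224,
# ±10% scaling, ±5° rotation, mild Gaussian noise).
train_tf = T.Compose([
    T.Resize((256, 256)),
    T.CenterCrop(224),
    T.RandomAffine(degrees=5, scale=(0.9, 1.1)),
    T.RandomHorizontalFlip(p=0.5),
    T.ToTensor(),
    T.Lambda(lambda x: x + 0.01 * torch.randn_like(x)),  # low-level noise
])

# Test-time preprocessing: identical resizing and cropping, no augmentation.
test_tf = T.Compose([T.Resize((256, 256)), T.CenterCrop(224), T.ToTensor()])
```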
Batch normalization is adopted to stabilize feature learning, with attention modules appended after the nonlinear activation of each convolutional layer. Composed of channel and spatial attention submodules, these modules recalibrate feature weights across channel and spatial dimensions—specifically emphasizing critical waveform segments (e.g., phase arrival times corresponding to x-axis time points) and discriminative amplitude features (within the y-axis range [−1, 1]) while suppressing irrelevant noise. During testing, images undergo identical preprocessing but no augmentation. Training with augmented data ensures the model’s robustness against rotational or flipping variations in unseen data, thereby effectively mitigating overfitting.
3. Network Model Construction
3.1. Residual Network
Neural networks can continuously and automatically extract image features from local to global scales, enabling functions such as image recognition [
18]. Deep learning-based networks can capture richer and more complex features and often adapt well to new tasks. However, deeper networks are more challenging to train due to the vanishing gradient problem. Therefore, this study employs the residual network (ResNet), which addresses this issue by introducing shortcut connections. This innovation allows the network to significantly increase its depth, thereby enhancing accuracy. A schematic of the residual network structure is shown in
Figure 2. In this research, the ResNet18 architecture was utilized to extract feature vector sequences.
To prevent accuracy degradation in deeper layers, the input is directly incorporated into the output via the residual connection H(x) = F(x) + x, where x is passed through the identity mapping and F(x) is the residual mapping learned by the stacked layers. This architecture enables efficient feature propagation without increasing the number of network parameters or the computational complexity. Leveraging this advantage, the ResNet18 model was selected for its simplicity and modularity, which align well with the characteristics of the seismic waveform dataset. Residual networks are widely used in deep learning and have demonstrated strong performance in image processing tasks.
The pretrained ResNet18 model, initialized with weights from ImageNet (a dataset containing approximately 1.2 million images across 1000 categories), was fine-tuned for this study. During transfer learning, the weights and biases of all layers except the final classification layer were frozen. For the microseismic dataset, only the neurons in the last layer were re-initialized and trained to adapt to the new classification task. This approach allows the model to retain general visual features while quickly adapting to domain-specific patterns, thereby supporting strong classification performance.
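A minimal PyTorch sketch of the configuration described in this paragraph (an assumption about tooling, not the authors' code) loads ImageNet weights, freezes the backbone, and re-initializes only the classification head:

```python
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 30  # 0.1-magnitude bins in (0, 3.0]

# Load ImageNet weights (recent torchvision API) and freeze the backbone.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for param in model.parameters():
    param.requires_grad = False

# Re-initialize only the classification head; it is the sole trainable layer.
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)
```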
3.2. Attention Mechanism
In neural networks, attention mechanisms enhance critical components of input data while suppressing irrelevant details, enabling focus on subtle yet important features.
The CBAM, a hybrid attention mechanism integrating both channel and spatial attention, has its structure illustrated in
Figure 3. The channel attention module first processes input feature maps, followed by the spatial attention module, thereby achieving seamless fusion of the two mechanisms.
For the channel attention module, the input features are subjected to parallel global maximum pooling and global average pooling to compress the spatial dimensions. The resulting outputs are processed through a shared multi-layer perceptron (MLP), with the number of output neurons matching the number of input channels, to generate channel-wise attention weights via sigmoid activation. These weights refine the features by emphasizing critical amplitude-related channels. The channel attention is calculated as
M_c(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F))),
where F denotes the input feature map and σ the sigmoid function.
For the spatial attention module, channel-wise global average pooling and maximum pooling generate two 1 × H × W feature maps, which are concatenated and processed via a 1 × 1 convolution and sigmoid activation to produce spatial attention weights. These weights highlight key temporal segments (e.g., P-wave/S-wave arrival times) within the waveform. The two-dimensional feature maps generated via channel-wise pooling of the channel-refined features F are
F_avg^s = AvgPool(F) and F_max^s = MaxPool(F), each of size 1 × H × W.
The spatial attention is then calculated as
M_s(F) = σ(f^(1×1)([F_avg^s; F_max^s])),
where f^(1×1) denotes the 1 × 1 convolution, [ · ; · ] denotes concatenation along the channel dimension, and σ the sigmoid function.
By sequentially applying channel and spatial attention, the CBAM enables the network to precisely capture task-relevant features in microseismic waveforms. Unlike conventional mechanisms (e.g., SENet) that focus solely on channels, the CBAM retains both inter-channel interactions and spatial dependencies, addressing limitations in waveform feature extraction by optimizing parameters from multi-dimensional perspectives.
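The following PyTorch sketch is our reconstruction of the CBAM described above, not the authors' code; the reduction ratio of the shared MLP is an assumed default, and the 1 × 1 spatial convolution follows the text.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Channel attention followed by spatial attention, as described above."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Shared MLP for the channel attention branch.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # 1 x 1 convolution over the concatenated pooled maps (per the text).
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Channel attention: global average and max pooling, shared MLP, sigmoid.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial attention: channel-wise average and max maps, 1 x 1 conv, sigmoid.
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial_conv(pooled))
```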
3.3. Residual Shrinkage Network Construction
Soft thresholding is a fundamental operation in many signal denoising methods. It sets features with absolute values below a predefined threshold to zero while shrinking larger values toward zero. The threshold, a hyperparameter that must be carefully tuned, directly influences the denoising efficacy. The input–output relationship of soft thresholding is defined as
y = x − τ for x > τ,  y = 0 for −τ ≤ x ≤ τ,  y = x + τ for x < −τ,
where x is the input feature, y is the output feature, and τ is the threshold.
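A minimal sketch of this operation on feature tensors (our illustration; the paper treats the threshold τ simply as a tunable hyperparameter):

```python
import torch

def soft_threshold(x: torch.Tensor, tau: float) -> torch.Tensor:
    """Zero out features with |x| <= tau and shrink the rest toward zero."""
    return torch.sign(x) * torch.clamp(torch.abs(x) - tau, min=0.0)

x = torch.tensor([-0.8, -0.1, 0.05, 0.6])
print(soft_threshold(x, tau=0.2))  # tensor([-0.6000, 0.0000, 0.0000, 0.4000])
```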
3.4. Microseismic Magnitude Classification Model
Convolutional neural networks (CNNs) learn hierarchical image features through successive convolution and pooling layers. However, increasing the network depth often leads to the vanishing gradient problem and performance degradation, which slows down convergence and reduces the classification accuracy. To address this issue, this study adopts a residual neural network (ResNet) architecture, which introduces shortcut connections between convolutional layers to form residual blocks. These connections enable the network to learn identity mappings, thereby preventing the training error from increasing as the depth grows.
Transfer learning, particularly through pretraining and fine-tuning, has proven highly effective in image classification tasks. By leveraging pretrained weights from a large source domain (e.g., ImageNet), the model shown in
Figure 4 can rapidly adapt to the target domain with a limited amount of labeled data. Specifically, we pretrain the ResNet on a source dataset, transfer the learned feature extractors, and retrain only the final fully connected layers on the target microseismic dataset. This approach exploits the similarities in the feature space between domains, thus enabling efficient knowledge transfer.
To further enhance performance, an attention mechanism is integrated to dynamically weight feature channels, which helps focus on discriminative information and mitigate overfitting. The resulting transfer learning framework effectively leverages ResNet’s capability for hierarchical feature extraction and attention-guided channel selection to improve the accuracy of microseismic magnitude classification.
Transfer Learning Strategy Selection: Three common transfer learning approaches were considered: (1) last-layer fine-tuning, in which only the parameters of the final fully connected layer (the classification head) are updated while all other layers remain frozen; (2) scratch training, in which all weights are randomly initialized and the entire network is trained from scratch; and (3) full-model fine-tuning, in which the pretrained ResNet18 weights serve as the initialization, the classification layer is modified, and all layers are fine-tuned on the target dataset [
19].
Given the substantial domain shift between ImageNet (natural images) and microseismic waveform images, we adopted ResNet18 pretrained on ImageNet with full fine-tuning as the base model for this study. This approach was chosen for the following reasons:
Rationale for Pretrained Model Selection: Despite the inherent differences between natural images and waveform line plots, the low-level convolutional kernels of the pretrained ResNet18 (e.g., edge and texture detectors) can function as effective initial extractors for local waveform features. For instance, these kernels are capable of capturing critical amplitude jumps in waveforms, which are indicative of P-wave and S-wave arrivals.
Fine-Tuning Strategy: By updating the weights of all layers during training, the higher layers of the network can adapt to waveform-specific patterns, such as temporal correlations between different seismic phases. This strategy effectively alleviates the domain mismatch between natural images and seismic data.
Advantages Over Training from Scratch: Preliminary tests indicated that utilizing pretrained weights yields significantly better performance compared to training from scratch. The pretrained initialization provides a more robust starting point, enabling the model to converge more rapidly and achieve higher accuracy. This approach leverages the strengths of transfer learning while allowing the model to fully adapt to the unique characteristics of microseismic waveform data.
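For contrast with the frozen-backbone sketch in Section 3.1, a hedged sketch of the adopted full-model fine-tuning (strategy 3), again assuming a PyTorch implementation:

```python
import torch.nn as nn
from torchvision import models

# Full-model fine-tuning (strategy 3): start from ImageNet weights,
# swap the classification head, and leave every layer trainable.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 30)

# Unlike the frozen variant, all parameters are handed to the optimizer.
params_to_update = [p for p in model.parameters() if p.requires_grad]
```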
Training with Cross-Entropy Loss: The model was trained using the cross-entropy loss function to minimize the divergence between the predicted and ground-truth probability distributions. This loss function is defined as
L = −∑_i y_i log(p_i),
where the sum runs over the magnitude classes, and y_i and p_i are the ground-truth and predicted probabilities of class i, respectively.
This choice facilitates efficient training and mitigates issues related to neural network saturation, ensuring stable convergence even when adapting to a novel domain.
3.5. Improvement Based on the Attention Mechanism Model
Incorporating residual modules into the network enhances training efficiency, as the skip connections between low-level and high-level layers within each residual module facilitate backpropagation without compromising performance. Furthermore, by integrating structures such as the CBAM (Convolutional Block Attention Module) with residual modules (as illustrated in
Figure 5), the model can be designed with fewer parameters while maintaining comparable task-specific performance.
The improved residual model addresses the issues of vanishing gradients and performance degradation—problems in which repeated multiplication during backpropagation can lead to extremely small gradients. Following each residual block, an average pooling operation is applied, and the resulting feature embeddings are fed into a classifier independently, enabling multi-task training.
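To make the integration concrete, the sketch below (our reconstruction; the exact placement of the attention blocks is an assumption based on the description above, and the per-stage auxiliary classifiers are omitted for brevity) wraps torchvision's ResNet18 stages with the CBAM module sketched in Section 3.2:

```python
import torch.nn as nn
from torchvision import models

class CBAMResNet18(nn.Module):
    """ResNet18 backbone with a CBAM block appended after each residual stage."""

    def __init__(self, num_classes: int = 30):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool)
        # Interleave each residual stage with channel/spatial attention.
        self.stages = nn.Sequential(
            backbone.layer1, CBAM(64),
            backbone.layer2, CBAM(128),
            backbone.layer3, CBAM(256),
            backbone.layer4, CBAM(512),
        )
        self.pool = backbone.avgpool           # global average pooling
        self.fc = nn.Linear(512, num_classes)  # classification head

    def forward(self, x):
        x = self.pool(self.stages(self.stem(x)))
        return self.fc(x.flatten(1))
```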
4. Experimental Results and Analysis of the Classification Model
4.1. An Analysis of the Comparative Experimental Results of the Transfer Learning Model
First, experiments were conducted using different neural network models without transfer learning. Subsequently, to evaluate the training performance of the ResNet18 transfer learning model, the test set accuracy, training accuracy, and F1-score were adopted as evaluation metrics. Their definitions, together with those of the precision and recall from which the F1-score is computed, are as follows:
Accuracy = (TP + TN)/(P + N),
Precision = TP/(TP + FP),
Recall = TP/(TP + FN),
F1 = 2 × Precision × Recall/(Precision + Recall).
Here, TP (True Positives) refers to the number of correctly predicted positive cases, TN (True Negatives) to the number of correctly predicted negative cases, FP (False Positives) denotes the number of incorrectly predicted positive cases, and FN (False Negatives) represents the number of incorrectly predicted negative cases. Additionally, P stands for the total number of actual positive cases, and N for the total number of actual negative cases. The F1-score ranges from 0 to 1. Unlike accuracy, it incorporates both precision and recall in its calculation; a high F1-score can only be achieved when both precision and recall are high, making it a more balanced metric for evaluating classification performance.
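As an illustration, these metrics can be computed with scikit-learn as sketched below; macro-averaging over the 30 magnitude classes is our assumption, since the paper does not state its averaging scheme, and y_true/y_pred stand for the ground-truth and predicted labels.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# y_true and y_pred hold the ground-truth and predicted class labels (1-30).
acc = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred, average="macro")
rec = recall_score(y_true, y_pred, average="macro")
f1 = f1_score(y_true, y_pred, average="macro")
```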
Figure 6 presents the evaluation metrics for model training using three transfer learning strategies. This experiment focused on residual networks, comparing the training process curves of different classification models to verify the effectiveness of the employed transfer learning strategies. In the figure, the training curves show dotted lines representing the test set accuracy, while solid lines correspond to F1-scores. As the number of training epochs gradually increases, these metrics stabilize, indicating that the models possess good generalization ability.
Specific values are provided in the accompanying
Table 3. Among the strategies evaluated, the model with the lowest performance (presumably the first strategy, i.e., fine-tuning only the last layer without full transfer learning) achieved an accuracy of 84.5%, precision of 82.9%, and recall of 91.1%.
The model trained with random initialization (second strategy) yielded an accuracy of 90.1%, precision of 90.0%, and recall of 90.8%.
The model utilizing full-layer weight fine-tuning (third strategy) achieved the highest performance: 92.7% accuracy, 91.2% precision, and 90.8% recall.
Compared to the poorest-performing model, the third strategy demonstrated improvements of 8.2% in accuracy, 8.3% in precision, and 9.7% in recall. Consequently, the full-layer weight fine-tuning strategy was selected for subsequent studies.
4.2. The Magnitude Classification Model of Transfer Learning
In this study, the transfer learning model was compared with different neural network models, with the magnitude prediction task again treated as a classification problem.
Figure 7 presents the evaluation metrics obtained with the different network models. The training process curves of these models were compared to verify the effectiveness of the proposed method.
First, various neural network models were trained, and their classification performance was compared using evaluation metrics specific to classification tasks. In this study, comparisons were conducted with traditional magnitude classification models, CNN models, and the ResNet18 model proposed herein, with additional comparative training results from the AlexNet and VGG16 models included. The respective classification evaluation metrics on the test set for these five models are presented in
Figure 7.
As indicated in the figure, the magnitude prediction network utilizing residual networks for transfer learning reached a stable state more rapidly and achieved the highest accuracy. This demonstrates that under the configured settings, training with residual networks yields superior performance. Both the training set accuracy and test set accuracy improve as the number of training iterations increases, while the time required to converge to a high level is shorter. This addresses issues such as vanishing gradients and exploding gradients caused by network deepening. The residual mapping ensures that network performance does not degrade, enabling the model to retain shallow features while learning deep features.
Notably, the mapping mechanism of ResNet18 does not introduce additional parameters or computational overhead but significantly enhances the effectiveness of network training. Consequently, ResNet18 was selected as the basic framework for network design in transfer learning, with the third fine-tuning strategy adopted. The detailed model results are presented in
Table 4.
4.3. Introducing the Training Results of the Attention Mechanism Model
In magnitude classification, an attention mechanism was introduced for model optimization. Comparative experiments were conducted using the AlexNet, VGGNet, and ResNet networks with different attention mechanisms integrated. The classification evaluation metrics of the experimental results are presented in
Figure 8, which shows the respective classification performance of residual networks incorporating the SE attention mechanism and ECA attention mechanism.
As observed from the figure, the model’s accuracy, recall, and F1-score all improved after the introduction of the attention mechanism.
Figure 9 illustrates the performance of four models trained with the CBAM under the following training configuration: a learning rate initialized at 0.001 with step-wise decay (reduced by a factor of 0.1 every 10 epochs), a batch size of 32 (optimized for NVIDIA RTX 3090 memory constraints), 50 total epochs with early stopping triggered after 5 consecutive epochs without a validation accuracy improvement (actual convergence occurred at ~30 epochs), and the Ranger optimizer with default parameters (β₁ = 0.9, β₂ = 0.999, weight decay = 1 × 10⁻⁵). As shown, the CBAM-enhanced models demonstrate improved performance by integrating both spatial and channel attention modules. The training curves exhibit a similar shape to those of the transfer-learned ResNet18, but the CBAM approach achieves superior generalization and higher test set accuracy. Specifically, the training accuracy stabilizes around 30 epochs, aligning with the early stopping criterion and indicating efficient convergence under the specified hyperparameters.
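A hedged sketch of this training configuration is given below, assuming the Ranger implementation from the torch_optimizer package and PyTorch's StepLR scheduler (the paper does not name its implementations); model, train_loader, val_loader, and evaluate are assumed to be defined elsewhere, and the early-stopping logic is a simplification.

```python
import torch
import torch.nn as nn
import torch_optimizer as optim_extra  # provides a Ranger implementation

criterion = nn.CrossEntropyLoss()
optimizer = optim_extra.Ranger(model.parameters(), lr=1e-3,
                               betas=(0.9, 0.999), weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

best_acc, patience, bad_epochs = 0.0, 5, 0
for epoch in range(50):
    model.train()
    for images, labels in train_loader:          # batch size 32
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()

    val_acc = evaluate(model, val_loader)        # assumed helper
    if val_acc > best_acc:
        best_acc, bad_epochs = val_acc, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:               # stop after 5 stagnant epochs
            break
```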
5. Experimental Results and Analysis of the Prediction Model
5.1. Prediction Model Results
In this experiment, seismic magnitudes were categorized into 30 classes within the range of 0–3.0, in intervals of 0.1. A total of 300 waveform images were selected for prediction, with 10 waveform samples randomly selected from each class. All prediction images underwent the same preprocessing procedures as the training and test sets. Each waveform image was annotated with the arrival times of P-waves and S-waves, as well as the end time of the waveform, to support subsequent magnitude classification.
Accuracy is defined as the proportion of correctly predicted labels in the entire test dataset. The ResNet model achieved high accuracy on the seismic magnitude waveform test dataset, indicating its strong performance in magnitude classification. This confirms that the shortcut connection mechanism in ResNet effectively mitigates performance saturation in deep networks.
The bar chart shown in
Figure 10 provides a clear visual comparison of accuracy across different magnitude classes. Notably, the classification performance for magnitudes greater than 2.0 was excellent, with accuracy exceeding 93% for all such classes. The overall classification accuracy reached 96.3%, outperforming other models in both accuracy and effectiveness.
5.2. Magnitude Prediction Results and Analysis
In this study, to more intuitively illustrate the error distribution of magnitude estimates across different models, we generated magnitude prediction result plots, with the result deviations recorded as shown in
Figure 11. The red dots in the figure represent the predicted magnitudes, based on a total of 300 selected data points for prediction. For clear comparison, the auxiliary lines y = (x + 0.3) and y = (x − 0.3) were added to the figure.
To further validate the efficacy of the proposed model, we compared its prediction results with those of traditional microseismic magnitude prediction models, as well as CNN, AlexNet, VGGNet, and ResNet18 architectures. Traditional CNN models are commonly employed for earthquake classification with magnitude intervals of 0.5 or 1. However, our findings indicate that in more granular magnitude classification tasks (with intervals of 0.1), the traditional CNN model exhibited the lowest accuracy, with the majority of predictions deviating beyond ±0.3 and displaying the highest degree of dispersion.
While transfer learning implementations using AlexNet and VGGNet yielded marginally better results than the CNN approach for refined earthquake magnitude prediction, their predictions still contained substantial errors, characterized by consistent bias. In contrast, predictions from the improved residual network demonstrated the smallest deviations (within ±0.3), minimal dispersion, and the highest concentration of data points clustered around the diagonal line, indicating superior accuracy.
A comparative experiment with ResNet50 revealed that while its accuracy, prediction performance, and classification evaluation metrics were comparable to those of ResNet18, it incurred longer training times. Consequently, ResNet18 was selected as the optimal model for this study.
5.3. Comparison of ResNet Models with Different Depths
In this study, preference was given to networks with lower complexity that maintained satisfactory performance, as opposed to more complex ones. To identify the optimal network architecture for the classification task, we compared results across different ResNet variants.
Table 5 presents the classification performance of ResNet18, ResNet34, and ResNet50.
As shown in the table, all three residual networks achieved high accuracy (around 90%). Specifically, ResNet18 yielded the highest accuracy at 93.72%, followed by ResNet34 at 91.76%, with ResNet50 achieving the lowest at 89.67%. This indicates that increasing the number of network layers does not necessarily improve model performance.
Figure 12 presents the classification and prediction results of transfer learning using ResNet34 and ResNet50. As shown, both residual networks exhibited small prediction errors and low data dispersion. However, individual predictions showed significant deviations—for instance, ResNet50 had a case where the predicted value differed from the actual value by two magnitude units. This suggests that while ResNet34 and ResNet50 (with 34 and 50 layers, respectively) can learn more complex features, their larger parameter counts make them prone to overfitting. Additionally, deeper networks have higher computational costs and require more device memory.
Thus, the selection of the network depth should be based on the actual microseismic magnitude detection equipment and error tolerance requirements.
5.4. An Analysis of the Experimental Results
In the final experiment, the full-model fine-tuning strategy, in which all layers are updated during training, was adopted, together with the integration of the CBAM and the Ranger optimizer. ResNet18, pretrained on ImageNet, was selected as the base model for transfer learning to ultimately perform microseismic magnitude classification and prediction. In this study, the allowable error thresholds were set at ±0.2 and ±0.3, and a total of 300 microseismic waveforms were used for prediction in the experiment. Detailed data from the six experiments are presented in
Table 6.
Among the schemes, the one designed in this study yielded 284 samples with errors within ±0.3, corresponding to an accuracy rate of 94.7%; 261 samples with errors within ±0.2, with an accuracy rate of 87%; and an experimental variance of 0.1362. Compared with the traditional magnitude prediction method using the CNN model, this scheme showed 70 more samples with errors within ±0.3, 95 more samples with errors within ±0.2, and a variance reduction of 0.8191, thus emerging as the top-performing approach among the six schemes.
6. Conclusions and Outlook
In this study, ResNet18 was employed for spatial feature extraction, with transfer learning accelerating convergence and enhancing model performance. The integration of the CBAM further enabled the model to focus on critical spatial and channel features, improving the identification accuracy by capturing spatial-channel interdependencies and thereby effectively addressing challenges in waveform feature extraction. Testing on 300 randomly selected waveforms yielded an average accuracy of 97.6% for the proposed model, providing valuable insights for microseismic magnitude prediction and contributing to seismological research. Notably, the model achieved a favorable balance between performance and complexity: compared to ResNet50, its parameter size was reduced by 60%, and the training time per epoch was shortened by 40% (12 min vs. 20 min), making it more adaptable to microseismic monitoring devices with limited resources. It should be acknowledged that this study focused on verifying core performance; while the model selection logic already reflects efficiency considerations, metrics such as FLOPs and inference latency have not been quantified. These will be supplemented in future work to provide a more comprehensive evaluation of the model’s practical applicability.
This study primarily focused on waveform feature extraction. Future work will proceed in two directions: first, leveraging the model’s speed and accuracy to enable applications in real-time prediction scenarios; second, improving model efficiency by reducing parameters and computational overhead through adaptive weighting of influencing factors, while also strengthening generalization ability.
Practical prediction is influenced by multiple factors, including the source depth, regional attenuation, and station location. Additionally, magnitude measurement methods vary with the source depth. Given that the dataset in this study consisted of waveform images, future research should also consider the impact of abnormal waveforms and noise. Potential improvements include incorporating diverse waveform patterns, adding waveform preprocessing modules for optimization, using modified residual networks with transfer learning for magnitude classification, and ultimately developing new technical solutions for magnitude prediction.