Microseismic Velocity Inversion Based on Deep Learning and Data Augmentation

Microseismic monitoring plays an essential role in reservoir characterization and in earthquake disaster monitoring and early warning. The accuracy of the subsurface velocity model directly affects the precision of event localization and subsequent processing. It is challenging for traditional methods to realize efficient and accurate microseismic velocity inversion due to the low signal-to-noise ratio of field data. Deep learning can efficiently invert the velocity model by constructing a mapping relationship from the waveform data domain to the velocity model domain. The predicted and reference values are fitted with the mean square error as the loss function. To reduce the feature mismatch between the synthetic and real microseismic data, data augmentation is also performed using correlation and convolution operations. Moreover, a hybrid training strategy is proposed by combining synthetic and augmented data. Tests on real microseismic data show that the Unet is capable of high-resolution and robust velocity prediction. The data augmentation method supplies additional high-frequency components, while the hybrid training strategy fully combines the low-frequency and high-frequency components in the data to improve the inversion accuracy.


Introduction
Microseismic monitoring plays an important role in both fault/fracture characterization and seismic risk analysis in unconventional reservoirs and rock masses [1][2][3][4][5]. Most current microseismic inversion procedures require realistic velocity models. For example, the reliability of microseismic inversion and interpretation depends heavily on the accuracy of the velocity model [6,7]. However, most microseismic velocity models used in production are directly adapted from well-logging curves, which are generally simplified approximations and may be contaminated by noise. Various velocity model calibration methods have been proposed based on traveltime (difference)-based inversion [8][9][10]. Additionally, full waveform inversion (FWI), as a powerful inversion tool, has also been introduced to microseismic inversion [11,12]. However, FWI usually involves a higher computational demand and is also affected by cycle skipping due to the sinusoidal nature of the wavefield and complex scattering [13]. Cycle skipping can lead to convergence at local minima and thus yield incorrect velocity models.
Traditional traveltime-based velocity inversion and full-waveform inversion rely on data quality, such as the signal-to-noise ratio (SNR) [14]. However, real microseismic data usually have a low SNR, which largely affects the accuracy of the inversion. In addition, traditional velocity inversion methods rely on the accuracy of the initial velocity model. Recently, deep learning (DL) has shown excellent capabilities for nonlinear mapping function approximation in computer vision, especially in the tasks of reconstructing models and high-resolution images [15,16]. The development of DL has also brought new opportunities to seismic and microseismic data processing and inversion [17], such as signal denoising [18], signal identification and classification [19,20], first-arrival picking [21][22][23], source location [24], and velocity model building and calibration [25]. Using seismic waveforms as the feature input and velocity models as the labels, trained models with the nonlinear mapping capability of neural networks can effectively predict velocity models from seismic waveforms. There are already several studies on using DL algorithms to invert velocity models. Araya-Polo et al. [26] extracted features from the acquired seismic data and proposed using deep convolutional neural networks (DCNNs), instead of seismic tomography, to reconstruct velocity models. Yang et al. [27] proposed a supervised deep fully convolutional neural network (FCN) approach to build velocity models directly from raw seismic data.
However, only a few studies have addressed DL-based downhole microseismic velocity inversion, which takes advantage of the nonlinear mapping ability of deep neural networks (DNNs) to carry out velocity inversion tasks [28,29]. Unlike velocity model inversion in active seismology, there is generally only one velocity model corresponding to hundreds, possibly even thousands, of microseismic events. The combination of abundant microseismic events within restricted regions and limited velocity model information hinders dataset construction and network performance. Additionally, microseismic processing and interpretation depend on the activities and geology in the region of interest, which may limit the availability of past microseismic events for DL algorithms. In this sense, the training data play a vital role in ensuring the learning performance of the network. FWI in active seismology relies heavily on low-frequency components [30], while field microseismic data generally contain higher-frequency content than active seismic data, and this high-frequency information might be missing in synthetic data because of the computational expense. Yang et al. [31] found that integrating physical information with synthetic data can improve the effectiveness of the training data and network performance. Alkhalifah et al. [32] employed a domain adaptation approach to introduce real signal features into the synthetic data by correlation and convolution operations. They demonstrated the effectiveness of domain adaptation by applying it to seismic imaging problems. Wu et al. [33] proposed integrating domain knowledge to impose prior constraints for geophysical problems, which can improve the generalizability and interpretability of DNN models.
In this study, we adopt the Unet model to construct a mapping relationship between microseismic waveform data and the velocity model. The data augmentation is implemented by correlation and convolution operations to alleviate the feature differences between the training and real data. We also propose a hybrid training strategy to better integrate the low-frequency features in the synthetic data and the high-frequency features in the augmented data. By testing on real data from downhole microseismic monitoring, we demonstrate that the proposed data augmentation and hybrid training strategy are reliable and effective in predicting microseismic velocity models.

Velocity Inversion and Network Architecture
The velocity inversion can be expressed as the minimization of the following objective function:

$$J = \left\| d_{syn} - d_{obs} \right\|_2 \quad (1)$$

where $J$ is the objective function, $\|\cdot\|_2$ denotes the Euclidean norm, $d_{syn}$ is the synthetic data vector, and $d_{obs}$ is the recorded data vector. Conventional methods for velocity inversion include seismic tomography and full-waveform inversion, which are based on traveltime and waveform, respectively. As mentioned before, the two methods rely on the data quality and the setting of the initial velocity model, neither of which can be well satisfied in microseismic monitoring. In this paper, we use neural networks to solve this nonlinear problem. Neural networks can create strongly nonlinear mappings between microseismic gathers and velocities by building multiple hidden layers:

$$\tilde{v} = f(d; \theta) \quad (2)$$

where $f$ denotes the mapping realized by the network, $d$ is the input microseismic gather, $\tilde{v} \equiv [v_p, v_s]$ denotes the predicted velocity value, and $\theta$ indicates the total weights in the network. The training process of the network is realized through forward propagation and back propagation in the network model to update $\theta$. The testing process involves directly predicting the velocity model by inputting waveform data to the trained model.
We adopt the Unet (Figure 1), as it has shown great potential for many geophysical inversion tasks [34,35]. We use pairs $\{d, v\}$ of microseismic data and the associated velocity model as the network training samples. We use the leaky rectified linear unit (LeakyReLU) activation function, which alleviates the problem of vanishing gradients and allows for a better fitting of the model [36].
Appl. Sci. 2024, 14, 2194

Data Augmentation
Domain adaptation refers to learning when the feature distributions of the source and target domains are inconsistent [37]. It aims to narrow the distribution gap between the two domains to achieve a better learning performance in the target domain. Based on the idea of domain adaptation, data augmentation is achieved by linear correlation and convolution operations between the synthetic and real data [38]:

$$\tilde{d}_s^{i}(t) = \left[ d_s^{k}(t) \otimes d_s^{i}(t) \right] * \left[ d_r^{ij}(t) \otimes d_r^{ij}(t) \right] \quad (3)$$

where $i$ is the index of the single trace, $j$ is the index of an arbitrary event of the real data, $k$ is the index of the reference trace (we set $k = 1$), $\tilde{d}_s^{i}(t)$ is the new augmented data, $d_s^{i}(t)$ is the single trace of the synthetic data, $d_s^{k}(t)$ is the reference trace of the synthetic data, $d_r^{ij}(t)$ is the single trace of the real data, $\otimes$ is the correlation operator, and $*$ is the convolution operator.
Here, we randomly select one reference field event for each synthetic event corresponding to each stage and set the first trace as the reference trace. The high-frequency information in the real data can be implicitly introduced through the operations in Equation (3). The correlation operation can eliminate the effects of recording time delays between the synthetic and real data. The data augmentation operation can reduce the feature difference between the training (source) synthetic data and the (target) real data and thus helps to enhance the performance of the neural network model when it is applied to the real data.
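As an illustration, the correlation and convolution operations described above can be sketched with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `augment_gather`, the use of `mode="same"` to keep the trace length, the use of the autocorrelation of the matching real trace, and the per-trace peak normalization are all choices made here for illustration.

```python
import numpy as np

def augment_gather(syn, real, ref=0):
    """Augment a synthetic gather with features of a real gather.

    syn, real : arrays of shape (n_traces, n_samples)
    ref       : index of the synthetic reference trace (k = 1 in the text
                corresponds to index 0 here)
    """
    syn = np.asarray(syn, dtype=float)
    real = np.asarray(real, dtype=float)
    out = np.empty_like(syn)
    for i in range(syn.shape[0]):
        # Cross-correlate synthetic trace i with the synthetic reference trace.
        xcorr = np.correlate(syn[i], syn[ref], mode="same")
        # Autocorrelation of the real trace; it carries the real spectrum
        # but no information about the absolute recording time.
        acorr = np.correlate(real[i], real[i], mode="same")
        # Convolve the two to inject real-data features into the synthetic trace.
        aug = np.convolve(xcorr, acorr, mode="same")
        # Assumption: normalize each trace by its peak amplitude.
        peak = np.max(np.abs(aug))
        out[i] = aug / peak if peak > 0 else aug
    return out
```

Because the real data enter only through an autocorrelation, shifting the real event in time (while it stays inside the recording window) leaves the augmented gather unchanged, which matches the statement that correlation removes the effect of recording time delays.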

Loss Functions and Quantitative Metrics
Deep learning-based microseismic velocity inversion is a regression problem. We use the mean square error (MSE) as the loss function to fit the reference velocity model and the predicted values:

$$L_{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( x_i - \hat{x}_i \right)^2 \quad (4)$$

where $N$ is the total number of pixels in a single velocity image, and $x_i$ and $\hat{x}_i$ are a reference velocity value and a predicted value, respectively. We use the regression metrics peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and mean absolute error (MAE) to quantify the prediction results and evaluate the inversion performance [39][40][41]. PSNR reflects the degree of global reconstruction of the velocity image. The PSNR unit is dB, and the larger the value, the better the inversion performance:

$$\mathrm{PSNR}(x, \hat{x}) = 10 \log_{10} \frac{\mathrm{MAX}^2}{\mathrm{MSE}(x, \hat{x})} \quad (5)$$

where $x$ and $\hat{x}$ denote the velocity label and inverted velocity, respectively, and $\mathrm{MAX}$ is the peak value of the velocity image.
Local structure and detail are important factors when recovering a velocity model. To evaluate the performance of the network model in reconstructing the local details, we use SSIM to characterize the similarity between the predicted velocity model and the reference velocity model. The values range from 0 to 1. The higher the value, the lower the image distortion, indicating that the predicted velocity model is closer to the ground truth:

$$\mathrm{SSIM}(x, \hat{x}) = \frac{\left( 2 \mu_x \mu_{\hat{x}} + G_1 \right) \left( 2 \sigma_{x\hat{x}} + G_2 \right)}{\left( \mu_x^2 + \mu_{\hat{x}}^2 + G_1 \right) \left( \sigma_x^2 + \sigma_{\hat{x}}^2 + G_2 \right)} \quad (6)$$

where $\mu_x$ and $\mu_{\hat{x}}$ represent the mean values of $x_i$ and $\hat{x}_i$, respectively, $\sigma_x$ and $\sigma_{\hat{x}}$ are their standard deviations, $\sigma_{x\hat{x}}$ denotes their covariance, and $G_1$ and $G_2$ are constants that avoid a zero denominator.
MAE is utilized to evaluate the variation in velocity across the stratigraphic interface. The lower the value, the lower the error:

$$\mathrm{MAE}(x, \hat{x}) = \frac{1}{N} \sum_{i=1}^{N} \left| x_i - \hat{x}_i \right| \quad (7)$$
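The loss and the three metrics of this section can be sketched as follows, using their standard textbook definitions. Several details are assumptions made for illustration: the peak value in PSNR defaults to the maximum of the reference image, SSIM is evaluated once over the whole image rather than over local windows, and the stabilizing constants `g1` and `g2` are placeholder values.

```python
import numpy as np

def mse(x, x_hat):
    """Mean square error between reference x and prediction x_hat."""
    return float(np.mean((np.asarray(x) - np.asarray(x_hat)) ** 2))

def mae(x, x_hat):
    """Mean absolute error between reference x and prediction x_hat."""
    return float(np.mean(np.abs(np.asarray(x) - np.asarray(x_hat))))

def psnr(x, x_hat, peak=None):
    """Peak signal-to-noise ratio in dB; peak defaults to max(x) (assumption)."""
    peak = float(np.max(x)) if peak is None else float(peak)
    err = mse(x, x_hat)
    return float("inf") if err == 0 else float(10.0 * np.log10(peak ** 2 / err))

def ssim_global(x, x_hat, g1=1e-4, g2=9e-4):
    """Single-window SSIM over the whole image (g1, g2 are placeholder constants)."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    mu_x, mu_y = x.mean(), x_hat.mean()
    var_x, var_y = x.var(), x_hat.var()
    cov = np.mean((x - mu_x) * (x_hat - mu_y))
    num = (2 * mu_x * mu_y + g1) * (2 * cov + g2)
    den = (mu_x ** 2 + mu_y ** 2 + g1) * (var_x + var_y + g2)
    return float(num / den)
```

For velocity images given in m/s, the constants `g1` and `g2` would normally be scaled to the data range; the defaults here only guard against division by zero.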

Training Procedure
We investigate three different training strategies: training only on the synthetic dataset, training only on the augmented dataset, and the hybrid training strategy. In the hybrid strategy (Equation (8)), epochs_syn is the number of epochs spent training on the synthetic data, and w is a weight coefficient that controls the smoothness of the loss curve, enabling the loss value to transition smoothly from the synthetic-data training stage to the augmented-data training stage.
In our single-stage and multi-stage examples, we use different parameter settings. The Adam optimizer is used in both cases. After many rounds of parameter tuning and tests, we select the following hyperparameters: the batch size is 32 and w is 0.1 in both cases, while the learning rates are 0.001 and 0.0001, the numbers of training epochs are 200 and 300, and epochs_syn is 80 and 140 for the single-stage and multi-stage examples, respectively.
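The exact form of the hybrid schedule is not shown in this text, so the following is only one possible reading and should not be taken as the authors' formula. It assumes a logistic blend between the loss on synthetic data and the loss on augmented data, centered at epochs_syn and with steepness w. The function names and the logistic form are hypothetical.

```python
import math

def augmented_weight(epoch, epochs_syn, w):
    """Hypothetical logistic weight of the augmented-data loss.

    Close to 0 well before epochs_syn, 0.5 at epochs_syn, close to 1 well after.
    A smaller w gives a smoother (slower) transition.
    """
    return 1.0 / (1.0 + math.exp(-w * (epoch - epochs_syn)))

def hybrid_loss(loss_syn, loss_aug, epoch, epochs_syn=80, w=0.1):
    """Blend the two losses according to the assumed schedule."""
    a = augmented_weight(epoch, epochs_syn, w)
    return (1.0 - a) * loss_syn + a * loss_aug
```

With the single-stage settings quoted above (epochs_syn = 80, w = 0.1, 200 epochs), this assumed schedule trains almost purely on synthetic data at the start and almost purely on augmented data at the end, with a transition that spans a few tens of epochs.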
We work with a PyTorch implementation of the neural network [42].All network training and testing in this study was performed on a CPU with a frequency of 2.90 GHz and 512 GB RAM.

Data
To generate more training data, we prepare a horizontally layered model adapted from a field downhole monitoring project of five-stage hydraulic fracturing [10], as shown in Figures 2 and 3a. There are 395 events in total, and the event numbers from stage 1 to stage 5 are 105, 116, 48, 66, and 60, respectively. The field microseismic data contain three components, and we consider only the Z component to reduce the number of operations. The acquisition system consists of 15 receivers (black inverted triangles) placed at a constant spacing of 20 m in a vertical linear array. Each trace has 1201 samples with a time interval of 0.5 ms. Four-layer velocity models are constructed with reference to the velocity model obtained from traveltime inversion with eight ball-hit events [10]. We obtain 200 velocity models by adding random ±10% perturbations to the P- and S-wave velocities with fixed layer depths. We randomly set 30 source locations in the source region (Figure 3a) for each velocity model. The velocity model has a size of 64 × 200, with a grid spacing of 5 m. A Ricker wavelet with a peak frequency of 100 Hz is used as the source function. We use 6000 synthetic gathers (200 models × 30 sources) as the initial training dataset. The testing dataset includes 105 field microseismic events from stage 1 (corresponding to a single reference velocity model).
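The construction of the perturbed layered models can be sketched as below. The layer boundaries and base velocities used in any call are hypothetical placeholders, since the actual values are not given here. The sketch also assumes that the 64 × 200 grid is ordered as depth × horizontal distance and that the ±10% perturbations are drawn uniformly and independently for each layer and each wave type.

```python
import numpy as np

def layered_model(layer_tops, velocities, nz=64, nx=200):
    """Fill an (nz, nx) grid with horizontally constant layer velocities.

    layer_tops : depth indices where each layer starts (first entry 0)
    velocities : one velocity per layer
    """
    model = np.empty((nz, nx), dtype=float)
    bounds = list(layer_tops) + [nz]
    for top, bottom, v in zip(bounds[:-1], bounds[1:], velocities):
        model[top:bottom, :] = v
    return model

def perturbed_models(base_vp, base_vs, layer_tops,
                     n_models=200, frac=0.10, seed=0):
    """Return an array (n_models, 2, nz, nx) of perturbed [vp, vs] models."""
    rng = np.random.default_rng(seed)
    base = np.array([base_vp, base_vs], dtype=float)      # (2, n_layers)
    # Assumption: uniform, independent perturbation per layer and wave type.
    factors = 1.0 + rng.uniform(-frac, frac, size=(n_models,) + base.shape)
    layer_vels = base * factors                            # (n_models, 2, n_layers)
    return np.stack([
        np.stack([layered_model(layer_tops, v) for v in pair])
        for pair in layer_vels
    ])
```

Pairing each of the 200 models with 30 random source positions then gives the 6000 model and gather pairs mentioned in the text; the forward modeling of the gathers themselves is outside the scope of this sketch.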
Figure 3b compares the power spectra of the synthetic, augmented, and real data. The augmented data approach the real data in terms of energy distribution by retaining more of the high-frequency content of the real data. Example synthetic and real microseismic waveforms are shown in Figure 3c-f.
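A power spectrum comparison of this kind can be sketched with a real-input FFT. The example below uses the sampling parameters quoted above (1201 samples, 0.5 ms) and a 100 Hz Ricker wavelet as a stand-in trace. The function names and the normalization to unit peak power are assumptions made for illustration.

```python
import numpy as np

def ricker(f_peak, dt, n):
    """Zero-phase Ricker wavelet with peak frequency f_peak, centered in the window."""
    t = (np.arange(n) - n // 2) * dt
    a = (np.pi * f_peak * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

def power_spectrum(trace, dt):
    """Return frequencies (Hz) and the power spectrum normalized to unit peak."""
    spec = np.abs(np.fft.rfft(trace)) ** 2
    freqs = np.fft.rfftfreq(len(trace), d=dt)
    return freqs, spec / spec.max()

dt = 0.0005          # 0.5 ms sampling interval
n = 1201             # samples per trace
freqs, power = power_spectrum(ricker(100.0, dt, n), dt)
dominant = freqs[np.argmax(power)]   # close to the 100 Hz peak frequency
```

Applying `power_spectrum` to synthetic, augmented, and real traces and plotting the three curves on a common frequency axis gives a comparison in the style of Figure 3b.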

Results

Single-Stage Examples
For single-stage examples, we focus on the feasibility of the network and training strategy. The overall quantitative metrics are listed in Table 1. As indicated in Equation (8), the hybrid dataset here denotes a hybrid strategy involving both synthetic and augmented data. The hybrid training strategy outperforms the other two training strategies for almost all metrics in the velocity inversion task under the same conditions. The predicted one-dimensional velocity profiles of the Unet model using the three training strategies are shown in Figure 4. The displayed velocity values correspond to two arbitrary events and are averaged along the horizontal direction. The augmented data and the hybrid training strategy yield better fits to the reference velocity model. Figure 5 shows the two-dimensional profiles corresponding to Figure 4b obtained with the hybrid training strategy. Training with the synthetic data first learns the low-frequency information in the data and can thus provide an initial velocity model (Figure 5c,d). The model trained on the synthetic data (low frequency) may also predict high-frequency velocity components from the real data (with high frequency), but the results have a large error since the model did not learn these high-frequency features. After training with the augmented data containing high-frequency information, the model improves the precision of the predicted velocity models (Figure 5e,f).

Robustness Testing
To further evaluate the proposed data augmentation method and hybrid training strategy, we carry out robustness tests on the real data of the first stage. We denoise the real data by wavelet filtering to obtain the clean signals, and then calculate the SNR of the real data [43]:

$$S/N = 10 \log_{10} \frac{\sum S_c^2}{\sum S_n^2}, \qquad S_n = S_r - S_c \quad (9)$$

where $S/N$ is the SNR, $S_r$ is the real data signal, $S_c$ is the clean signal after denoising the real data signal, and $S_n$ is the noise of the real data signal.
The distribution of the SNRs for all events in the first stage is shown in Figure 6. Most of the SNRs of the real events are lower than 5 dB. We select a sample event (S/N = 3.44) to quantitatively evaluate the stability and robustness of the network. The predicted two-dimensional and one-dimensional velocity profiles of the Unet model using the three training strategies are shown in Figures 7 and 8. The detailed values of the quantitative metrics are listed in Figure 7. The results suggest that the data augmentation method can significantly improve the prediction accuracy over training on purely synthetic data by introducing real data information. Moreover, the hybrid training strategy effectively utilizes the useful information of the synthetic data in the low-frequency components and yields the best inversion results.
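The SNR estimate described above can be sketched as follows, assuming the common energy-ratio definition in decibels and taking the noise as the difference between the recorded and the denoised signal. The denoising step itself (wavelet filtering in the text) is not reproduced; the clean signal is passed in as an input, and the function name `snr_db` is hypothetical.

```python
import numpy as np

def snr_db(real, clean):
    """SNR in dB from a recorded signal and its denoised (clean) version.

    Assumes noise = real - clean and SNR = 10*log10(E_clean / E_noise).
    """
    real = np.asarray(real, dtype=float)
    clean = np.asarray(clean, dtype=float)
    noise = real - clean
    return float(10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2)))
```

Evaluating this function for every event of a stage and collecting the values in a histogram yields a distribution in the style of Figure 6.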

Multi-Stage Examples
From the results of the single-stage examples, we believe that the augmented data and hybrid training strategy achieve higher accuracy for efficient velocity inversion. Therefore, we expand the research area to consider more fracturing stages. We consider all five stages, corresponding to five reference velocity models. We generate 12,000 gathers (1000 models × 12 sources) as the initial training dataset. The quantitative metrics are shown in Table 2. Compared to the single-stage examples, the predictions are generally worse due to the combined effects of the increased area and characteristics and the limited field samples. Please also note that these metrics are mean values for all the predictions in the five stages. The one-dimensional velocity profiles and the loss curves are shown in Figures 9 and 10, respectively. The predictions for the first stage (Figure 9a) are better than those for the other stages (Figure 9b), especially for the two deep layers, mainly because the first stage has the largest number and best coverage of microseismic events. The hybrid training strategy can achieve slightly faster convergence rates than the other two strategies.

Discussion and Conclusions
We attempt to directly invert the velocity models from microseismic waveforms in this study. The testing results with purely synthetic data demonstrate that the Unet model can predict the layered velocity model quite well and in an efficient manner. Since the predicted velocity models are almost the same as the real ones and thus do not contain much information, we do not show those simple results in this manuscript. Zhou et al. [29] demonstrated the effectiveness of a modified Attention Unet in predicting complex synthetic velocity models with microseismic records. They did not consider field microseismic data and adopted Gaussian noise to evaluate the robustness of the model, while we used field data to enhance the synthetic data by data augmentation operations. We also investigate and test many other scenarios by considering different SNRs, source locations, source mechanisms, and model numbers and sizes to mimic the field cases. In particular, the number and coverage of real microseismic events largely determine the features and constraints that can be extracted by the network model. However, these cases just introduce more complicated features, which require a larger training dataset and higher computational expense. Further investigation of the factors that influence deep learning-based microseismic velocity inversion is out of the scope of the current study.
The disadvantage of most current deep learning algorithms is their heavy dependence on the training dataset and weak generalization capability. The introduced data augmentation method and hybrid training strategy proved to be effective in alleviating the feature gap between data domains and improving the generalization ability of the network model, which may provide guidance for other deep learning-based seismic inversion tasks. Transfer learning is also helpful in filling the feature gap, but it also relies on the scale of the training data. Another feasible approach to realize seismic inversion with a limited training dataset is combining data-driven algorithms with the physical laws of seismic wave propagation, to provide more physical constraints and optimize the learning performance. In this work, we only consider a horizontally layered model, which is the most commonly used model in microseismic processing. We will investigate the performance of the proposed method on heterogeneous models and compare it with conventional velocity inversion methods (e.g., the FWI method). One of the advantages of deep learning methods is the weak dependence on the raypath coverage, since we can train the model with a large and complete dataset.
In this paper, we propose an improved deep learning method for microseismic velocity inversion. The synthetic data are augmented to incorporate the features of the real data, and a hybrid training strategy that integrates the synthetic and augmented data is introduced. The Unet model can directly predict the layered velocity model from microseismic waveforms. Training on the synthetic data first learns the low-frequency information in the data and can thus provide an initial velocity model. Then, training on the augmented data learns the high-frequency information, which can improve the precision of the predicted velocity model. The hybrid training strategy makes better use of the data and enables the model to learn more embedded connections between the waveforms and velocity models. Field downhole microseismic examples demonstrate the feasibility and superiority of the proposed method for efficient inversion of microseismic velocity models.

Figure 1. Unet network architecture. Gathers are input features, and the outputs are velocity models. Each box represents the output feature map of the convolutional layer. The number at the top of each box indicates the channel number in the corresponding feature map. The encoder consists of a convolution layer with a 3 × 3 convolution kernel size (blue arrow), a batch normalization (BN) layer, a leaky rectified linear unit (LeakyReLU), and a 2 × 2 maximum pooling layer and the Dropout layer (yellow arrow). Each decoder replaces the maximum pooling layer with a 5 × 5 transposed convolution layer (black arrows). Skip connections indicate the corresponding channel feature maps connecting the encoder and decoder sections (green arrows).

Figure 2. The layout of a real downhole microseismic monitoring project. (a) Three-dimensional view. (b) Side view of (a). Black inverted triangles indicate the receivers and the dots are microseismic events.

Figure 3. Model and data. (a) A horizontally layered model for downhole microseismic monitoring. The black rectangle indicates the region where the sources are located, the red pentagram indicates an arbitrary source, and black inverted triangles indicate the receivers. (b) Power spectra comparison. (c) The original noise-free synthetic waveforms generated by ray-tracing. (d) Real microseismic data. (e) Result of the real data autocorrelation. (f) The augmented data for the synthetic waveforms in (c).

Figure 2 . 14 Figure 2 .
Figure 2. The layout of a real downhole microseismic monitoring project.(a) Three-dimensional view.(b) Side view of (a).Black reverse triangles indicate the receivers and the dots are microseismic events.

Figure 3 .
Figure 3. Model and data.(a) A horizontally layered model for downhole microseismic monitoring.The black rectangle indicates the region where the sources are located, the red pentagram indicates an arbitrary source, and black reverse triangles indicate the receivers.(b) Power spectra comparison.(c) The original noise-free synthetic waveforms generated by ray-tracing.(d) Real microseismic data.(e) Result of the real data autocorrelation.(f) The augmented data for the synthetic waveforms in (c).

Figure 3 .
Figure 3. Model and data.(a) A horizontally layered model for downhole microseismic monitoring.The black rectangle indicates the region where the sources are located, the red pentagram indicates an arbitrary source, and black reverse triangles indicate the receivers.(b) Power spectra comparison.(c) The original noise-free synthetic waveforms generated by ray-tracing.(d) Real microseismic data.(e) Result of the real data autocorrelation.(f) The augmented data for the synthetic waveforms in (c).
by the hybrid training strategy.Training with the synthetic data involves first learning the lowfrequency information in the data, and then it can provide an initial velocity model (Figure 5c,d).The model obtained by training the synthetic data (low frequency) may also predict high-frequency velocity components with the real data (with high frequency), but the results have a large error since the model did not learn these high-frequency features.After training with the augmented data containing high-frequency information, the model improves the precision of the predicted velocity models (Figure 5e,f).

Figure 4. One-dimensional profiles of the reference and predicted velocity values of two arbitrary events from the first stage. (a) Velocity curves of one sample event. The red solid and dashed lines indicate the reference velocities for the P- and S-waves, respectively, and the blue, magenta, and black dashed lines indicate the results of training with the synthetic, augmented, and hybrid datasets. (b) Velocity curves of another sample event. The meanings of the symbols and colors are the same as in (a).

Figure 5. Two-dimensional profiles of the reference and predicted velocity values of an arbitrary event from the first stage. (a,b) The reference P- and S-wave velocities. (c,d) Predictions of P- and S-wave velocities trained with the synthetic dataset (at epoch 80). (e,f) Predictions of P- and S-wave velocities trained with both the synthetic and the augmented datasets (at epoch 200).

Figure 6. The distribution of SNRs of the events of the first stage.

Figure 7. Two-dimensional profiles of the predicted velocity values of the sample event. (a,b) Predictions of P- and S-wave velocities trained with the synthetic dataset only. (c,d) Predictions of P- and S-wave velocities trained with the augmented dataset only. (e,f) Predictions of P- and S-wave velocities trained with the hybrid strategy involving both synthetic and augmented data. The reference velocity models are shown in Figure 5a,b.

Figure 8. One-dimensional profiles of the reference and predicted velocity values of the sample event. The meanings of the symbols and colors are the same as in Figure 4.

Figure 9. One-dimensional profiles of the reference and predicted velocity values of two arbitrary events from two stages. (a) Velocity curves of one sample event. The red solid and dashed lines indicate the reference velocities for the P- and S-waves, respectively, and the blue, magenta, and black dashed lines indicate the results of training with the synthetic, augmented, and hybrid datasets. (b) Velocity curves of another sample event. The meanings of the symbols and colors are the same as in Figure 4.

Figure 10. The loss curves for three different training strategies.

Table 1. The mean values of quantitative metrics for single-stage examples.

Table 2. The mean values of quantitative metrics for multi-stage examples.
