A Clutter Suppression Method Based on LSTM Network for Ground Penetrating Radar

Abstract: It is critical to estimate and eliminate the wavelets of ground penetrating radar (GPR) so as to optimally compensate for energy attenuation and phase distortion. This paper presents a new wavelet extraction method based on a two-layer Long Short-Term Memory (LSTM) network. It uses only several random A-scan echoes (i.e., single-channel detection echo sequences) to predict the wavelet of a scene. Layered detection scenes with objects buried in different regions are set up in a 3D Finite-Difference Time-Domain simulator to generate radar echoes as a dataset. Additionally, the simulated echoes of different scenes are used to test the performance of the neural network. Multiple experiments indicate that the trained network can directly predict the wavelets quickly and accurately, even when the simulation environment becomes quite different. Moreover, measured data collected by the Qingdao Radio Research Institute radar and by an unmanned aerial vehicle ground penetrating radar are used for testing, and the predicted wavelets offset the wavelets in the original data well. Therefore, the presented LSTM network can effectively predict the wavelets and their tailing oscillations for different detection scenes, and it has clear advantages over other wavelet extraction methods in practical engineering.


Introduction
Ground penetrating radar (GPR) is an electromagnetic (EM) imaging device that uses the reflection and scattering characteristics of EM waves in discontinuous media to achieve non-destructive detection [1]. It has been used in many engineering detection fields such as ground ice detection [2], underground pipeline detection, and criminal investigations [3]. Recently, space probes, such as those used in lunar and Mars exploration, have also been equipped with GPR for geological stratification studies. However, due to the complex underground environment, the radar echo always contains clutter such as multiple waves, antenna coupling waves, and reflections from non-target objects, which seriously obscures the signal of the buried targets and makes GPR data difficult to interpret.
Many scholars have conducted in-depth research into methods of clutter suppression. Some have designed antenna systems that enhance the echo of a buried object and reduce the background noise. For example, Liu et al. [4] developed a hybrid dual-polarization GPR system with one circularly polarized transmit antenna and two linearly polarized receive antennas to improve the detection and estimation of slender tubular targets. Others are based on signal processing algorithms. For example, the reference wave method averages a few A-scan echoes without target information to find the wavelet of the detection scene, and then subtracts the mean from the original data [5]. The background matrix subtraction (BMS) method is an improvement on the average cancellation method.
As shown in Figure 1a, the GPR detection trolley is used to detect a buried object in a sand bunker. The trolley moves at a constant speed along a straight line to collect 512 channels of A-scan data, each with 1024 samples. Generally, the echo of a GPR detection system is the convolution of the GPR wavelet with the reflection sequence. The radar first receives the strong reflection from the upper surface. When the transmitted pulse encounters the underground target, it produces a reflected target signal, which is relatively weak. Figure 1b is the time stacking diagram of the first 512 sampling points. It can be seen that the strong wavelet submerges the target signal. Moreover, the wavelets at adjacent monitoring points on the same survey line in the same detection area are very similar.
Three correlation coefficients, namely Pearson's linear correlation coefficient [14], Kendall's tau coefficient [15], and Spearman's rho [16], are used to evaluate the relevance of adjacent A-scan echoes. A correlation coefficient greater than zero indicates that the two groups of data are positively correlated, whereas a coefficient less than zero indicates that they are negatively correlated. The greater the absolute value, the stronger the correlation. The three correlation coefficients of the A-scan echoes in Figure 1 are shown in Figure 2. The mean values are 1, 0.9998, and 0.9894, respectively, which demonstrates the strong correlation between adjacent echoes. Several isolated valley points on the curves correspond to the buried target, especially on the curve of Kendall's tau. This strong correlation in the time series can be exploited by neural networks with memory.
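The three coefficients above can be illustrated with a minimal pure-Python sketch. The function names, the use of the tau-a variant of Kendall's coefficient, and the synthetic traces are assumptions made for illustration; they are not the implementation or data used to produce Figure 2.

```python
import math

def pearson(x, y):
    """Pearson's linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)

def adjacent_correlations(bscan):
    """Return (pearson, kendall, spearman) for each pair of adjacent A-scans."""
    return [
        (pearson(a, b), kendall_tau(a, b), spearman(a, b))
        for a, b in zip(bscan, bscan[1:])
    ]

# Synthetic stand-ins for three adjacent A-scans (not measured data).
base = [math.sin(0.3 * k) * math.exp(-0.05 * k) for k in range(64)]
bscan = [
    base,
    [1.02 * v for v in base],
    [0.98 * v + 0.01 * math.sin(k) for k, v in enumerate(base)],
]
for p, tau, rho in adjacent_correlations(bscan):
    print(f"pearson={p:.4f} kendall={tau:.4f} spearman={rho:.4f}")
```

A dip in any of the three curves at a given channel pair would then flag a position where the echo departs from its neighbours, as the valley points in Figure 2 do for the buried target.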

The Structure of Two-Layer LSTM Model Network
Recurrent neural networks (RNNs) are commonly applied to explore relations in sequential data. Their network structure can selectively store past information and use it together with the current input to infer future information. Long Short-Term Memory (LSTM) is a variant of the RNN that can prevent gradients from decaying or exploding. It can fully explore the non-linear relationship between variables and process complex long-term dynamic information in time series [17]. Figure 3 shows the cell structure of the LSTM network, which has a memory unit R_t and three control gates, namely the input gate, the forget gate, and the output gate (marked 1, 2, and 3, respectively). The computing flow is expressed by the following equations [18].
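A standard LSTM cell written with the symbols defined in this section takes the form sketched here. This is the common textbook formulation, not necessarily the exact form given in [18]: the concatenated input $[y_{t-1}, x_t]$, the candidate memory $\tilde{R}_t$, the gate activations $I_t$, $F_t$, $O_t$, the placement of the operator $*$ between weights and inputs, and the use of $\odot$ for the element-wise product are all assumptions.

```latex
\begin{aligned}
I_t &= \sigma\left(w_I * [y_{t-1}, x_t] + b_I\right), \\
F_t &= \sigma\left(w_F * [y_{t-1}, x_t] + b_F\right), \\
O_t &= \sigma\left(w_O * [y_{t-1}, x_t] + b_O\right), \\
\tilde{R}_t &= \tanh\left(w_R * [y_{t-1}, x_t] + b_R\right), \\
R_t &= F_t \odot R_{t-1} + I_t \odot \tilde{R}_t, \\
y_t &= O_t \odot \tanh\left(R_t\right).
\end{aligned}
```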
Here, y_t is the output and x_t is the input. σ is the sigmoid function, and tanh is the hyperbolic tangent function. w and b are the weights and biases, respectively. The subscripts I, F, O, and R represent the input gate, forget gate, output gate, and memory unit, respectively. The symbol " * " means convolution. As is well known, the wavelet sequence of a given GPR system and detection environment is stable. The characteristics of the LSTM network make it possible to extract the wavelet sequences from the GPR echoes. In the following, multiple A-scan echoes on one survey line are connected end to end to form a longer sequence for wavelet prediction.
The network structure contains two LSTM layers, two Dropout layers, and one Dense layer. A dropout probability of 0.2 is set to prevent overfitting. The timestep is 100, the batch size is 200, the number of epochs is 200, and the learning rate is 1 × 10^-6. The input length of each timestep is 100, and the output length is 1. If the loss does not decrease within 10 epochs, the learning rate is reduced to 0.1 times its previous value.
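The learning-rate rule described above (reduce to 0.1 times the previous value when the loss has not decreased for 10 epochs) can be sketched as a small scheduler. The class name `PlateauScheduler`, its attributes, and the choice to reset the patience counter after each reduction are assumptions for illustration, not the training code used in this work.

```python
class PlateauScheduler:
    """Reduce the learning rate by `factor` after `patience` epochs without improvement."""

    def __init__(self, lr=1e-6, factor=0.1, patience=10):
        self.lr = lr              # initial learning rate quoted in the text
        self.factor = factor      # multiplicative reduction (0.1 in the text)
        self.patience = patience  # epochs to wait without a decrease (10 in the text)
        self.best = float("inf")
        self.stale = 0            # epochs since the loss last decreased

    def step(self, loss):
        """Record the loss of one epoch and return the learning rate for the next."""
        if loss < self.best:
            self.best = loss
            self.stale = 0
        else:
            self.stale += 1
            if self.stale >= self.patience:
                self.lr *= self.factor
                self.stale = 0  # assumption: start counting again after a reduction
        return self.lr


# Hyperparameters stated in the text, collected for reference.
HYPERPARAMETERS = {
    "timestep": 100,
    "batch_size": 200,
    "epochs": 200,
    "dropout": 0.2,
    "learning_rate": 1e-6,
}
```

In a deep learning framework this behaviour is normally provided by a built-in reduce-on-plateau callback, so a hand-written scheduler like this one is only needed to make the rule explicit.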

Network Training
As shown in Figure 4a, a three-layer medium is set up to simulate air, cement, and limestone from top to bottom. The relative permittivities of cement and limestone are ε_r2 = 6 and ε_r3 = 9, respectively. The cement layer thickness is d_1 = 0.4 m. The top air layer and the bottom limestone layer are both infinite. A PEC cube with a 0.2 m edge length is buried across the interface between the cement layer and the limestone layer. The depth from the top of the cube to the ground is d = 0.3 m. The distance between the transmitting and receiving antennas is L = 0.4 m, and their height above the ground is h = 0.4 m. The two antennas are vertically located on either side of the survey line and move synchronously along it (red arrow). The signal source is a Ricker wave with a center frequency of 200 MHz and a time sampling interval of 0.0385 ns. The spatial sampling interval along the survey line is 0.04 m. A 3D-FDTD simulation tool [19] is used to generate the A-scan data. In total, 95 channels of A-scan echoes, each with 780 samples, are accumulated along the survey line to form the B-scan image, as shown in Figure 4b. To reduce the signal similarity caused by dense sampling along the survey line, we selected 30 channels of data at equal intervals as the data set. From these data, 18 channels of A-scan echoes are randomly selected for the training set, 6 channels for the validation set, and another 6 channels for the testing set.
The 18 channels of A-scan echoes of the training set are randomly connected end to end to construct a longer sequence with 780 × 18 = 14,040 samples as the input sequence. In each step of network training, the input length is 100, which means that the first 100 samples of the input sequence are used to predict the next sample. The time window of the input data then slides along the sequence step by step.
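The data preparation described above can be sketched as follows. The function names, the fixed random seed, the equal-interval rule (every third channel), and the stride of one sample are assumptions made for illustration; the synthetic traces are placeholders for the FDTD A-scans.

```python
import random

def equally_spaced_channels(n_total=95, n_keep=30, step=3):
    """Pick n_keep channel indices at a fixed interval (assumed to be every third)."""
    return list(range(0, n_total, step))[:n_keep]

def split_channels(channels, n_train=18, n_val=6, n_test=6, seed=0):
    """Randomly split channel indices into training, validation, and testing sets."""
    rng = random.Random(seed)
    shuffled = list(channels)
    rng.shuffle(shuffled)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:n_train + n_val + n_test]
    return train, val, test

def build_sequence(traces, seed=0):
    """Connect A-scans end to end in random order to form one long sequence."""
    rng = random.Random(seed)
    order = list(range(len(traces)))
    rng.shuffle(order)
    sequence = []
    for idx in order:
        sequence.extend(traces[idx])
    return sequence

def sliding_windows(sequence, window=100):
    """Each window of `window` samples is paired with the sample that follows it."""
    inputs, targets = [], []
    for start in range(len(sequence) - window):
        inputs.append(sequence[start:start + window])
        targets.append(sequence[start + window])
    return inputs, targets

# Placeholder traces: 18 A-scans of 780 samples each.
traces = [[float(i)] * 780 for i in range(18)]
sequence = build_sequence(traces)
inputs, targets = sliding_windows(sequence, window=100)
print(len(sequence), len(inputs), len(inputs[0]))
```

With a stride of one sample, a sequence of N samples yields N - 100 input/target pairs, so the cost of one epoch grows linearly with the number of spliced A-scans.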
The network uses the mean square error (MSE) as the loss function. Figure 5 shows the loss during the training process. The loss of the one-layer network decreases slowly, and its final loss is much greater than that of the two-layer network. The loss of the three-layer network drops quickly, but slight overfitting leads to poor prediction results. The two-layer network converges after 180 steps, with a loss of less than 0.001. Figure 6 compares the predicted wavelets. It can be seen that the red line predicted by the two-layer network is most similar to the ideal wavelet, which is the echo when there is no target buried in the ground. Therefore, the two-layer LSTM network is superior in training time and performance and is used for wavelet prediction in the following.
Figure 6. Wavelets predicted by different network layers.

Network Testing
From the test set, we randomly selected two channels and connected them into a sequence with 780 × 2 = 1560 samples as the network input for wavelet prediction. For example, the 5th, 32nd, and 48th channels of A-scan data (corresponding to A, B, and C in Figure 4b, respectively) are used to evaluate the wavelet prediction ability of the trained network. Position A is far away from the target, and its echo contains little target signal. This channel of A-scan data is used as the ground truth of the wavelet. Position B is close to the target, and the corresponding A-scan echo contains some target signal. As shown in Figure 7, the predicted wavelet (green line) is consistent with the ground truth (black dotted line). The enlarged view in the upper right corner shows that the predicted wavelet amplitude is non-zero after the 510th sampling point, due to the reflected signal of the underground layer.
Another interesting observation is the wavelet prediction with the A-scan data at position C, which is just above the PEC target. The strong reflected signal of the metal is much larger than the wavelet of the background. In this case, the network cannot ignore the very strong target interference, and the wavelet prediction fails. Therefore, for wavelet prediction on actual measured data, it is necessary to remove the echoes that contain a strong target signal.
The effect of wavelet removal and deconvolution strongly depends on the quality of the wavelet. If the extracted wavelet is incorrect, the direct wave cannot be offset in the original data, and additional interference is introduced. An inaccurate wavelet tail will degrade the resolution of deep detection. In the following, the results of several methods of wavelet removal and deconvolution are compared to evaluate the accuracy of the wavelet.
Figure 8 shows the results of wavelet removal with different wavelet extraction methods. The above two figures still contain residual direct waves and layered interference. Although the reference wave method removes the direct wave cleanly, the layered interference is still obvious. Removing the wavelet predicted by the above LSTM network effectively suppresses the direct wave and the layered interference at the same time and improves the signal-to-noise ratio and resolution.
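Wavelet removal as discussed above amounts to subtracting the wavelet estimate from each A-scan. The sketch below illustrates this with a synthetic wavelet and a weak synthetic target; the function name `remove_wavelet`, the Gaussian pulse shapes, and all amplitudes are assumptions, not the data of Figure 8.

```python
import math

def remove_wavelet(bscan, wavelet):
    """Subtract the same wavelet estimate from every A-scan of a B-scan."""
    return [[s - w for s, w in zip(trace, wavelet)] for trace in bscan]

n = 200
# Strong background wavelet centred at sample 30 (synthetic).
wavelet = [math.exp(-((k - 30) / 6.0) ** 2) for k in range(n)]
# Weak target reflection centred at sample 120 (synthetic).
target = [0.05 * math.exp(-((k - 120) / 4.0) ** 2) for k in range(n)]

# Two A-scans: one without the target, one with it.
bscan = [wavelet[:], [w + t for w, t in zip(wavelet, target)]]
residual = remove_wavelet(bscan, wavelet)

raw_peak = max(range(n), key=lambda k: abs(bscan[1][k]))
clean_peak = max(range(n), key=lambda k: abs(residual[1][k]))
print("strongest sample before removal:", raw_peak)
print("strongest sample after removal:", clean_peak)
```

Before removal the strongest sample belongs to the wavelet, and after removal it belongs to the target, which is the effect the comparison in Figure 8 evaluates. Any error in the wavelet estimate would appear directly in the residual.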
Figure 9 shows the deconvolution results of the wavelets. It can be seen that deconvolution with the LSTM-predicted wavelet effectively compresses the wavelet tailing and the layered signal, so that the signal of the deep target is highlighted.
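The deconvolution algorithm is not specified here, so the following is only a sketch of one common choice: regularized frequency-domain deconvolution under the convolution echo model. The regularization constant `eps`, the circular-convolution model, the naive DFT, and the synthetic reflectivity are all assumptions made for illustration.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (kept dependency-free)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for m in range(n):
            acc += x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
        out.append(acc / n if inverse else acc)
    return out

def circular_convolve(reflectivity, wavelet):
    """Echo model: circular convolution of the reflectivity with the wavelet."""
    n = len(reflectivity)
    out = [0.0] * n
    for m, r in enumerate(reflectivity):
        for j, w in enumerate(wavelet):
            out[(m + j) % n] += r * w
    return out

def deconvolve(echo, wavelet, eps=1e-9):
    """Divide spectra with a small regularization term to avoid division by zero."""
    n = len(echo)
    padded = list(wavelet) + [0.0] * (n - len(wavelet))
    E, W = dft(echo), dft(padded)
    R = [e * w.conjugate() / (abs(w) ** 2 + eps) for e, w in zip(E, W)]
    return [v.real for v in dft(R, inverse=True)]

# Synthetic reflectivity: a layer interface at sample 3 and a weaker target at sample 9.
reflectivity = [0.0] * 16
reflectivity[3] = 1.0
reflectivity[9] = -0.5
wavelet = [1.0, 0.5, 0.25]  # short wavelet with a decaying tail

echo = circular_convolve(reflectivity, wavelet)
recovered = deconvolve(echo, wavelet)
print([round(v, 3) for v in recovered])
```

With an accurate wavelet the tails collapse back into isolated spikes. With an inaccurate wavelet the spectral division would leave residual oscillations, which is why the quality of the predicted wavelet matters for the results in Figure 9.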

Other Simulation Scene
According to the convolution echo model of the GPR system, if environmental noise and other interference are neglected, the GPR wavelet corresponds to the data collected without a buried target. However, engineering GPR detection encounters the following problems: (1) The underground layered structure is diverse. The number of layers and the layer thickness in mountainous, rural, and urban areas are very different; (2) The shape and material of the buried targets are different; (3) The background medium of the detection environment is complex, such as clay, sand, gravel, etc., and their dielectric constants are different; and (4) Different radar systems may use different source waveforms. For example, the LTD series GPR of China Research Institute of Radio wave Propagation (CRIRP) uses a Ricker wavelet, while lunar or Mars rover GPRs often use a chirp signal. In the following, we set up two groups of simulation experiments to evaluate the feasibility of the above LSTM network method for wavelet prediction in different detection scenes. These simulation scenes are derived from Figure 4 by changing the scene parameters to create eight new models. The 3D-FDTD simulator is used to generate the A-scan and B-scan data. In total, 95 channels of A-scan echoes, with 780 samples for each channel, are simulated for each new model.
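The two source waveforms mentioned in item (4) can be generated as in the sketch below. The Ricker center frequency, time step, and trace length follow the values quoted for the original model; the peak delay `t0 = 1 / fc` and every chirp parameter (start and stop frequencies, duration) are hypothetical values chosen only for illustration.

```python
import math

def ricker(fc, dt, n, t0=None):
    """Ricker wavelet r(t) = (1 - 2a) exp(-a), with a = (pi * fc * (t - t0))^2."""
    if t0 is None:
        t0 = 1.0 / fc  # assumed delay so that the pulse starts near zero amplitude
    out = []
    for k in range(n):
        a = (math.pi * fc * (k * dt - t0)) ** 2
        out.append((1.0 - 2.0 * a) * math.exp(-a))
    return out

def linear_chirp(f0, f1, duration, dt):
    """Linear frequency sweep from f0 to f1 over `duration` seconds."""
    n = int(round(duration / dt))
    rate = (f1 - f0) / duration
    return [
        math.sin(2.0 * math.pi * (f0 * k * dt + 0.5 * rate * (k * dt) ** 2))
        for k in range(n)
    ]

DT = 0.0385e-9  # time sampling interval from the text, in seconds

source_ricker = ricker(fc=200e6, dt=DT, n=780)
# Hypothetical chirp parameters, not taken from any radar system in the text.
source_chirp = linear_chirp(f0=100e6, f1=500e6, duration=20e-9, dt=DT)
print(len(source_ricker), len(source_chirp))
```

Because the two waveforms differ strongly in duration and spectral content, the resulting wavelets differ as well, which is what makes a changed source waveform a demanding test of generalization.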

Different Layered Structure
In the first group of experiments, we change the layered structure. We first add another layer at different depths to generate two new models, as shown in Figure 10. The network trained above with the original model in Figure 4 does not need to be trained again. We select three channels of A-scan data, corresponding to positions A, B, and C, to test the generalization ability of the trained LSTM network. Position A is far away from the target, so the corresponding A-scan echo, shown as the black line in Figure 10, is used as the ground truth of the wavelet for the new scene. The yellow line and the green dotted line are the wavelets predicted with the A-scan echoes at position B and position C, respectively. It can be seen that the wavelet of the new scene contains a new signal peak at the tail of the echo, which is consistent with the position of the new layer. Therefore, the proposed network can predict the response of the new layer structure.
As shown in Figure 11, model 3 only changes the thickness d_1. According to the layered geometry, the position of the second layer signal changes from the 387th sampling point to the 465th sampling point. Model 4 further changes the dielectric constants of the two layers, and the position of the second layer signal moves to the 417th sampling point. The LSTM network trained above with the original model is used to predict the wavelet for the new models directly. The prediction results correctly indicate these layered geometries. Therefore, the proposed network can capture the layer thickness and background medium to make accurate predictions quickly.
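The dependence of the layer signal position on thickness and permittivity can be approximated with a two-way travel-time calculation. The sketch below assumes normal incidence and a lossless layer and ignores the antenna separation, the antenna height, and the source delay, so it illustrates the trend rather than reproducing the sampling points quoted above; the function names are hypothetical.

```python
import math

C0 = 0.299792458  # speed of light in vacuum, in m/ns

def two_way_delay_ns(thickness_m, eps_r):
    """Two-way travel time through one layer at normal incidence, in ns."""
    return 2.0 * thickness_m * math.sqrt(eps_r) / C0

def delay_in_samples(thickness_m, eps_r, dt_ns=0.0385):
    """The same delay expressed in samples at the time step used in the text."""
    return two_way_delay_ns(thickness_m, eps_r) / dt_ns

# Cement layer of the original model: d_1 = 0.4 m, relative permittivity 6.
print(delay_in_samples(0.4, 6.0))
# Changing either quantity moves the second-layer reflection in time.
print(delay_in_samples(0.5, 6.0))
print(delay_in_samples(0.4, 4.0))
```

The delay grows linearly with thickness and with the square root of the relative permittivity, so a thicker or denser upper layer pushes the second-layer reflection to a later sampling point.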
The predicted wavelets of the new models are used for background elimination. Figure 12 indicates that removing the predicted wavelets highlights the target information, while the B-scans obtained with other methods still contain some layered signal residues. Therefore, the LSTM wavelet prediction network is suitable for layered media with different numbers of layers and thicknesses.

Different Radar System and Underground Medium
In the second group of experiments, we consider changes in multiple parameters, including source waveform, background permittivity, target shape, and depth. The layered geometry of the background is the same as that of model 1. The original cubic PEC target is replaced by an infinite air cylinder with radius R. The simulation parameters are listed in Table 1.
Figure 13 shows the prediction results of models 5-8. The blue line is the wavelet of the original model in Figure 4. The black line is the A-scan without the target signal, and it is the ground truth of the wavelet for the new model, which is quite different from the original model. The network is directly used to predict the wavelet for the new models, without retraining. The red line is the wavelet predicted with the A-scan echo at position B of the derived model. Although multiple parameters, such as the layered geometry, medium parameters, target information, and excitation waveform, are all different from the original model in Figure 4, the LSTM network can still predict accurate wavelets.
Figure 13. The predicted wavelet of derived models 5-8.

Wavelet Prediction of Measured Data
In order to explore the practicability of this method, a group of measured data are used to test the performance of the above method for wavelet extraction. Several landmines are buried in red clay and clay scenes for experiments, and the LTD series GPR of China Research Institute of Radio wave Propagation (CRIRP) is used to collect echo data. Red clay is a soil with high water content formed by carbonate weathering and has a large relative permittivity εr = 12. The water content of the clay is low, and the relative permittivity εr = 12 is low. The parameters of red clay and clay are listed in Table 2. There are 893 and 692 channels of A-scan data from red clay and clay scenes, respectively, each with 1024 sampling points, as shown in Figure 14. Different background media cause different attenuation of EM waves in two scenarios. Therefore, the echo data are significantly different.

Table 2 (partial). Clay: 20%, 4.
The antenna parameters of the real radar system differ from those of the simulation model, and the radar system parameters affect the wavelet prediction network. To use a network trained with simulation data to predict the wavelet of measured data, the antenna parameters would have to be set accurately in the simulation model. However, it is difficult to measure the system parameters of the actual detection equipment accurately, and system components, noise, and internal interference are difficult to reproduce faithfully. Given that the LSTM network learns sequence correlation, we instead train the network directly with the measured data. A total of 38 channels of A-scan data from the red clay scene are randomly selected and spliced in sequence to form the training set. The training process is the same as above and ends when the loss falls below 0.0001. Then, another 12 different channels of A-scan data are randomly selected to predict the wavelet, as shown in Figure 15. The network predicts the wavelet well whether or not the echo contains the target signal. The strong signal at the 200th track of the blue line corresponds to the reflection from the mine. The LSTM network can separate the wavelet from the target signal and environmental interference. The network trained with red clay data is then applied directly to predict the wavelet of the clay scene, as shown in Figure 15b. The predicted wavelet is close to the A-scan without the target signal, which is the ground truth of the wavelet for the clay background.
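The data preparation described above (randomly selecting A-scans, splicing them head to tail, and later removing the predicted wavelet from an echo) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `splice_traces`, `make_windows`, and `subtract_wavelet`, the one-step-ahead windowing, the window length, and the synthetic traces are all hypothetical, and the LSTM training step itself is omitted.

```python
import math
import random

def splice_traces(traces, indices):
    """Concatenate the selected A-scans head to tail into one long sequence."""
    spliced = []
    for i in indices:
        spliced.extend(traces[i])
    return spliced

def make_windows(sequence, window):
    """Build (input window, next sample) pairs (assumed one-step-ahead setup)."""
    return [(sequence[k:k + window], sequence[k + window])
            for k in range(len(sequence) - window)]

def subtract_wavelet(trace, wavelet):
    """Remove a wavelet estimate from an echo, sample by sample."""
    return [a - w for a, w in zip(trace, wavelet)]

# Synthetic stand-in for measured data: a shared wavelet plus small noise.
rng = random.Random(0)
n_traces, n_samples = 50, 128
wavelet = [math.exp(-t / 20.0) * math.sin(2 * math.pi * t / 16.0)
           for t in range(n_samples)]
traces = [[w + rng.gauss(0.0, 0.01) for w in wavelet] for _ in range(n_traces)]

# Randomly pick 38 traces for training and 12 different ones for prediction.
shuffled = rng.sample(range(n_traces), n_traces)
train_idx, test_idx = shuffled[:38], shuffled[38:50]

train_sequence = splice_traces(traces, train_idx)
pairs = make_windows(train_sequence, window=32)

# With a trained network, `wavelet` would be replaced by its prediction.
residual = subtract_wavelet(traces[test_idx[0]], wavelet)
print(len(train_sequence), len(pairs), max(abs(r) for r in residual))
```

Because every A-scan shares the same wavelet, the spliced sequence is approximately periodic with a period of one trace length, which is the property the network is expected to exploit. After subtraction, only the noise (and, in real data, the target response) remains in the residual.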

Conclusions
This paper presented a wavelet prediction method based on the LSTM network. The method exploits the strong correlation of GPR signals by splicing the data head to tail to construct a quasi-periodic input signal. The network is trained to learn the common features of the input data while ignoring random interference, so that it predicts a smooth wavelet.
Several groups of experiments (with different excitation source waveforms, target locations, shapes, materials, background geometries, etc.) demonstrate the generalization ability of the network across detection scenarios. As long as the antenna system parameters remain unchanged, a network trained with the simulation data of one arbitrary scene can be used to predict the wavelet of many different scenes. To extend the applicability of the network to different media, the network is optimized by widening the range of dielectric parameters in the simulation model. For measured data, we train the network directly with a small amount of measured data and then apply the trained network to detection data from other scenes. Compared with the wavelet extracted by SVD, the predicted wavelet has clear advantages in integrity and in adaptability to the detection area, and the method can be used in large-scale underground exploration projects such as pipeline detection under urban roads and defect detection inside tunnels.
This method has the following advantages:
1. It does not rely on prior knowledge and can effectively extract the wavelets of different scenes.
2. No manual labeling is needed during network training. The A-scan echo can be used directly as training data, so the input data are easy to obtain.
3. The trained network has good generalization ability and can solve many practical problems, such as the heavy labeling workload of a large-scale detection area, the inability to label special detection environments, and poor processing results caused by inaccurate calibration.
The proposed neural network method generalizes very well for wavelet prediction with the same antenna system. However, if the antenna system is changed, the network must be retrained. This issue needs further study. A possible approach is to conduct a large number of training experiments in which a single variable or multiple variables, such as the antenna system parameters or the environment parameters, are controlled. A large amount of detection data should allow the network to learn the underlying relationships. Moreover, we will also try to use the Transformer network structure for prediction.