Full-Waveform Inversion of Time-Lapse Crosshole GPR Data Using Markov Chain Monte Carlo Method

: Crosshole ground-penetrating radar (GPR) is an important tool for a wide range of geoscientiﬁc and engineering investigations, and the Markov chain Monte Carlo (MCMC) method is a heuristic global optimization method that can be used to solve the inversion problem. In this paper, we use time-lapse GPR full-waveform data to invert the dielectric permittivity. An inversion based on the MCMC method does not rely on an accurate initial model and can introduce any complex prior information. Time-lapse ground-penetrating radar has great potential to monitor the properties of a subsurface. For the time-lapse inversion, we used the double difference method to invert the time-lapse target area accurately and full-waveform data. We propose a local sampling strategy taking advantage of the a priori information in the Monte Carlo method, which can sample only the target area with a sequential Gibbs sampler. This method reduces the calculation and improves the inversion accuracy of the target area. We have provided inversion results of the synthetic time-lapse waveform data that show that the proposed method signiﬁcantly improves accuracy in the target area.


Introduction
Crosshole ground-penetrating radar is widely used in geoscientific and engineering investigations (e.g., identify cracks, analysis moisture) [1,2]. It is a popular tool for mapping subsurface electrical properties, especially electrical conductivity and dielectric permittivity, which are closely related to the underground characteristics of environments (e.g., water content) [3][4][5][6]. It has also shown great potential for mapping and monitoring time-lapse changes, which improves our understanding of dynamic processes [7][8][9]. This technique uses transmitting radar antennas to generate high-frequency electromagnetic energy in a borehole to acquire crosshole radar data.
To invert GPR data, both deterministic inversion algorithms and probabilistic inversion methods have been proposed [10][11][12][13][14][15]. Most deterministic inversion methods depend on an accurate prior model. In contrast, probabilistic inversion methods in which prior information is based on probability statistics reduce dependence on an accurate prior model. In a Bayesian formulation, the solution of inverse problem is a posteriori probability density. The extended Metropolis algorithm [16] is proposed to sample the posteriori probability density which can solve highly nonlinear inverse problems. Hansen et al. proposed a sequential Gibbs sampler to sample a priori information defined by any geostatistical algorithm [17,18]. Sequential Gibbs sampling could serve as a black box algorithm which performs a random walk according to a priori information in the extended Metropolis algorithm [19]. In this way, for choosing a priori model, the extended Metropolis algorithm becomes very flexible [20][21][22][23]. Regarding complex statistical models, Remy et al. give various examples [24].
According to the forward assumptions, conventional ray tomography has weaknesses, including low resolution, as a result of utilizing only a small portion of the signal information to solve the inversion problems [25,26]. Ernst applied the waveform inversion (FWI) method to improve the resolution by using all the received signals [4]. FWI was tested on synthetic crosshole data, and proved successful for characterizing a gravel aquifer [27]. Qin proposed two-stage Bayesian inversion to decrease the computational cost using crosshole GPR data [15]. Cordua implemented the first inversion example, which used full-waveform data to get a solution of a posteriori probability density [28]. Here, we study full-waveform time-lapse GPR data to image changes in dielectric permittivity, which are believed to indicate moisture content.
There are two time-lapse inversion strategies for performing the time-lapse inversion: a sequential difference strategy and a double difference strategy [29,30]. The main difference between the two strategies is that the double difference strategy reduces the influence of the non-target area and improves the inversion accuracy of the target area. In this work, we use the double difference strategy combined with Monte Carlo inversion. For the time-lapse GPR data, this requires at least two inversion calculations of the two sets of data at different observation times. These two inversions cost much calculation time, and at the second inversion, the non-target region will also affect the accuracy of target region. To solve this problem, we use the Monte Carlo inversion to achieve local sampling. In this way, the reduction of the sampling area eliminates the impact of the non-target area and improves computational efficiency.
In this paper, we use MCMC method combined with double difference strategy to realize time-lapse GPR inversion. Taking the advantage of a priori information in the MCMC method, we achieve local sampling in time-lapse inversion. In this way, we can only sample the target area which eliminates the impact of the non-target area and reduce the calculations. The paper is organized as follows: Firstly, we introduce the formula of the probabilistic inversion method and the forward method based on the waveform. Subsequently, we present the implementation of local sampling using an extended Metropolis algorithm. Further, we briefly introduce double difference as a timelapse inversion strategy. Finally, we present an analysis of results from synthetic data using local compared with full sampling.

Probabilistically Formulated Inversion
For geophysical inverse problems, a set of parameters m is used to describe the subsurface; the observed data can be represented by a dataset d. The forward problem refers to the use of a function f to obtain d where the function f is solved by a physical relation. The corresponding inverse problem can be expressed as The main difficulty of the inversion is the inverse operator f −1 , which is non-trivial to obtain or does not exist. Furthermore, the forward operator f is based on an approximation to the correct physical relation. In this paper, the model m represents a relative dielectric permittivity of the subsurface, and the data are the waveform data between boreholes. The inverse problem uses a priori information to obtain the model parameters. A probabilistic approach to solving inverse problems can be formulated [10].
where σ M (m) is a posteriori probability distribution that is the solution of the inverse problem. k is a normalization factor. A priori probability density ρ M (m) represents the prior information of the model parameters. L(m) is the likelihood function, which is a probabilistic measure of how well the data can match a given the model of data uncertainty. Its general formula is given by where ρ D (g(m)) represent measurement uncertainties. It is mainly related to the instrument where the data is recorded. θ(d|m) represents the modeling error due to the defective forward method or an defective parameterization. µ D (d) describes the homogeneous state of information. It ensures that when the coordinate system changes, parameterization is invariant. In most cases, µ D (d) can be assumed to a constant.
In this research, we mainly consider the data uncertainties and we do a perfect assume about other errors. The particular likelihood function as follows: where g(m) k and d k obs are vectors that contain the simulated and observed waveform traces related to the kth transmitter-receiver pair. K is the total number of waveform traces (i.e., transmitter-receiver pairs). The factor c is a normalization constant. Where C D is the covariance matrix that describes the variances and covariances of the data uncertainty. It is the Gaussian-distributed data uncertainties that are added to the waveform data in Synthetic examples.

Forward Model Based on Waveform
There are several types of forward model to simulate the GPR signal's wavefield. In the forward model of our time-lapse inversion scheme, we used a 2D FDTD solution of Maxwell's equations in transverse electric mode. The forward data of this FDTD method is the vertical component of the electrical field and its conceptual simplicity.
Using Cartesian coordinates for wave propagation in the (x, z) plane, the transverse electric or TE mode of Maxwell's equations are as follows [31]: In the formula, E x and E z represent the horizontal and vertical components of the electric field, respectively. H y is the magnetic field perpendicular to the propagation plane, σ is the electrical conductivity, ε is the dielectric permittivity, and µ represents the magnetic permeability which is equivalent to the free-space permeability assuming to be constant in the following. A generalized perfectly matched layer (GPML) surrounding the edges of the FDTD grid absorbs the artificial reflections at the edges of the model space.
We used FDTD techniques [31,32] based on staggered-grid finite-difference operators that are second-order accurate in both space and time to solve Equations (6a)-(6c). In near-surface, electromagnetic signals are mainly affected by dielectric permittivity and electrical conductivity. Permittivity and conductivity primarily control the phases and the amplitudes of the observed signals, respectively. In this study, we consider only the influence of the dielectric permittivity. For all examples presented in this paper, we inverted for permittivity while keeping the conductivities fixed [33].

Local Sampling Based on Extended Metropolis Algorithm
In most cases, geophysical inversion problems are non-linear and require non-Gaussian statistics to describe. We require an algorithm that can use the priori probability density without an explicit expression. The extended Metropolis algorithm can solve this by using sequential Gibbs sampling to serve as a black box algorithm. Sequential Gibbs sampling can perform a random walk in the a priori probability density [16]. There are two randomized steps constitute the extended Metropolis algorithm: (1) Giving a current model m cur , it conforms the a priori probability density. Generating a perturbation in the current model m cur , we get a candidate model, m pro . (2) Decide whether to accept the proposed model m pro , the probability of acceptance is measured by the ratio of likelihood function: where L(m cur ) and L m pro are the likelihood evaluated of the current model and the proposed model, respectively. If accepted, the proposed model instead of the current model. Therefore, a realization of the posteriori probability density is completed. Otherwise, we reject the proposed model and the current model is counted again. For time-lapse inversion, two inversions are required; we mainly focus on the change in the target area. The ideal method would be only to invert the target area in the second inversion. However, this is difficult to achieve. Instead of local inversion, we can take advantage of the a priori information of the MCMC method to implement local sampling. The prior information of the MCMC method is a large number of model distributions obtained by sampling. Thus, in the second inversion, we can sample only the target area to ensure that changes occur only in the target range, reducing the influence of the non-target area.
To perform the local sampling Monte Carlo inversion, we used a sequential Gibbs sampler to sample ρ M (m) directly, as part of the extended Metropolis algorithm. The flow of the second inversion was as follows:

1.
Starting in the current model, m cur , which is the result of the first inversion, the range size of the target area is ∇m loc . A new model candidate, m pro , which samples in the ∇m loc only using the sequential Gibbs sampler.

2.
The proposed model is accepted with probability P acc = min(1, L(m pro ) L(m cur ) ).

3.
If m pro is accepted, we use m pro instead of m cur . Consequently, the proposed model takes the place of the current model, m cur = m pro . Otherwise, the random walker stays at a location in m cur , and m cur is counted again.

Time-Lapse Inversion-Double Difference Strategy
There are two widely used time-lapse inversion strategies: sequential difference strategies and double difference strategies. The main difference between two methods is the setting of the initial model and the forward data processing before inversion. We will briefly introduce the two methods. In time-lapse inversion, at different times T 1 ,T 2 , we obtain two corresponding observational data d obs1 ,d obs2 . The target models m T1 and m T2 are the solution of GPR inversion at time T 1 ,T 2 . After completing the time-lapse inversion, the change region can be obtained by subtracting m T1 from m T2 .
The sequential difference strategy uses m T1 as the initial model to obtain m T2 . As the change in the model is localized and only occurs in a small region, starting from the model m T1 is a good candidate for the time-lapse inversion and can reduce the computation cost [34,35].
To improve the accuracy of the inversion results in the change region, Waldhauser and Ellsworth proposed the double difference strategy, which improves on the sequential difference strategy [29]. Compared with the sequential difference strategy, the main difference in the double difference strategy is using the m T1 to get forward data d * obs1 instead of d obs1 , which reduces the influence of the non-target region. In this paper, we for the most part use the double difference strategy to solve the time-lapse inversion. Figure 1 shows schematic diagrams of the double difference method combing a localsampling Monte Carlo inversion.
Remote Sens. 2021, 13, x FOR PEER REVIEW 5 of 12 model is a good candidate for the time-lapse inversion and can reduce the computation cost [34,35].
To improve the accuracy of the inversion results in the change region, Waldhauser and Ellsworth proposed the double difference strategy, which improves on the sequential difference strategy [29]. Compared with the sequential difference strategy, the main difference in the double difference strategy is using the to get forward data * instead of , which reduces the influence of the non-target region. In this paper, we for the most part use the double difference strategy to solve the time-lapse inversion. Figure 1 shows schematic diagrams of the double difference method combing a localsampling Monte Carlo inversion.

Synthetic Examples
To test the effectiveness of the method, we simulate synthetic GPR data in Figure 2. A reference model , of size 5.2 × 12 m (52 × 120 = 6240 pixels of size 0.1 m × 0.1 m), was generated from a multivariate Gaussian probability distribution. The electrical conductivity of the synthetic reference model was set to a constant value of 3 mS/m. The mean of the relative dielectric permittivity is 3.97 and the variance is 0.75. Figure 2 shows the recording geometry of the ground-penetrating radar (GPR) cross borehole. There are four transmitters represented by red crosses and each borehole contains two transmitters (one at 3 m and one at 9 m). Black dots are receivers equally spaced in two boreholes (the interval is 1.5 m). Due to the effects of wave guiding [36], the travel path is not between the center of the antenna, but the tips of the antenna [37]. We omitted

Synthetic Examples
To test the effectiveness of the method, we simulate synthetic GPR data m T1 in Figure 2. A reference model m T1 , of size 5.2 × 12 m (52 × 120 = 6240 pixels of size 0.1 m × 0.1 m), was generated from a multivariate Gaussian probability distribution. The electrical conductivity of the synthetic reference model was set to a constant value of 3 mS/m. The mean of the relative dielectric permittivity ε r is 3.97 and the variance is 0.75.
Remote Sens. 2021, 13, x FOR PEER REVIEW 6 of 12 data where the angle between a transmitter-receiver and horizontal is larger than 45°. These are similar parameters to those in Cordua et al. [28]. We used the FDTD algorithm to calculate a full-waveform synthetic dataset. The FDTD model with a regular grid consisting of square cells of 0.1 × 0.1 m to solve Maxwell's equations and the number of boundary cells is 40. The source pulse is a ricker wavelet and the central frequency of the ricker wavelet is 100 MHz. In the synthetic data, we assume source pulse is known. However, it should be noted that the source wavelet needs to be estimated for the real data. Some estimation methods can solve this problem [38,39].
The vertical component of the electrical field is the recorded synthetic observational data which contains a total of 20 waveforms traces. Gaussian-distributed data uncertainties are added to the waveform data. The average signal-to-noise ratio is set to 25. A transmitter at depths of 3 m in the left borehole generates five waveform traces as Figure 3 shows. Blue dotted curves and red curves represent noise-free waveforms and noisy waveforms, respectively. The signal-to-noise ratios from the top are 8, 15, 24, 32 and 40, respectively. As can be seen from picture 3, the matching of waveforms from top to bottom gradually becomes better between the Noise-free waveforms and noisy waveforms.   Figure 2 shows the recording geometry of the ground-penetrating radar (GPR) cross borehole. There are four transmitters represented by red crosses and each borehole contains two transmitters (one at 3 m and one at 9 m). Black dots are receivers equally spaced in two boreholes (the interval is 1.5 m). Due to the effects of wave guiding [36], the travel path is not between the center of the antenna, but the tips of the antenna [37]. We omitted data where the angle between a transmitter-receiver and horizontal is larger than 45 • . These are similar parameters to those in Cordua et al. [28].
We used the FDTD algorithm to calculate a full-waveform synthetic dataset. The FDTD model with a regular grid consisting of square cells of 0.1 × 0.1 m to solve Maxwell's equations and the number of boundary cells is 40. The source pulse is a ricker wavelet and the central frequency of the ricker wavelet is 100 MHz. In the synthetic data, we assume source pulse is known. However, it should be noted that the source wavelet needs to be estimated for the real data. Some estimation methods can solve this problem [38,39].
The vertical component of the electrical field is the recorded synthetic observational data which contains a total of 20 waveforms traces. Gaussian-distributed data uncertainties are added to the waveform data. The average signal-to-noise ratio is set to 25. A transmitter at depths of 3 m in the left borehole generates five waveform traces as Figure 3 shows. Blue dotted curves and red curves represent noise-free waveforms and noisy waveforms, respectively. The signal-to-noise ratios from the top are 8, 15, 24, 32 and 40, respectively. As can be seen from picture 3, the matching of waveforms from top to bottom gradually becomes better between the Noise-free waveforms and noisy waveforms.
For the time-lapse inversion, m T1 represents the initial model at time T 1 . We designed a perturbation in initial model m T1 to obtain the monitor model m T2 , which simulated the monitor model at observational time T 2 . Compared with m T1 , there was a high region of the relative dielectric permittivity near the surface in m T2 . m T1 , m T2 and the perturbation as shown in Figure 4.
Before the inversion, we first needed to sample the priori probability density that generated the reference model m T1 . Figure 5 shows five sampling models generated from the priori probability density; each of them is different.
ties are added to the waveform data. The average signal-to-noise ratio is set to 25. A transmitter at depths of 3 m in the left borehole generates five waveform traces as Figure 3 shows. Blue dotted curves and red curves represent noise-free waveforms and noisy waveforms, respectively. The signal-to-noise ratios from the top are 8, 15, 24, 32 and 40, respectively. As can be seen from picture 3, the matching of waveforms from top to bottom gradually becomes better between the Noise-free waveforms and noisy waveforms.  For the time-lapse inversion, represents the initial model at time . We designed a perturbation in initial model to obtain the monitor model , which simulated the monitor model at observational time . Compared with , there was a high region of the relative dielectric permittivity near the surface in . , and the perturbation as shown in Figure 4. Before the inversion, we first needed to sample the priori probability density that generated the reference model . Figure 5 shows five sampling models generated from the priori probability density; each of them is different. We used the extended Metropolis algorithm to obtain an a posteriori probability density. Figure 6 shows five inversion models that are generated from the posteriori probability density at . We can see that the extended Metropolis algorithm obtained an effec- For the time-lapse inversion, represents the initial model at time . We designed a perturbation in initial model to obtain the monitor model , which simulated the monitor model at observational time . Compared with , there was a high region of the relative dielectric permittivity near the surface in . , and the perturbation as shown in Figure 4. Before the inversion, we first needed to sample the priori probability density that generated the reference model . Figure 5 shows five sampling models generated from the priori probability density; each of them is different. We used the extended Metropolis algorithm to obtain an a posteriori probability density. Figure 6 shows five inversion models that are generated from the posteriori probability density at . We can see that the extended Metropolis algorithm obtained an effective inversion result. The inversion models contain the main features of . It is clear that relatively high dielectric permittivity structures are located at the bottom of the model and lower dielectric permittivity structures stay at the top of the model. In addition, Fig-Figure 5. Five sampling models of the priori probability density.
We used the extended Metropolis algorithm to obtain an a posteriori probability density. Figure 6 shows five inversion models that are generated from the posteriori probability density at T 1 . We can see that the extended Metropolis algorithm obtained an effective inversion result. The inversion models contain the main features of m T1 . It is clear that relatively high dielectric permittivity structures are located at the bottom of the model and lower dielectric permittivity structures stay at the top of the model. In addition, Figure 7 shows the five waveform traces of a transmitter at depths of 3 m in the left borehole simulated by an a posteriori model. The simulated waveforms are consistent with observed waveforms.  In the double difference strategy, to invert the mode , we selected a result of a posteriori sample as an initial model for the second inversion. In addition, we used the initial model to calculate forward data instead of observational data in the second inversion. Unlike the full sampling method in the first inversion, we used the sequential Gibbs algorithm to sample only the target area in the second inversion. In order to compare the inversion effect of local sampling, we set up a group of full sampling models for comparison. The results are shown in Figure 8. They clearly demonstrate that the local sampling performs better in the target area.  In the double difference strategy, to invert the mode , we selected a result of a posteriori sample as an initial model for the second inversion. In addition, we used the initial model to calculate forward data instead of observational data in the second inversion. Unlike the full sampling method in the first inversion, we used the sequential Gibbs algorithm to sample only the target area in the second inversion. In order to compare the inversion effect of local sampling, we set up a group of full sampling models for comparison. The results are shown in Figure 8. They clearly demonstrate that the local sampling performs better in the target area. In the double difference strategy, to invert the mode m T2 , we selected a result of a posteriori sample as an initial model for the second inversion. In addition, we used the initial model to calculate forward data instead of observational data in the second inversion. Unlike the full sampling method in the first inversion, we used the sequential Gibbs algorithm to sample only the target area in the second inversion. In order to compare the inversion effect of local sampling, we set up a group of full sampling models for comparison. The results are shown in Figure 8. They clearly demonstrate that the local sampling performs better in the target area. Remote Sens. 2021, 13, x FOR PEER REVIEW 9 of 12 In order to analyze the disturbance location from the inversion results, we enlarged the target area in Figure 9. The first picture in Figure 9b is the designed perturbation model. It shows that Figure 9a is far from the designed perturbation model, while Figure  9b reflects the characteristics of the designed perturbation model. Furthermore, we made In order to analyze the disturbance location from the inversion results, we enlarged the target area in Figure 9. The first picture in Figure 9b is the designed perturbation model. It shows that Figure 9a is far from the designed perturbation model, while Figure 9b reflects the characteristics of the designed perturbation model. Furthermore, we made a statistical analysis of the target area; the histogram of the statistical data is shown in Figure 10; the specific mean and variance are shown in Table 1.
Remote Sens. 2021, 13, x FOR PEER REVIEW 10 of 12 a statistical analysis of the target area; the histogram of the statistical data is shown in Figure 10; the specific mean and variance are shown in Table 1.    Figure 10a-c is the distribution of the designed perturbation, all sampling, and local sampling, respectively. It is obvious that the local sampling histogram is more similar to the design perturbation model. From Table 1, the mean and variance of local sampling is closer to the design perturbation model. It proves the effectiveness of the local sampling MCMC method.

Conclusions
This paper proposes a general framework for time-lapse inversion based on the MCMC method, which combines the extended Metropolis algorithm with a double difference strategy. In the double difference strategy, the result of the first inversion is taken as the initial model and forward data are used instead of observational data, using fullwaveform data to invert the relative dielectric permittivity based on Maxwell's equations. The waveform data are able to infer the model parameters effectively. In the extended Metropolis algorithm, sequential Gibbs sampling is used as a black box algorithm for Remote Sens. 2021, 13, x FOR PEER REVIEW 10 of a statistical analysis of the target area; the histogram of the statistical data is shown Figure 10; the specific mean and variance are shown in Table 1.    Figure 10a-c is the distribution of the designed perturbation, all sampling, and loc sampling, respectively. It is obvious that the local sampling histogram is more similar the design perturbation model. From Table 1, the mean and variance of local sampling closer to the design perturbation model. It proves the effectiveness of the local samplin MCMC method.

Conclusions
This paper proposes a general framework for time-lapse inversion based on t MCMC method, which combines the extended Metropolis algorithm with a double d ference strategy. In the double difference strategy, the result of the first inversion is tak as the initial model and forward data are used instead of observational data, using fu waveform data to invert the relative dielectric permittivity based on Maxwell's equation The waveform data are able to infer the model parameters effectively. In the extend Metropolis algorithm, sequential Gibbs sampling is used as a black box algorithm f   Figure 10a-c is the distribution of the designed perturbation, all sampling, and local sampling, respectively. It is obvious that the local sampling histogram is more similar to the design perturbation model. From Table 1, the mean and variance of local sampling is closer to the design perturbation model. It proves the effectiveness of the local sampling MCMC method.

Conclusions
This paper proposes a general framework for time-lapse inversion based on the MCMC method, which combines the extended Metropolis algorithm with a double difference strategy. In the double difference strategy, the result of the first inversion is taken as the initial model and forward data are used instead of observational data, using fullwaveform data to invert the relative dielectric permittivity based on Maxwell's equations. The waveform data are able to infer the model parameters effectively. In the extended Metropolis algorithm, sequential Gibbs sampling is used as a black box algorithm for sampling a priori probability density. This method can implement local sampling, which only samples the target area, reducing the influence of the non-target area. This makes the inversion result more accurate. The synthetic time-lapse GPR shows that, compared with full sampling, local sampling can obviously improve the resolution of the target region. It should be noted that we do a perfect assume about measured error and modeling error. We mainly consider the data uncertainties. It is a limitation of our current method.

Conflicts of Interest:
The authors declare no conflict of interest.