Improved Generalized Cross-Validation and Unbiased Predictive Risk Estimator Methods Using the RGSVD: Application to Inversion of Potential Field Data

: The inversion of potential ﬁeld data has widely utilized the generalized cross-validation (GCV) and the unbiased predictive risk estimator (UPRE) methods to determine the regularization parameter. However, these two methods are time-consuming and it is difﬁcult for them to determine the optimal linear search range including the optimal regularization. To solve these problems, this article improves the GCV and UPRE methods using the RGSVD (randomized generalized singular value decomposition) algorithm. The improved methods ﬁrst use the randomized algorithm to compute an approximate generalized singular value decomposition (GSVD) with less computational time. Then, the optimal linear search range is determined based on the generalized singular values. Finally, the GCV and the UPRE functions are efﬁciently computed on the basis of the results from the RGSVD algorithm. In this way, the GCV and UPRE methods using the RGSVD algorithm are able to determine the optimal regularization parameter fast and effectively. One comparative test shows the effectiveness and efﬁciency of the GCV and the UPRE methods using the RGSVD algorithm.


Introduction
Various geological and geophysical problems can be solved by the inversion of potential field data [1,2]. Generally, the inversion of potential field data is an ill-posed problem, which means that this inversion is usually non-unique and unstable [3]. Tikhonov regularization [4] can solve ill-posed problems. Estimating an optimal regularization parameter is very significant for the Tikhonov regularization, e.g., [5]. In the literature, many methods have been introduced to determine an optimal regularization parameter, such as the Morozov discrepancy principle (MDP) method [6], the generalized cross-validation (GCV) method [7], the L-curve (LC) method [8], and the unbiased predictive risk estimator (UPRE) method [9].
Here, we focus on the GCV and the UPRE methods. The conjugate gradient (CG) method and generalized singular value decomposition (GSVD) can be used to compute the GCV and the UPRE functions in a linear search range to find the optimal regularization parameter. However, when using the CG method, the process is time-consuming, and it is difficult to determine the optimal linear search range including the optimal regularization parameter. When using the GSVD, the optimal linear search range can be determined by analyzing the spectrum (generalized singular values) of the kernel matrix [9], but computational costs and memory requirements limit the application of this method.
In this paper, the RGSVD [10] algorithm was adopted in the GCV and the UPRE methods for determination of the optimal regularization parameter. The RGSVD algorithm uses a randomized algorithm to compute an approximation of the GSVD with less memory requirements and computing time [10], with which the optimal linear search range can be determined based on the generalized singular values. The result from the RGSVD facilitates efficient computation of the GCV and UPRE functions. Therefore, the regularization parameter can be determined fast and effectively by the improved GCV and UPRE methods using the RGSVD algorithm. One comparative test demonstrated the performances of the GCV and the UPRE methods using the RGSVD algorithm.

Inversion Methodology
Generally, the model domain is discretized into many cells, whose physical properties are the model parameters. According to the linear relationship between model parameters and measured data, the inverse problem of potential field data has the matrix form as where d obs ∈ R n represents the measured data vector, m ∈ R m is the unknown parameters, and G ∈ R n×m represents the kernel matrix. n represents the number of measured data, and m represents the number of the unknown parameters. In general, m is much larger than n. Therefore, the inversion of potential field data belongs to an under-determined problem. Through the singular value decomposition (SVD) of the matrix G or the least squares solution, Equation (1) can be used to calculate the inversion result directly. However, in this way, the inversion result may not conform to reality. These ill-posed problems can be solved by introducing a regularization term [4]. Then, the objective function of this inversion has the form In Equation (2), W d (Gm − d obs ) 2 2 is the data misfit, W m Zm 2 2 is the regularization term, and µ is the regularization parameter, which is used to balance these two terms. In the data misfit, W d represents a data weighting matrix. In the regularization term, W m ∈ R 4m×m represents a smooth constraint matrix and its matrix form is W m = [W s ; W x ; W y ; W z ], where W s ∈ R m×m , W x ∈ R m×m , W y ∈ R m×m , and W z ∈ R m×m are different component matrices, respectively [11,12]; and Z represents a depth-weighting matrix [11,12].
Generally, the physical bound constraint is incorporated into this inversion for obtaining a geologically plausible inversion result. Then, the inverse problem of potential field data becomes the following constrained minimization problem where m min and m max are vectors consisting of the lower and upper physical bounds on the unknown model values. Because the matrix Z is a diagonal matrix, the following transformations can be performed easily: Then, Equation (2) is rewritten as The constrained minimization problem (3) becomes the following new problem minimize : φ = hy − r 2 2 + µ 2 W m y 2 2 subjectto : y min ≤ y ≤ y max , where y min = Zm min and y max = Zm max are vectors of the lower and upper bounds for y.
The logarithmic barrier method is adopted to incorporate the physical bound constraint into the objective function [13][14][15]. Then, the new objective function has the form where −2λ is the barrier function, and λ is the barrier parameter. The Newton method [13][14][15] is used to solve the minimization of (7) iteratively. The regularization parameter µ is kept fixed during the iterations, and the barrier parameter λ decreases with iteration [13]. At the kth iteration, one step of the Newton method is applied for (7) to yield where is the vector with all entries one, and y (0) is an initial model. The solution ∆y (k) is the search direction at the kth iteration. The strategy of Li and Oldenburg [13] is used to obtain the final inversion result, and Algorithm 1 shows the detailed steps. In step 9, the solution ∆y (k) can be obtained by the preconditioned conjugate gradient (PCG) method, and the preconditioner P (k) has the form where A (k) has the form Algorithm 1. Inversion of potential field data Preparation : d obs , G, W d , m min , m max and K max .

1.
Calculate W m , and Z.
Estimate the regularization parameter µ.

5.
Calculate 11. Exit the loop if the termination criterion is satisfied.
When the number of iterations k reaches the preset maximum number of iterations K max or the change of the objective function is less than 1%, the iterative process ends. The method of estimating the regularization parameter is discussed in Section 3.

Estimation of Regularization Parameter
When estimating the regularization parameter, we only focused on (5) without considering the barrier function. In this way, the regularization parameter is still suitable in the minimization of (7). Here, the GCV [7] and the UPRE [9] methods are used to estimate the regularization parameter.

Generalized Cross-Validation Method
The randomized trace estimation was introduced into the GCV function to solve the difficulty of a trace calculation in the GCV function [13]. Then, this function for Equation (5) has the form where R is a random vector consisting of −1 and 1, each with a probability of 0.5. The parameter µ which minimizes (11) is the optimal regularization parameter µ opt . This parameter µ opt can be determined by line search within a range for regularization parameter, and the process can be solved by using the CG method [13]. However, when using the CG method, an optimal range for regularization parameter is difficult to determine, and sometimes the optimal regularization parameter µ opt is not within the selected range. Moreover, the GCV method using the CG method is time-consuming. The GSVD of the matrix pair [h, W m ] can be used to compute the GCV function [5]. Based on the GSVD, the spectrum of h can be obtained. The optimal linear search range can be determined by analyzing the spectrum of h [9]. However, the calculation of GSVD is time-consuming and has a large memory requirement. When the scale of data is large, the calculation of GSVD is even impractical. Instead, the RGSVD algorithm is adopted. The RGSVD algorithm uses the randomized algorithm to provide a low-rank approximation of the GSVD [10]. Compared with the GSVD, the RGSVD can reduce computational costs and memory demands, and results with good accuracy can be obtained. Algorithm 2 shows the RGSVD algorithm [10], which is used in this study. The parameter q determines the accuracy and efficiency of the RGSVD algorithm. As the value of the parameter q increases, the result becomes more accurate and the computation time is longer.

Algorithm 2. Randomized generalized singular value decomposition (RGSVD) algorithm.
Given h ∈ R n×m (n ≤ m) and W m ∈ R 4m×m , a target matrix rank q (q ≤ n), calculate an approximate GSVD of the matrix pair [h, W m ]: h ≈ UCX, W m ≈ VSX with U ∈ R n×q , V ∈ R 4m×q , C ∈ R q×q , S ∈ R q×q , and X ∈ R q×m .

1.
Generate a q × n Gaussian random matrix A.

2.
Calculate the q × m matrix Y = Ah.

3.
Calculate the m × q orthonormal matrix Q via QR factorization Y T = QR.

4.
Form the n × q matrix B 1 = hQ and the 4m × q matrix B 2 = W m Q.

6.
Form the q × m matrix X = W T Q T . 7.
Note : h ≈ UCX and W m ≈ VSX.
In the RGSVD algorithm, C = diag c 1 , c 2 . . . , c q ∈ R q×q with 0 < c 1 ≤ c 2 ≤ · · · ≤ c q < 1, (12) and S = diag(s 1 , s 2 , . . . , s 2 ) ∈ R q×q with According to the result of the RGSVD, Equation (11) has the form where γ i = c i /s i denotes the ith generalized singular value, u i represents the ith column of matrix U, and (X † ) i denotes the ith column of the Moore-Penrose inverse of matrix X.
The parameter µ opt can be found between the minimum and maximum of the generalized singular value γ i . Here, the parameter q is set as n. Therefore, the full generalized singular values can be obtained. The value of n is slightly large, but the computation time of the RGSVD algorithm with q = n is still much shorter than that of GSVD.

Unbiased Predictive Risk Estimator Method
The UPRE function for Equation (5) where trace[ ] is the trace of the term in the brackets. The randomized trace estimation [9] was also introduced in Equation (15) to solve the difficulty from the calculation of the trace. Then, the UPRE function is approximated by The result of the RGSVD is introduced into Equation (16), and Equation (16) has the form Similarly, when the value of Equation (17) is the minimum, its corresponding parameter µ is the optimal regularization parameter µ opt , which can be found between the minimum and maximum of the generalized singular value γ i as well. Here, the parameter q is also set as n.

Synthetic Example Tests
In this study, one example was used to demonstrate the performances of the GCV and the UPRE methods using the RGSVD algorithm. For comparison, the CG method and the GSVD method were also used in the GCV and UPRE methods. The following tests were run on a computer with a 3.00 GHz processor and 32 GB RAM.
The model consists of two cuboids, and Figure 1a shows its 3D perspective view. The dimensions of the two cuboids are both 300 m × 300 m × 200 m, and the depths of their tops are both 50 m. Both of them have a density contrast of 1 g/cm 3 . The gravity data were generated on a grid with 31 × 21 = 651 points and 50 m × 50 m spacings. Meanwhile, random Gaussian noise, whose standard deviation is 0.02 mGal plus 2% of each datum, were incorporated in the gravity data (Figure 1b). The subsurface was divided into 30 × 20 × 10 = 6000 cells, and dimension of each cell is 50 m × 50 m × 50 m. In the following inversions, the density range was 0-1 g/cm 3 and K max was set as 50. First, we used the GCV method to determine the regularization parameter in inversion. In the GCV method, the CG method, the GSVD, and the RGSVD algorithm were used, respectively. Figure 2a-c show the inversion results from the GCV methods using the CG method, the GSVD, and the RGSVD algorithm, respectively. These three inversion results are very similar: two source bodies with cuboid shape are in the actual position of the true model. Their corresponding GCV function curves are shown in Figure 3a-c, and these three curves have similar trends. Meanwhile, Table 1 records the corresponding optimal regularization parameters µ opt , and the corresponding elapsed times for the process of choosing the regularization parameter. The GCV methods using the GSVD and the RGSVD algorithm have very similar µ opt , and the µ opt of the GCV method using the CG method is slightly large. The GCV method using the RGSVD algorithm has the shortest elapsed time, at only 1.3 s.
Then, the UPRE method was implemented in inversion. Similarly, the UPRE method used the CG method, GSVD, and the RGSVD algorithm, respectively. Their inversion results are shown in Figure 2d-f, respectively. These inversion results are similar to those obtained using the GCV method. The curves for the UPRE function are demonstrated in Figure 3d-f, and they also have similar trends. Table 1 also lists the value of optimal regularization parameters and the elapsed times for the UPRE method. Their optimal regularization parameters are the same as them from the GCV method, and the elapsed times are close to them from the GCV method. The UPRE method using the RGSVD algorithm also has the shortest elapsed time.
Through the above comparative test, it was concluded that the GCV and the UPRE methods using the RGSVD algorithm can quickly provide an optimal regularization parameter that can generate a satisfying inversion result.

Conclusions
We introduced the GCV and the UPRE methods using the RGSVD algorithm, with which the optimal regularization parameter can be determined fast in the inversion of potential field data. We demonstrated the effect of these two methods through a comparative test.