The Optimal Shape Parameter for the Least Squares Approximation Based on the Radial Basis Function

Abstract: The radial basis function (RBF) is a class of approximation functions commonly used in interpolation and least squares. The RBF is especially suitable for scattered data approximation and high-dimensional function approximation. The smoothness and approximation accuracy of the RBF are affected by its shape parameter. There has been some research on the shape parameter, but research on the optimal shape parameter of the least squares based on the RBF is scarce. This paper proposes a measure of the optimal shape parameter for the least squares approximation based on the RBF and an algorithm to compute the corresponding optimal parameter. The idea is to treat the shape parameter as an optimization variable of the least squares problem, so that the linear least squares problem becomes nonlinear. A dimensionality reduction is applied to the nonlinear least squares problem in order to simplify the objective function, and derivative-free optimization is adopted to solve the reduced problem efficiently. The numerical experiments indicate that the proposed method is efficient and reliable. Multiple kinds of RBFs are tested and compared. The experiments show that the RBF least squares with the optimal shape parameter is much better than the polynomial least squares. The method is also successfully applied to the fitting of real data.


Introduction
The radial basis function (RBF) is a class of approximation functions that is widely used in many fields. It is generally believed that the RBF method was originally introduced in Hardy's work [1]. Since then, there has been much theoretical research on, and many applications of, RBF approximation; two valuable books [2,3] are recommended here. Radial basis functions fall into two groups: global RBFs and compactly supported RBFs [2-5].
The RBF approximation function space P = span{Φ_1(x), . . . , Φ_M(x)} on R^d can be obtained from a one-dimensional radial function φ(r) (r > 0) and given center points {y_j}_{j=1}^M in R^d. Here, Φ_j(x) = φ(||x − y_j||) for all 1 ≤ j ≤ M. Because the number and the distribution of the center points are almost arbitrary, P is flexible and easy to construct. Therefore, the RBF approximation is suitable for scattered data approximation and high-dimensional function approximation.
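The construction of the basis {Φ_j(x)} from a radial function and center points can be sketched in a few lines. This is a minimal illustration; the Gaussian radial function used here is a classic choice for exposition, not one of the kinds tested later in this paper:

```python
import numpy as np

def rbf_basis_matrix(x, y, phi):
    """Phi[i, j] = phi(||x_i - y_j||): evaluate the RBF basis built from a
    one-dimensional radial function phi and center points y at sample points x."""
    # Pairwise Euclidean distances between the N sample points and M centers.
    r = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)
    return phi(r)

# Gaussian radial function as a simple illustration (our choice, not the
# paper's); alpha plays the role of a shape parameter.
alpha = 2.0
phi = lambda r: np.exp(-(alpha * r) ** 2)

x = np.array([[0.0, 0.0], [1.0, 0.0]])   # two sample points in R^2
y = np.array([[0.0, 0.0]])               # one center point
Phi = rbf_basis_matrix(x, y, phi)
```

Because the basis depends on x only through distances to the centers, the same construction works unchanged in any dimension d, which is why the space P is so easy to build for scattered, high-dimensional data.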
Almost all radial functions have a shape parameter. The shape parameter of an RBF determines the shape of the basis function and hence the approximation function space P. Therefore, the selection of the shape parameter affects both the approximation accuracy of the RBF and the computational stability. An important problem is how to define and select a parameter that achieves the optimal approximation accuracy for the RBF approximation. This problem is called the selection of the optimal shape parameter in this paper.

The Nonlinear Least Squares with Optimal Shape Parameter
The least squares method is a classical and widely used approximation method. The general form of the RBF least squares is as follows: for a sampling function f(x) : Ω → R and given sampled data {(x_i, f_i)}_{i=1}^N, the RBFs {Φ_j(x; α)}_{j=1}^M are obtained from given center points {y_j}_{j=1}^M and a radial function φ(r; α) : R_+ → R. Here, Φ_j(x; α) = φ(||x − y_j||; α) for all 1 ≤ j ≤ M. The following optimization problem is solved in the approximation function space P = span{Φ_1(x; α), . . . , Φ_M(x; α)}:

min_{c ∈ R^M} ∑_{i=1}^{N} ( f_i − ∑_{j=1}^{M} c_j Φ_j(x_i; α) )²,    (1)

where α is a given parameter called the shape parameter, c is the coefficient vector, and c_j is the j-th component of c.
Since this paper focuses on the selection of the shape parameter, we assume that the solution to problem (1) always exists and is unique; hence, M ≤ N. We denote the solution to problem (1) by c, and the corresponding approximation function is s(x; α) = ∑_{j=1}^{M} c_j Φ_j(x; α). Research has been conducted on the RBF least squares approximation: Fasshauer discussed the solvability conditions and approximation error bounds of problem (1) in his book Meshfree Approximation Methods with Matlab [3], and Majdisova and Skala demonstrated the excellent performance of the RBF least squares in applications [9,10].
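For a fixed α, problem (1) is an ordinary linear least squares problem. The following one-dimensional sketch assembles the matrix Φ(α) and solves for c; the sampling function, point sets, and the standard inverse quadratic (IQ) form of the radial function are our illustrative assumptions, not the paper's experimental setup:

```python
import numpy as np

# Problem (1) for a fixed shape parameter alpha:
#   min_c  sum_i ( f_i - sum_j c_j * Phi_j(x_i; alpha) )^2.
alpha = 3.0
x = np.linspace(0.0, 1.0, 50)              # N = 50 sampling points
y = np.linspace(0.1, 0.9, 5)               # M = 5 center points
f = np.sin(2 * np.pi * x)                  # sampled data f_i (placeholder)

# Inverse quadratic radial function in its standard form (our assumption).
phi = lambda r: 1.0 / (1.0 + (alpha * r) ** 2)
Phi = phi(np.abs(x[:, None] - y[None, :])) # Phi[i, j] = phi(|x_i - y_j|)

c, _, rank, _ = np.linalg.lstsq(Phi, f, rcond=None)
err = np.sum((f - Phi @ c) ** 2)           # optimal value of problem (1)
```

The value `err` is exactly the quantity used in the next paragraph to measure a given shape parameter.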
The basic idea of the least squares is that the smaller the objective function of problem (1) is, the closer the approximation function s(x; α) is to the sampling function f(x). Hence, the optimal value of problem (1) under a given shape parameter α, i.e., ∑_{i=1}^{N} ( f_i − s(x_i; α) )², can be used to measure the shape parameter. We change the shape parameter from a constant into a variable and add it to the optimization problem, obtaining

min_{c ∈ R^M, α ∈ D} ∑_{i=1}^{N} ( f_i − ∑_{j=1}^{M} c_j Φ_j(x_i; α) )²,    (2)

where D is the domain of definition of the shape parameter, or a given restricted interval. The dimension of the objective function thus changes from M to M + 1, and the corresponding least squares problem changes from linear to nonlinear.
We set the solution to problem (2) to be (c*, α*) and refer to α* as the optimal shape parameter for the RBF least squares; the corresponding approximation function is s*(x) = ∑_{j=1}^{M} c*_j Φ_j(x; α*). According to references [15,16], the classical methods for solving such a least squares problem are the Gauss-Newton method and methods derived from it, such as the Levenberg-Marquardt method, the structural Newton method, and the generalized Gauss-Newton method. However, all of these methods require the derivative of the objective function or its generalized derivative.
The derivative properties of different types of radial basis functions vary greatly, and M can be huge in some applications. These two points make it unwise to solve problem (2) directly with the Gauss-Newton method.

The Dimensional Reduction of Nonlinear Least Squares
As described in the previous section, it is not advisable to solve problem (2) directly. According to the literature [17], problem (2) is a separable nonlinear least squares problem, and there is an equivalent problem

min_{α ∈ D} R_2(α),    (3)

where

R_2(α) = min_{c ∈ R^M} ∑_{i=1}^{N} ( f_i − ∑_{j=1}^{M} c_j Φ_j(x_i; α) )².    (4)

It is worth noting that, after the dimensional reduction, problem (3) becomes one-dimensional, but the expression of R_2(α) is not explicit.

Remark 1.
For a given α, calculating R_2(α) amounts to solving a classical linear least squares problem, so it is relatively easy to obtain the value of R_2(α). However, obtaining a specific expression of R_2(α) as a function of the variable α is difficult, particularly when M is large.
Golub and Pereyra proved the equivalence between problem (2) and problem (3) in [17]. We now state an important theorem from [17].
We use F to denote the vector [f_1, . . . , f_N]^T and define the N × M matrix

Φ(α) = ( Φ_j(x_i; α) ), 1 ≤ i ≤ N, 1 ≤ j ≤ M.    (5)

We use R(c, α) to denote the objective function of problem (2), namely

R(c, α) = || F − Φ(α) c ||_2².    (6)

Theorem 1. Let R(c, α) and R_2(α) be defined as in (6) and (4), respectively, and assume that, in the open set Ω ⊆ R, Φ(α) has a constant rank r ≤ min(M, N). If α* is a minimizer of R_2(α) in Ω and

ĉ = Φ(α*)^+ F,    (7)

then (ĉ, α*) is a minimizer of R(c, α) in Ω, and R(ĉ, α*) = R_2(α*).
The proof of Theorem 1 is rather involved and is omitted here; interested readers can refer to [17]. We have assumed that the solution to problem (1) always exists and is unique, so the condition of Theorem 1 always holds. Therefore, problem (2) is equivalent to problem (3), and the solution to problem (3) is our optimal shape parameter for the RBF least squares.

Remark 2.
The symbol Φ(α)^+ in (7) represents the Moore-Penrose generalized inverse of Φ(α). When the basis functions are linearly independent and M ≤ N, the rank of Φ(α) is M, and ĉ in (7) is the solution to problem (1) corresponding to α. Theorem 1 indicates that, if the radial basis functions we use are linearly independent, problem (2) is equivalent to problem (3). Hence, we can obtain the optimal shape parameter by solving problem (3).
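Remark 2 suggests a direct numerical recipe for evaluating R_2(α) at a single α: form Φ(α), apply the Moore-Penrose inverse as in (7), and take the squared residual norm. A minimal one-dimensional sketch follows; the IQ form of the radial function and the point sets are our illustrative assumptions:

```python
import numpy as np

def r2(alpha, x, y, f):
    """R_2(alpha) = ||F - Phi(alpha) Phi(alpha)^+ F||_2^2, evaluated by
    solving the inner linear least squares problem for this alpha."""
    r = np.abs(x[:, None] - y[None, :])
    Phi = 1.0 / (1.0 + (alpha * r) ** 2)   # IQ radial function (standard form)
    c_hat = np.linalg.pinv(Phi) @ f        # c_hat = Phi(alpha)^+ F, as in (7)
    return np.sum((f - Phi @ c_hat) ** 2)

x = np.linspace(0.0, 1.0, 40)              # sample points
y = np.linspace(0.1, 0.9, 4)               # center points
f = np.exp(-x)                             # placeholder sampled data
vals = [r2(a, x, y, f) for a in (0.5, 1.0, 2.0)]
```

Note that this evaluates R_2 pointwise without ever producing a closed-form expression in α, which is exactly the situation Remark 1 describes.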

The Solution of Optimal Shape Parameter
We have proposed the definition of the optimal shape parameter and the corresponding optimization problem in Section 2. The dimension of the equivalent problem (3) is reduced from M + 1 to 1 in Section 3. Although the dimension of the problem has been reduced, how to solve problem (3) efficiently still needs to be discussed.
Golub and Pereyra have conducted research on separable nonlinear least squares problems such as problem (2) in [17]. They point out that it is feasible to use the Gauss-Newton method to solve problems (2) and (3) when the initial value is close to the solution. Their experiments show that, when M is relatively small, the Gauss-Newton method is about equally efficient on problem (3) and on problem (2). If M is substantially increased, obtaining the specific expression of R_2(α) becomes extremely difficult, which increases the cost of solving problem (3) with the Gauss-Newton method.
The problems that Cotnoir and Terzić discussed in [18] are rooted in practical problems and naturally involve quite large N and M. They believe that it is difficult to solve a problem like (3) using gradient-based methods, including the Gauss-Newton method, when M or N is large. Hence, they suggest using direct search methods to solve problem (3); among these, they recommend the genetic algorithm (GA).
In order to obtain the optimal shape parameter efficiently and accurately when N or M is relatively big, we use a derivative-free optimization (DFO) method instead of the above methods to solve problem (3). Among the various DFO methods, we choose Powell's UOBYQA method [19]. Powell's UOBYQA replaces the objective function with its quadratic interpolant in each iteration; intuitively, the method converges faster than linearly. Among several classic DFO methods, the performance of Powell's UOBYQA was outstanding in our trials.
The UOBYQA method is designed for optimization problems of arbitrary dimension, and the construction and updating of the interpolation point set is complicated for high-dimensional problems. Because our problem (3) is one-dimensional, we can simplify the construction and updating of the interpolation point set in the original UOBYQA method. Hence, we propose a matching algorithm that combines the characteristics of problem (3) with the UOBYQA method.

Powell's UOBYQA Method
UOBYQA, short for unconstrained optimization by quadratic approximation, is a trust-region derivative-free optimization method for solving unconstrained optimization problems. Its essence is to replace the objective function with its quadratic interpolant in each trust region and to solve the resulting trust-region subproblem.
For the objective function F(x), the UOBYQA method begins with an initial point x_0, an interpolation set N_0, and a trust-region radius r_0. The set N_0 must guarantee that the quadratic interpolant of F(x) exists and is unique. At each step, x_k, N_k, and r_k are updated. At step (k + 1), let m_k(x) be the quadratic interpolant of F on N_k, x_k the current iterate, and r_k the current trust-region radius. The trust-region subproblem is then

min m_k(x) subject to ||x − x_k|| ≤ r_k.

We set the solution of the subproblem to x̂ and then calculate the ratio

ρ = ( F(x_k) − F(x̂) ) / ( m_k(x_k) − m_k(x̂) ).

If ρ > 0, we set x_{k+1} = x̂ and update the interpolation set to N_{k+1}; otherwise, x_{k+1} = x_k and N_{k+1} = N_k. The update of the trust-region radius r_{k+1} depends on the size of ρ.
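The ratio ρ can be computed directly from a handful of function values. A toy one-dimensional illustration (the objective and the numbers are ours, chosen only to make the computation concrete):

```python
import numpy as np

# One trust-region step: build the quadratic interpolant m_k from three
# function values, minimize it on the trust region, and compute rho.
F = lambda x: np.cosh(x)            # toy objective, minimum at x = 0
xk, rk = 1.0, 0.5                   # current iterate and radius

# Quadratic interpolant through xk - rk, xk, xk + rk (exists and is unique).
pts = np.array([xk - rk, xk, xk + rk])
coef = np.polyfit(pts, F(pts), 2)
m = np.poly1d(coef)

# Minimizer of m on the trust region [xk - rk, xk + rk].
x_hat = np.clip(-coef[1] / (2 * coef[0]), xk - rk, xk + rk)
rho = (F(xk) - F(x_hat)) / (m(xk) - m(x_hat))
```

Here the model minimizer lands on the boundary point xk − rk, which is itself an interpolation point, so the predicted and actual decreases agree and ρ is 1: the step would be accepted and the radius enlarged.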
The UOBYQA method has two obvious advantages. Firstly, the method does not need the derivative of the objective function, or even a specific expression of the objective function: it only requires the value of F(x) for any x. Secondly, for the sequence {x_k}_{k=0}^∞ obtained by the UOBYQA method, the corresponding sequence of function values {F(x_k)}_{k=0}^∞ is monotonically decreasing.
The disadvantages of the UOBYQA method are also obvious. If x ∈ R^d, the interpolation set N_k for each k has (d + 1)(d + 2)/2 points, and these points must satisfy specific distribution conditions. Hence, the difficulty of updating N_k increases with d, and if d is relatively large, interpolating the objective function at each step is difficult. Therefore, the UOBYQA method is not efficient for high-dimensional problems.

A Modified UOBYQA Method
It is worth noting that problem (3) is one-dimensional, so it is well suited to the UOBYQA method: for problem (3), N_k consists of three distinct points, which make the quadratic interpolant exist and be unique. However, problem (3) may not be an unconstrained optimization problem; generally, the D in problem (3) is R \ {0} or R_+ \ {0}. According to the characteristics of the radial basis function, problem (3) can be treated as an unconstrained optimization problem.
In conclusion, for different kinds of RBFs, the D in problem (3) can be processed accordingly so that the problem becomes unconstrained and the UOBYQA method can solve problem (3) efficiently.
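One standard way to carry out such processing (our illustration; the paper does not specify its transformation) is a reparameterization: for D = (0, ∞), substituting α = exp(t) yields an objective defined for every real t, so the minimization over t is unconstrained:

```python
import numpy as np

# Illustrative reparameterization: exp maps all of R onto (0, inf), so
# minimizing R2(exp(t)) over t in R is an unconstrained problem whose
# solution recovers alpha = exp(t*). The objective here is a toy stand-in.
R2 = lambda alpha: np.log(alpha) ** 2 + 1.0        # defined on (0, inf)
R2_unconstrained = lambda t: R2(np.exp(t))         # defined for every real t

vals = [R2_unconstrained(t) for t in (-2.0, 0.0, 2.0)]
```

Any minimizer t* found without constraints maps back to a feasible shape parameter α* = exp(t*) > 0.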
Since problem (3) is one-dimensional, the quadratic interpolation is easy. In order to make the interpolation more accurate and the update of the set N_k easier, the set N_k is specially constructed at each step as N_k = {x_k − r_k/2, x_k, x_k + r_k/2}. These three points always divide the trust region [x_k − r_k, x_k + r_k] into four equal parts.
After applying Powell's UOBYQA method to problem (3) and achieving a small improvement, we obtain the following algorithm (Algorithm 1):

Algorithm 1 DFO algorithm
Input: objective function R_2(α), trust-region radius r, initial value α_0
Output: optimal shape parameter α*
k = 0; D = +∞;
while D > ε do
	construct the interpolation set N_k = {α_k − r/2, α_k, α_k + r/2};
	compute the quadratic interpolant m_k(α) of R_2(α) on N_k;
	solve min m_k(α) s.t. |α − α_k| ≤ r, and set the solution to α̂;
	ρ = (R_2(α_k) − R_2(α̂)) / (m_k(α_k) − m_k(α̂));
	if ρ > 0.75 then
		r = 2r;
	end if
	if ρ < 0.5 then
		r = r/4;
	end if
	if ρ > 0 then
		α_{k+1} = α̂;
		D = |α_{k+1} − α_k|;
	else
		α_{k+1} = α_k;
	end if
	k = k + 1;
end while
α* = α_k;
return α*

Remark 3. The above algorithm is a trust-region method. Compared with classical line search methods, it can converge quickly and has a wide range of convergence. The algorithm avoids complicated derivative calculations and does not require a specific expression of R_2(α); thus, it is easy to apply to different kinds of basis functions or sampling points. For a different problem, all we need to do is modify the objective function R_2(α), without any other preparation such as that required by the Gauss-Newton method. In other words, the above algorithm is simple and convenient to implement for problems such as problem (3), and it has wide applicability.
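Algorithm 1 can be sketched as a short, self-contained routine. This is a minimal reading of the text, assuming a simple stopping rule on the step size and radius (details the paper's own implementation may handle differently); only the radius updates (double r when ρ > 0.75, quarter it when ρ < 0.5) and the one-dimensional interpolation set are taken from the text:

```python
import numpy as np

def dfo_1d(R2, alpha0, r=1.0, tol=1e-8, max_iter=200):
    """One-dimensional trust-region DFO loop in the spirit of Algorithm 1."""
    alpha, f_alpha = alpha0, R2(alpha0)
    for _ in range(max_iter):
        # Interpolation set {alpha - r/2, alpha, alpha + r/2} and its model.
        pts = np.array([alpha - r / 2, alpha, alpha + r / 2])
        vals = np.array([R2(pts[0]), f_alpha, R2(pts[2])])
        a, b, c = np.polyfit(pts, vals, 2)
        if a > 0:
            cand = float(np.clip(-b / (2 * a), alpha - r, alpha + r))
        else:
            # Concave/linear model: the model minimum is at a boundary point.
            cand = pts[0] if vals[0] <= vals[2] else pts[2]
        f_cand = R2(cand)
        pred = f_alpha - (a * cand ** 2 + b * cand + c)   # predicted decrease
        rho = (f_alpha - f_cand) / pred if pred > 0 else 0.0
        if rho > 0.75:
            r *= 2
        elif rho < 0.5:
            r /= 4
        if rho > 0 and f_cand < f_alpha:
            step = abs(cand - alpha)
            alpha, f_alpha = cand, f_cand
            if step < tol:
                break
        if r < tol:
            break
    return alpha

# Smoke test on a smooth objective with its minimum at alpha = 2.
alpha_star = dfo_1d(lambda a: (a - 2.0) ** 2 + 1.0, alpha0=0.5)
```

On this quadratic toy objective the very first model is exact, so the iterate reaches the minimizer within a couple of accepted steps.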
In the following sections, we use the DFO method to denote Algorithm 1.

Numerical Experiments
To demonstrate the feasibility and efficiency of our DFO method, we conduct a series of numerical experiments. Unless otherwise stated, the experimental data are sampled from Franke's test function,

f(x, y) = 0.75·exp(−((9x − 2)² + (9y − 2)²)/4) + 0.75·exp(−(9x + 1)²/49 − (9y + 1)/10) + 0.5·exp(−((9x − 7)² + (9y − 3)²)/4) − 0.2·exp(−(9x − 4)² − (9y − 7)²).

In all experiments, N represents the number of sampling points, and M represents the number of basis functions. Our sampling points and center points are all quasi-scattered Halton points [3,20] in (0, 1) × (0, 1). Unless otherwise stated, the initial value α_0 of our DFO method is 1.
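Halton points can be generated with the radical-inverse construction; a minimal sketch follows (the starting index is our convention and may differ from the generator used in [3,20]):

```python
def halton_points(n, bases=(2, 3)):
    """First n Halton points in (0,1)^d via the radical inverse.
    Bases 2 and 3 give the standard two-dimensional quasi-scattered set."""
    def radical_inverse(i, b):
        # Reverse the base-b digits of i across the radix point.
        f, x = 1.0, 0.0
        while i > 0:
            f /= b
            x += f * (i % b)
            i //= b
        return x
    # Start at index 1 so the origin (0, 0) is excluded.
    return [[radical_inverse(i, b) for b in bases] for i in range(1, n + 1)]

pts = halton_points(5)
```

The resulting points fill (0, 1) × (0, 1) far more evenly than pseudo-random samples, which is why they are a common choice for both the sampling points and the center points in RBF experiments.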
We use the error between the approximation function and the sampled values to measure the shape parameter α*:

R_2(α*) = ∑_{i=1}^{N} ( f_i − ∑_{j=1}^{M} c*_j Φ_j(x_i; α*) )²,    (8)

where c* is the solution to the classical linear least squares problem corresponding to α*. This formula not only measures the shape parameter, but also shows how well the approximation function fits the sampling function. All experiments were run on an Intel Core i7-8700 CPU with a clock frequency of 3.20 GHz and were implemented in Matlab R2019a.
We have made the sample point sets and center point sets of each experiment publicly available so that interested readers can reproduce the experiments. These data are in the Supplementary Materials.

The Comparison of the DFO Method and the Gauss-Newton Method
In this section, the radial basis function is the thin plate spline function (TPS), φ(r) = (αr)² · ln(αr), α > 0.
As mentioned in Section 3, if M is relatively large, obtaining the objective function R_2(α) of problem (3) is difficult; if M is particularly large, the computational cost of obtaining the expression and derivative of R_2(α) can exceed that of solving problem (3) itself. Hence, in this section, the Gauss-Newton method is used to solve problem (2), and the DFO method is used to solve problem (3). Although the two methods solve different problems, Formula (8) can be used to measure the shape parameters obtained in both ways.

The Convergence Comparison
The classical Gauss-Newton method is locally convergent: it converges quickly when the initial point is near a critical point. In addition, our DFO method is a trust-region method, whose convergence rate is theoretically higher than that of line search methods. Hence, we first compare the convergence of the Gauss-Newton method and the DFO method.
In this experiment, N = 5000, and M takes the values 50, 60, 70, and 80. At first, we set α_0 = 1. The results of the two methods in these cases are shown in Table 1, which indicates that the α* obtained by the two methods can differ in the same case. Judging by the size of R_2(α*), the α* obtained by the DFO method is more likely to be the optimal shape parameter than that obtained by the Gauss-Newton method. To further understand this result, for the case M = 50, we numerically plot R_2(α) around the two values of α* obtained by the two methods in Figure 1, where the red asterisk represents the α* obtained by the Gauss-Newton method and the red circle represents the α* obtained by the DFO method. We can see that, beginning at α_0 = 1, the DFO method can "climb" over a maximum to a minimum on the left, whereas the Gauss-Newton method can only reach a minimum near the initial value.
We then set α_0 = 0.5. The results of the two methods are shown in Table 2; the results of DFO are almost identical to those of Gauss-Newton. Comparing the results in Table 2 with those in Table 1, the DFO method reaches the same minimum point from the two different initial values. This shows that our DFO method is insensitive to the initial value, while the Gauss-Newton method depends on it.

The Comparison of Computational Efficiency
At first, we set α_0 = 1. In order to compare the computational efficiency of the DFO method with that of the Gauss-Newton method, we list their computational times in Table 3 for cases in which the α* obtained by the two methods is almost the same. In this experiment, N = 5000, and M takes the values 10, 20, 30, and 40. The results in Table 3 indicate that, when M is small, the Gauss-Newton method takes less time than the DFO method; however, as M increases, the computational time of the DFO method becomes significantly lower than that of the Gauss-Newton method. Such results are predictable: the problem solved by the Gauss-Newton method is (M + 1)-dimensional, while the problem solved by the DFO method is one-dimensional. This shows that our dimensional reduction is meaningful.
We then set α_0 = 0.5, an initial value nearer to the α* in Table 3, and list the computational times in Table 4. Comparing the results in Table 4 with those in Table 3, the computational time of DFO is hardly affected by the initial value, while that of Gauss-Newton is heavily affected by it. The efficiency of the DFO method is thus less dependent on the initial value.
According to Table 4, even the nearer initial value cannot make the Gauss-Newton method faster than the DFO method. Combining these results with those in Table 3, we conclude that, when M is large, the computational efficiency of the DFO method is higher than that of the Gauss-Newton method.

Remark 4.
Ideally, we should use the Gauss-Newton method to solve problem (3) in order to make a direct comparison with the DFO method. However, obtaining the expression and the derivative of R_2(α) is difficult, and the time needed to obtain them alone exceeds the times shown in Table 3. Hence, we choose to use the Gauss-Newton method to solve problem (2) instead of problem (3).

The Results of the DFO Method with a Larger Amount of Data
In practice, we often need to approximate large-scale data and use a large number of basis functions in order to obtain better approximation results; that is, both N and M are generally large. Hence, we test the DFO method with larger N and M. The RBF is still the TPS function, N = 10,000, and M takes the values 25, 50, 100, 200, 400, and 800. As in the previous experiment, we use R_2(α*) to show the approximation effect. Figure 2 shows the relationship between the error R_2(α*) and M, and Figure 3 shows the relationship between the computational time of the DFO method and M. Figure 2 indicates that, for the same sampling data, the error R_2(α*) decreases rapidly as M increases. In other words, the RBF least squares with a large M and our optimal shape parameter can achieve an excellent approximation. Figure 3 indicates that the relationship between the computational time and M is approximately linear. Hence, the DFO method and our optimal shape parameter are suitable for the case of a large M.

The Comparison of Results on Different Radial Bases
In order to compare the results on different radial bases adequately, we conduct experiments in two cases. Just like in the previous experiment, we use R 2 (α * ) to show the approximation effect. At first, we compare R 2 (α * ) and the computational time on different radial bases under two sizes of sampling datasets. Secondly, we add some new test functions and show the results of different radial bases.
Although there are many kinds of RBFs, our experiments are carried out with only four commonly used kinds, which are shown in Table 5.

RBF | φ(r) | Domain of definition
Thin plate spline (TPS) | (αr)² · ln(αr) | α > 0

We compare the approximation effects and the computational times of the DFO method with these basis functions in every experiment.
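The TPS radial function above can be coded directly. For the other bases, only the names survive in our copy of Table 5, so the MQ and IQ forms below are the standard definitions from the RBF literature and are our assumption; the paper's table may parameterize them differently:

```python
import numpy as np

def tps(r, alpha):
    """Thin plate spline from Table 5: (alpha*r)^2 * ln(alpha*r),
    with the removable singularity at r = 0 filled by its limit 0."""
    ar = alpha * r
    return np.where(ar > 0, ar ** 2 * np.log(np.maximum(ar, 1e-300)), 0.0)

def mq(r, alpha):
    """Multiquadric, standard form (our assumption): sqrt(1 + (alpha*r)^2)."""
    return np.sqrt(1.0 + (alpha * r) ** 2)

def iq(r, alpha):
    """Inverse quadratic, standard form (our assumption): 1 / (1 + (alpha*r)^2)."""
    return 1.0 / (1.0 + (alpha * r) ** 2)

vals = (float(tps(np.array([1.0]), 1.0)[0]), mq(0.0, 2.0), iq(0.0, 2.0))
```

Swapping one of these functions into the least squares setup is the only change needed to rerun the comparison with a different basis, which is the point made in Remark 3.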

The Comparison of Two Sizes of Sampling Datasets
The number of sampling points is N = 5000; they are quasi-scattered Halton points. The number of center points, M, takes the values 10, 20, . . . , 100. Figure 4 shows the relationship between R_2(α*) and M for the four kinds of basis functions, and Figure 5 shows the corresponding computational times. In Figures 4 and 5, the red polyline represents TPS-RBF, the green polyline represents MQ-RBF, the blue polyline represents IQ-RBF, and the black polyline represents CS-RBF [2]. Figure 4 indicates that, whatever the basis function is, R_2(α*) decreases rapidly as M increases. When M is small, the differences among the R_2(α*) of the four basis functions are significant: the R_2(α*) of IQ is the lowest, and that of TPS is the highest. When M becomes large, the R_2(α*) values of all four basis functions decrease rapidly. In most cases, the approximation effect of IQ is the best, and that of TPS is the worst. Figure 5 indicates that, when M is small, the differences among the computational times of the four kinds of basis functions are not significant; as M increases, the differences become obvious. In most cases, the computational time of IQ is the lowest, although for M = 100 the computational time of IQ becomes relatively high.
We then reduce N from 5000 to 2500, keeping everything else unchanged. The results of this experiment are shown in Figures 6 and 7. Comparing Figures 6 and 7 with Figures 4 and 5, we find that the conclusions are similar: the performance of IQ in our experiments is excellent compared with the other three kinds of basis functions.
Based on the above analysis, we find that IQ performs the best among the four kinds of RBFs. Hence, we recommend IQ-RBF for high-precision problems.

The Comparison of the Four New Test Functions
In this section, we add four new test functions, including

g_2(x, y) = (1.25 + cos(5.4y)) / (6 + 6(3x − 1)²).

These functions are commonly used to test the performance of RBF approximation, and they are used only in this section.
In this section, the number of sampling points is always N = 5000, and M takes the values 10, 20, . . . , 100. The approximation effects and computational times of the DFO method are shown in Figures 8 and 9, respectively; each figure has four subfigures corresponding to the four test functions. Figure 8 shows that the performances of MQ and IQ are better than those of TPS and CSRBF. In most cases, the R_2(α*) of IQ is the lowest among the four kinds of basis functions; however, as shown in Figure 8d, when M = 100, the R_2(α*) of g_4(x, y) based on IQ indicates that the DFO method may have failed. Figure 9c shows that there is a special case in which the DFO method may take longer than expected. In most cases, the computational time of every basis function is low, and the relationship between the computational time and M is approximately linear, though we cannot yet prove this in theory. According to Figure 9, CSRBF takes the longest time in most cases, IQ performs well on g_1(x, y) and g_4(x, y), and TPS performs well on g_2(x, y) and g_3(x, y).
Combining the results of Section 5.3.1 with those of this section: although there may be some unexpected results for IQ-RBF in terms of approximation effect and computational time, IQ-RBF is the most recommended of the four basis functions in most cases.

Curved Surface Fitting
In this section, we present the application of the RBF least squares with the optimal shape parameter to the surface fitting of the simulated data and real data. We compare the approximation effect of the RBF least squares with that of the polynomial least squares with the same dimension on the simulated data.

Curved Surface Fitting on Simulated Data
In this experiment, the simulated data were sampled from Franke's test function; Figure 10 is the image of Franke's test function. We sampled N = 5000 points to construct the simulated data, which are used for the least squares approximation with a polynomial basis and with IQ-RBF. The polynomial space is the space of bivariate polynomials with a total degree of at most 8, whose dimension is 45. Accordingly, the IQ-RBF approximation space is constructed with M = 45 basis functions.
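The dimension 45 quoted above is the count of monomials x^i y^j with i + j ≤ 8; this counting argument is standard and can be checked directly:

```python
from math import comb

def poly_dim_2d(n):
    """Dimension of the space of bivariate polynomials of total degree <= n:
    the number of monomials x^i y^j with i + j <= n is C(n + 2, 2)."""
    return comb(n + 2, 2)

# Degrees 6 through 10, the range compared later in Table 6.
dims = [poly_dim_2d(n) for n in range(6, 11)]
```

For degree 8 this gives (8 + 1)(8 + 2)/2 = 45, which is why M = 45 IQ-RBF basis functions make the two approximation spaces directly comparable.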
The approximation function of the polynomial least squares is p*(x, y). The approximation function p*(x, y) and the error function |p*(x, y) − f(x, y)| on [0, 1] × [0, 1] are shown in Figures 11 and 12, respectively. According to Figures 11 and 12, although p*(x, y) approximates the surface of the simulated data with a certain accuracy, the error at some points is relatively large. A comparison between Figures 10 and 11 shows an obvious difference between p*(x, y) and f(x, y). In Figure 12, the maximum error is 0.2577, and the average error is 0.0132.
The optimal shape parameter of IQ-RBF with these data is α* = 0.3047, and the optimal approximation function of the IQ-RBF least squares is s*(x, y). The approximation function s*(x, y) and the error function |s*(x, y) − f(x, y)| on [0, 1] × [0, 1] are shown in Figures 13 and 14, respectively. These figures show that s*(x, y) is closer to f(x, y) than p*(x, y). Figure 14 indicates that the error |s*(x, y) − f(x, y)| oscillates strongly, but its values are small: the maximum error is 0.0463, and the average error is 0.0049.
By comparing Figure 12 with Figure 14, we find that the maximum error and the average error of s*(x, y) are lower than those of p*(x, y). Table 6 shows the errors of the least squares approximation for the bivariate polynomial spaces with total degrees of at most 6 to 10, respectively, together with the errors of the IQ-RBF approximation whose approximation space has the same dimension as the corresponding polynomial space. According to Table 6, the results for the polynomial space with a total degree of at most 8 are representative. Therefore, we conclude that curved surface fitting by the RBF least squares with the optimal shape parameter is better than that by the least squares based on polynomial bases.

Curved Surface Fitting on Real Data
The real data set is obtained from GPS data of Mount Vel'ký Rozsutec in the Malá Fatra through the online tool GPSVisualizer (http://www.gpsvisualizer.com/elevation). The data comprise 24,190 points in total and are shown in Figure 15. We use only IQ-RBF to fit the real data and select M = 500 points in the region as center points to generate the RBF approximation function space. We start our DFO method with the initial value α_0 = 1. In this case, the optimal shape parameter is α* = 424.8005, and the fitted surface is shown in Figure 16. Comparing Figure 15 with Figure 16, we find that the RBF least squares with the optimal shape parameter yields a smooth and continuous approximation: the peaks, ridges, and ravines of the mountain are well reconstructed. This approximation is acceptable.
We tabulate the error of the RBF least squares approximation for different shape parameters in Table 7; the error is the root mean square error (RMSE). The initial value α_0 = 1 is far from our optimal shape parameter α* = 424.8005, and the error at the initial value is very high. The results in Table 7 show that the optimal shape parameter can significantly reduce the error, and that our DFO method is effective even when it begins with a bad initial value.
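The RMSE used in Table 7 is computed in the usual way; a minimal sketch with placeholder numbers:

```python
import numpy as np

def rmse(f, s):
    """Root mean square error between sampled values f and fitted values s:
    sqrt( (1/N) * sum_i (f_i - s_i)^2 )."""
    f, s = np.asarray(f, dtype=float), np.asarray(s, dtype=float)
    return float(np.sqrt(np.mean((f - s) ** 2)))

# Toy values only; the elevations in Table 7 come from the real GPS data.
err = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Unlike the raw sum of squares in (8), the RMSE normalizes by the number of points, which makes errors comparable across data sets of different sizes such as the 24,190-point GPS set.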

Conclusions
This paper investigates the selection of shape parameters in the RBF least squares. The approximation error of the least squares is used to measure the shape parameter, and we propose a corresponding optimization problem (2) to obtain the optimal shape parameter. In order to reduce the difficulty in solving the problem, an equivalent one-dimensional optimization problem (3) is obtained through variable separation. According to the characteristics of problem (3), we propose to solve it by using the derivative-free optimization method.
Among the various DFO methods, we choose Powell's UOBYQA method, and we propose our DFO method by combining the characteristics of problem (3) with Powell's method in Section 4.
We test our method in different cases through numerical experiments. Our method outperforms the classical Gauss-Newton method in the experiments, and the performance of our DFO method for the least squares approximation based on IQ-RBF is excellent. The surface fitting by the RBF least squares with the optimal shape parameter is better than that by the least squares based on polynomial bases.
The approximation function s*(x) with the optimal shape parameter can approximate the sampling function more closely than the normal least squares approximation. Hence, the optimal shape parameter can be applied in various applications of least squares. For example, when solving partial differential equations with least squares, we can approximate the exact solution better with the method proposed in Section 4; likewise, when fitting a surface, we can approximate the sampled surface better with the RBF least squares with the optimal shape parameter.
The way the optimal shape parameter is defined in this paper directly suggests a generalization in which each basis function has an independent shape parameter, in which case problem (2) becomes a 2M-dimensional optimization problem. Such a generalization would undoubtedly improve the approximation effect, but the difficulty of solving problem (2) would further increase. How to solve the 2M-dimensional nonlinear optimization efficiently is worth considering.