Department of Mathematics, Western Washington University, Bellingham, WA 98225-9063, USA
Authors to whom correspondence should be addressed.
Received: 7 June 2017 / Accepted: 1 July 2017 / Published: 10 July 2017
The nonlinear least squares problem $\min_{y,z}\|A(y)z+b(y)\|$, where $A(y)$ is a full-rank $(N+\ell)\times N$ matrix, $y\in\mathbb{R}^n$, $z\in\mathbb{R}^N$, and $b(y)\in\mathbb{R}^{N+\ell}$ with $\ell\ge n$, can be solved by first solving a reduced problem $\min_y\|f(y)\|$ to find the optimal value $y^*$ of $y$, and then solving the resulting linear least squares problem $\min_z\|A(y^*)z+b(y^*)\|$ to find the optimal value $z^*$ of $z$. We have previously justified the use of the reduced function $f(y)=C^T(y)b(y)$, where $C(y)$ is a matrix whose columns form an orthonormal basis for the nullspace of $A^T(y)$, and presented a quadratically convergent Gauss–Newton type method for solving $\min_y\|f(y)\|$ based on the use of QR factorization. In this note, we show how LU factorization can replace the QR factorization in those computations, halving the associated computational cost while also providing opportunities to exploit sparsity and thus further enhance computational efficiency.
separable equations; nonlinear least squares; full-rank matrices; QR factorization; over-determined systems; Gauss–Newton method; least squares solutions; LU factorization; quadratic convergence
Many applications [1,2,3] lead to the need to find a solution of the separable nonlinear overdetermined least squares problem
$$\min_{y,z}\ \|A(y)z+b(y)\|, \qquad (1)$$
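where the full-rank matrix $A(y)\in\mathbb{R}^{(N+\ell)\times N}$ and the vector $b(y)\in\mathbb{R}^{N+\ell}$ are twice Lipschitz continuously differentiable functions of $y\in\mathbb{R}^n$, $z\in\mathbb{R}^N$, and $\ell\ge n$. Here, we are using the Euclidean norm. The classic Golub–Pereyra variable projection method replaces problem (1) by the simpler least squares problem
$$\min_{y}\ \|(I-A(y)A^+(y))\,b(y)\|. \qquad (2)$$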
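Once $y^*$ has been obtained by solving (2), $z^*$ can be found by solving the resulting linear least squares problem
$$\min_{z}\ \|A(y^*)z+b(y^*)\|, \qquad (3)$$
whose solution is
$$z^*=-A^+(y^*)\,b(y^*), \qquad (4)$$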
where $A^+$ is the pseudo-inverse of $A$. An alternative method that reduces (1) to a smaller least squares problem was proposed in [5,6] for the case $\ell=n$, and the associated iterative technique was then applied to the case $\ell>n$ in subsequent work, but without a complete analytical justification. The reduced problem is
$$\min_{y}\ \|f(y)\|, \qquad f(y):=C^T(y)\,b(y), \qquad (5)$$
where $C(y)$ is an $(N+\ell)\times\ell$ matrix whose columns form an orthonormal basis for $\mathcal{N}(A^T(y))$. In earlier work, we presented a quadratically convergent Gauss–Newton type iterative method, incorporating second derivative terms, to solve this problem, and provided a complete theoretical justification for our technique. In particular, we showed there that our Gauss–Newton type method, and also the standard Gauss–Newton method [9,10,11], which omits the second derivative terms and is not quadratically convergent, are both invariant with respect to the specific basis matrix $C$ that is used at any particular point in the iteration. This makes it possible to substitute, at each point of the iteration, any easily computed local orthonormal basis for the nullspace of $A^T$. This observation also makes it possible to compute both the first and the second derivatives involved in the computation very cheaply.
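The role of the reduced function can be checked numerically. The sketch below is ours, not the authors' code, and its helper names are hypothetical. It builds an orthonormal basis $C$ of $\mathcal{N}(A^T)$ from a full SVD (an expensive choice made only for illustration; avoiding such costs is the point of this note), confirms that $\|C^Tb\|$ equals the variable projection residual $\|(I-AA^+)b\|$ of (2), and shows that replacing $C$ by $CQ$ with an orthogonal $\ell\times\ell$ matrix $Q$ leaves $\|f\|$ unchanged. This is a simpler fact than the invariance of the iterates discussed above.

```python
import numpy as np

def nullspace_basis(A):
    """Orthonormal basis of N(A^T) for a full-rank (N+l) x N matrix A.
    Uses the full SVD: the last l left singular vectors are orthogonal to range(A)."""
    U, _, _ = np.linalg.svd(A, full_matrices=True)
    return U[:, A.shape[1]:]

def reduced_residual(A, b, C=None):
    """Reduced residual f = C^T b (an l-vector)."""
    if C is None:
        C = nullspace_basis(A)
    return C.T @ b

rng = np.random.default_rng(0)
N, l = 6, 2
A = rng.standard_normal((N + l, N))
b = rng.standard_normal(N + l)

C = nullspace_basis(A)
f = reduced_residual(A, b, C)

# Variable projection residual (I - A A^+) b, for comparison with ||f||.
vp = b - A @ (np.linalg.pinv(A) @ b)

# A different orthonormal basis C Q (Q orthogonal, l x l) changes f but not ||f||.
Q, _ = np.linalg.qr(rng.standard_normal((l, l)))
f_rot = reduced_residual(A, b, C @ Q)

print(np.linalg.norm(f), np.linalg.norm(vp), np.linalg.norm(f_rot))
```

The equality holds because $I-AA^+=CC^T$ is the orthogonal projector onto $\mathcal{N}(A^T)$ and $C$ has orthonormal columns.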
Many instances of (1), such as those arising from the discretization of linear differential equations with only a few nonlinear parameters, involve $N\gg\ell$, while $\ell$ and $n$ are fixed and small. In this case, the main computational cost in each step of our quadratically convergent Gauss–Newton type method is the QR factorization of $A(y)$, costing approximately $4N^3/3$ flops per iteration. In this note, we show how the relevant computations can instead be performed by using one LU factorization of $A(y)$ per iteration, so that the computational cost is essentially halved, to approximately $2N^3/3$ flops. We note that, in the case of discretizations of differential equations, the matrix $A(y)$ is typically very sparse and strongly patterned. Since LU factorization exploits and preserves such sparsity and patterns more easily than QR factorization does, this technique also opens up opportunities for further reductions in computational cost in such applications.
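The factor of two can be checked from standard leading-order operation counts. The sketch assumes the usual dense counts for Householder QR of an $m\times n$ matrix without accumulating $Q$, namely $2n^2(m-n/3)$, and for Gaussian elimination, namely $mn^2-n^3/3$; for $m=N+\ell$ these are $4N^3/3+2\ell N^2$ and $2N^3/3+\ell N^2$. Pivoting comparisons, sparsity and lower-order terms are ignored, and the function names are ours.

```python
def householder_qr_flops(m, n):
    """Leading-order flops for Householder QR of an m x n matrix (m >= n), Q not formed."""
    return 2 * n**2 * (m - n / 3)

def lu_flops(m, n):
    """Leading-order flops for LU (Gaussian elimination) of an m x n matrix (m >= n)."""
    return m * n**2 - n**3 / 3

for N, l in [(100, 2), (1000, 3), (10000, 5)]:
    qr, lu = householder_qr_flops(N + l, N), lu_flops(N + l, N)
    print(f"N={N:6d} l={l}:  QR {qr:.3e}  LU {lu:.3e}  ratio {qr / lu:.3f}")
```

Since $2n^2(m-n/3)=2(mn^2-n^3/3)$, the ratio is exactly 2 at leading order for every $m\ge n$, not only asymptotically.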
In a previous paper, we presented the use of LU factorizations in the context of solving separable systems of nonlinear equations and zero-residual separable nonlinear least squares problems. We mentioned there that extending that approach to nonzero residual problems remained an open question. The current paper resolves that question, while also improving on the technique of that paper in other ways. In particular, our new method does not involve the singular value decomposition, and, in contrast to the standard Gauss–Newton method used there, it retains quadratic convergence even for nonzero residual problems. The latter is achieved by retaining the second derivative terms, and we show here how those terms can be computed very efficiently. This is in contrast to the usual use of the Gauss–Newton method, which sacrifices quadratic convergence for the convenience of not computing second derivatives.
In the next section, we present the relevant results from our earlier QR-based work on solving (5) by our Gauss–Newton type method, and show how those results and methods can be modified to use LU factorization. The resulting algorithm and some numerical examples illustrating the method are given in the following two sections. The numerical results exhibit the quadratic convergence expected from our Gauss–Newton type method.
We begin by presenting the relevant background results from our earlier work. Assume that $A(y)$ and $b(y)$ are twice Lipschitz continuously differentiable in a neighborhood of the least squares solution $(y^*,z^*)$ of (1). As shown in [5,6,7,8], there exists a smoothly differentiable matrix $C(y)$ whose columns form an orthonormal basis of $\mathcal{N}(A^T(y))$ in a neighborhood of $y^*$. Then, finding the least squares solution of (1) can be reduced to finding the least squares solution of (5). Our Gauss–Newton type method for solving (5) takes the form
$$y_{k+1}=y_k-\Big(f'(y_k)^Tf'(y_k)+\sum_{i=1}^{\ell}f_i(y_k)\,\nabla^2f_i(y_k)\Big)^{-1}f'(y_k)^Tf(y_k), \qquad (6)$$
where $k=0,1,\dots$, and $f'(y)$, $f_i(y)$ and $\nabla^2f_i(y)$ denote the derivative of $f$, the $i$-th component of $f$, and the Hessian matrix of $f_i$, respectively.
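Iteration (6) is Newton's method applied to $\tfrac12\|f(y)\|^2$: the vector $f'^Tf$ is its gradient and $f'^Tf'+\sum_i f_i\nabla^2f_i$ its Hessian. The sketch below performs one such step given callables for $f$, $f'$ and the stacked Hessians $\nabla^2f_i$; these callables and the toy problem are hypothetical stand-ins for the quantities that the analysis below computes from the LU factors.

```python
import numpy as np

def gauss_newton_type_step(y, f, jac, hess):
    """One step of iteration (6) for min ||f(y)||, f: R^n -> R^l.
    f(y) -> (l,), jac(y) -> (l, n), hess(y) -> (l, n, n) stacking the Hessians of f_i."""
    fy, J, H = f(y), jac(y), hess(y)
    # J^T J plus the second-derivative terms sum_i f_i * Hess(f_i); dropping the
    # einsum term gives the standard (not quadratically convergent) Gauss-Newton step.
    G = J.T @ J + np.einsum("i,ijk->jk", fy, H)
    return y - np.linalg.solve(G, J.T @ fy)

# Hypothetical toy problem with a nonzero residual at the minimizer:
# f(y) = (y0 - 1, y1 - 2, 0.1*y0*y1), so l = 3, n = 2.
def f(y):
    return np.array([y[0] - 1.0, y[1] - 2.0, 0.1 * y[0] * y[1]])

def jac(y):
    return np.array([[1.0, 0.0], [0.0, 1.0], [0.1 * y[1], 0.1 * y[0]]])

def hess(y):
    H = np.zeros((3, 2, 2))
    H[2] = [[0.0, 0.1], [0.1, 0.0]]   # only f_3 has nonzero second derivatives
    return H

y = np.array([1.0, 2.0])
for k in range(6):
    y = gauss_newton_type_step(y, f, jac, hess)
    print(k, y, np.linalg.norm(jac(y).T @ f(y)))   # gradient norm of ||f||^2 / 2
```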
In earlier work, we proved that, at any point $y_k$ in the iteration (6), the particular matrix $C(y_k)$ whose columns form an orthonormal basis for the nullspace of $A^T(y_k)$ can be replaced by any other orthonormal basis for this space without changing $y_{k+1}$. Thus, instead of using a single function $f(y)=C^T(y)b(y)$ throughout (6), we may use a different reduced function, built from a different local basis, at each iteration point $y_k$. This freedom to use any orthonormal basis for the nullspace of $A^T(y_k)$ at the point $y_k$ is key to our numerical technique. It also permits us to write simply $C$ and $f$ in the rest of this note, regardless of the particular basis matrix being used, and for simplicity we write $\hat y$ instead of $y_k$. The analysis below centers on the efficient computation of the terms required to use (6) at the particular point $\hat y$.
The following results are derived and used in [5,6,7,8] and are required for our analysis below. With the terms as defined above, and writing for the i-th column of the matrix ,
and the -th entry of the Hessian matrix is
Here, we use and to denote the termwise first derivative w.r.t. and second derivative w.r.t. , respectively, of , for .
where the -entry of the upper triangular matrix is
Throughout the rest of this paper, we assume not only that $A(\hat y)$ has full rank $N$, but also that it is sufficiently well-conditioned that LU factorization with partial pivoting can be performed safely. For separable nonlinear systems with a rank-deficient $A(y)$, the technique of bordered matrices [13,14,15,16] may be applied to produce a full-rank matrix.
The following theorem lays the foundation for our approach to using the LU factorization.
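Let the LU factorization of the rectangular matrix $A(\hat y)$ with some permutation matrix $P$ be
$$PA(\hat y)=\begin{bmatrix}L_1\\ L_2\end{bmatrix}U, \qquad (13)$$
where $L_2$ is $\ell\times N$, $P$ is an $(N+\ell)\times(N+\ell)$ permutation matrix, $L_1$ is an $N\times N$ unit lower triangular matrix and $U$ is an $N\times N$ nonsingular upper triangular matrix.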
Then, the thin positive QR factorization
produces a thin orthonormal matrix and a small nonsingular upper triangular matrix , and
(a) By direct computation, is a product of nonsingular matrices:
From (17), we see that the matrix $C$ constructed in the theorem satisfies (7) at $\hat y$, and thus, as shown in our earlier work, Equations (11) and (12) hold for this matrix at $\hat y$.
To determine the computational cost, recall that we assume that $N\gg\ell$, while $\ell$ and $n$ are fixed and small. Then, the thin QR factorization in (16) is cheap, since the matrix being factored has only $\ell$ columns. More specifically, the LU factorization of the $(N+\ell)\times N$ matrix $A(\hat y)$ in (13) costs approximately $2N^3/3+\ell N^2$ flops, with an additional $O(N^2)$ comparisons to produce $P$, while forming the $(N+\ell)\times\ell$ matrix that is factored in (16) costs only $O(\ell N^2)$ flops, and its thin QR factorization takes only $O(\ell^2N)$ flops.
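One natural way to realize this construction numerically is sketched below. It assumes that the matrix whose thin QR factorization is taken in (16) is $X=P^T\begin{bmatrix}-K\\ I_\ell\end{bmatrix}$ with $K=L_1^{-T}L_2^T$; this is our reconstruction from (13), not a transcription of the paper's (14)–(15). Since $A^T(\hat y)P^T\begin{bmatrix}-K\\ I_\ell\end{bmatrix}=U^T(-L_1^TK+L_2^T)=0$, the columns of $X$ span $\mathcal{N}(A^T)$. Note that SciPy returns `A = p @ L @ U`, so its `p` is $P^T$ in the notation of (13); the function name is hypothetical.

```python
import numpy as np
from scipy.linalg import lu, solve_triangular

def nullspace_basis_lu(A):
    """Orthonormal basis C of N(A^T) for a full-rank (N+l) x N matrix A, built from one
    LU factorization with partial pivoting plus a thin QR of an (N+l) x l matrix."""
    m, N = A.shape
    p, L, U = lu(A)                 # SciPy convention: A = p @ L @ U, i.e. P = p.T in (13)
    L1, L2 = L[:N], L[N:]           # L1: N x N unit lower triangular, L2: l x N
    # K solves L1^T K = L2^T (l right-hand sides, O(l N^2) flops); then
    # A^T p [-K; I] = U^T (-L1^T K + L2^T) = 0.
    K = solve_triangular(L1, L2.T, trans='T', lower=True, unit_diagonal=True)
    X = p @ np.vstack([-K, np.eye(m - N)])
    C, R = np.linalg.qr(X)          # thin QR: C is (N+l) x l, R is l x l
    s = np.sign(np.diag(R))         # flip signs so that diag(R) > 0 ("positive" QR)
    return C * s, s[:, None] * R

rng = np.random.default_rng(1)
N, l = 50, 3
A = rng.standard_normal((N + l, N))
C, R = nullspace_basis_lu(A)
print(np.abs(A.T @ C).max(), np.abs(C.T @ C - np.eye(l)).max())
```

For a sparse $A(\hat y)$ one would use a sparse LU instead of the dense `scipy.linalg.lu`; the dense call is used here only to keep the sketch short.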
Notice that solving (11) and (12) involves solving several linear systems with the matrix defined in (10). However, that matrix differs from the matrix defined in (14), whose LU factors are given in (18). To reduce the cost of solving (11) and (12) from $O(N^3)$ to $O(N^2)$ operations, we can exploit the factorization (18) of the matrix in (14), as follows. This result is essentially an application of the Sherman–Morrison–Woodbury formula.
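The Sherman–Morrison–Woodbury idea can be illustrated generically. The matrices in (10) and (14) are not reproduced here, so the sketch below simply assumes that the matrix to be solved with equals a matrix $B$ with already known LU factors plus a correction $WZ^T$ of small rank $k$, and uses
$$(B+WZ^T)^{-1}=B^{-1}-B^{-1}W\,(I_k+Z^TB^{-1}W)^{-1}Z^TB^{-1}.$$
The names `B`, `W`, `Z` and `smw_solve` are ours; this is only the identity that Theorem 2 rests on, not the theorem itself.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def smw_solve(B_lu, W, Z, r):
    """Solve (B + W Z^T) x = r, reusing B_lu = lu_factor(B); W and Z are N x k, k small.
    Needs k + 1 forward/back substitutions plus a k x k solve: O(k N^2), not O(N^3)."""
    Binv_r = lu_solve(B_lu, r)
    Binv_W = lu_solve(B_lu, W)
    S = np.eye(W.shape[1]) + Z.T @ Binv_W          # small k x k "capacitance" matrix
    return Binv_r - Binv_W @ np.linalg.solve(S, Z.T @ Binv_r)

rng = np.random.default_rng(2)
N, k = 200, 3
B = rng.standard_normal((N, N)) + N * np.eye(N)   # well-conditioned test matrix
W = rng.standard_normal((N, k))
Z = rng.standard_normal((N, k))
r = rng.standard_normal(N)

x = smw_solve(lu_factor(B), W, Z, r)
print(np.linalg.norm((B + W @ Z.T) @ x - r))
```

The modified matrix is never factored; only the existing factors of $B$ and a $k\times k$ system are used.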
Let be the LU factorization of with a permutation matrix as in (13), and let the matrices , and be defined as in (14)–(16), while is defined by (10). Denote
Finally, we show how the value $z^*$ defined by (3) can be computed efficiently using the LU factorization of $A(y^*)$.
if and only if
Let the QR factorization of be
Since is full rank, we see that
Since is an orthonormal matrix, there exists an orthogonal matrix such that
Hence, the conclusion follows. ☐
Computing $z^*$ via (21) involves the matrix of (10), but the technique of Theorem 2 may again be used to replace this by the known LU factorization (18) of the matrix in (14), obtained from the LU factorization of $A(y^*)$. Thus, the cost of this step is also only $O(N^2)$ flops.
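The sketch below shows one way, our own construction rather than the paper's formula (21), in which $z^*$ can be obtained from the LU factors (13) alone. Writing $w=Uz$, problem (3) becomes $\min_w\|Lw+Pb\|$ with $L=\begin{bmatrix}L_1\\ L_2\end{bmatrix}$, and the normal-equation matrix $L^TL=L_1^T(I_N+KK^T)L_1$, with $K=L_1^{-T}L_2^T$, is a rank-$\ell$ modification of $L_1^TL_1$. The Sherman–Morrison–Woodbury formula then reduces the work after the LU factorization to $O(\ell N^2)$ flops. Because it forms normal equations for the factor $L$ (whose entries are bounded by 1 under partial pivoting) rather than for $A$, its accuracy depends on the conditioning of $L$; the function name is hypothetical.

```python
import numpy as np
from scipy.linalg import lu, solve_triangular

def z_star_via_lu(A, b):
    """z* = argmin ||A z + b|| = -A^+ b for a full-rank (N+l) x N matrix A,
    using only the LU factors of A plus an l x l system."""
    m, N = A.shape
    p, L, U = lu(A)                        # A = p @ L @ U (SciPy convention)
    L1, L2 = L[:N], L[N:]
    K = solve_triangular(L1, L2.T, trans='T', lower=True, unit_diagonal=True)  # L1^T K = L2^T
    c = p.T @ b
    c1, c2 = c[:N], c[N:]
    # With w = U z, minimize ||L w + c||. Its normal equations are
    # L1^T (I + K K^T) L1 w = -(L1^T c1 + L2^T c2) = -L1^T (c1 + K c2).
    t = -(c1 + K @ c2)
    s = t - K @ np.linalg.solve(np.eye(m - N) + K.T @ K, K.T @ t)   # (I + K K^T)^{-1} t
    w = solve_triangular(L1, s, lower=True, unit_diagonal=True)     # L1 w = s
    return solve_triangular(U, w, lower=False)                      # U z = w

rng = np.random.default_rng(3)
N, l = 40, 2
A = rng.standard_normal((N + l, N))
b = rng.standard_normal(N + l)
z = z_star_via_lu(A, b)
z_ref = -np.linalg.lstsq(A, b, rcond=None)[0]
print(np.abs(z - z_ref).max())
```

The same matrix $K$ also appears in the LU-based nullspace basis, so in a full implementation it would be computed once per iteration.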
We now give a detailed algorithm, Algorithm 1, based on the analysis of the previous section.
Algorithm 1: Given the function $A(y)z+b(y)$ with a matrix $A(y)$ that is full rank in a neighborhood of $y^*$, a small positive real number $\varepsilon$, a positive integer $M$, and a point $y_0$ near the solution $y^*$. For $m=0,1,\dots$, do steps (a)–(i):
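If $\|y_{m+1}-y_m\|<\varepsilon$, stop and output $y^*=y_{m+1}$ and $z^*$ from (21) using (19) and (20). Otherwise, if $m<M$, replace $m$ by $m+1$ and go to (a); if $m=M$, output that the method fails to obtain $y^*$ within the given $\varepsilon$ and $M$.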
Note that the cost of solving (6) for $y_{m+1}$ is only $O(n^3)$, since the matrix involved is only $n\times n$ and we assume $n\ll N$. Thus, the overall cost per iteration remains approximately $2N^3/3$ flops, the primary cost being the LU factorization of $A(y_m)$. The standard Gauss–Newton method, which omits the second derivative terms, omits steps (g) and (h), but the resulting method is no longer quadratically convergent.
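The outer loop and stopping rule of Algorithm 1 can be sketched independently of how each update is computed. In the sketch below, the callable `step` is a hypothetical stand-in for the steps that evaluate (6) from the LU factors; `eps` and `max_iter` play the roles of the tolerance and iteration limit. The toy step is Newton's method for $\tfrac12\|f(y)\|^2$ with the made-up scalar problem $f(y)=(y-1,\,0.1y^2)$, which has a nonzero residual at its minimizer.

```python
import numpy as np

def run_algorithm1(y0, step, eps=1e-12, max_iter=20):
    """Outer loop in the style of Algorithm 1: repeat y_{m+1} = step(y_m) until
    ||y_{m+1} - y_m|| < eps, or report failure after max_iter iterations."""
    y = np.atleast_1d(np.asarray(y0, dtype=float))
    step_sizes = []
    for _ in range(max_iter):
        y_new = step(y)
        step_sizes.append(np.linalg.norm(y_new - y))
        if step_sizes[-1] < eps:
            return y_new, step_sizes
        y = y_new
    raise RuntimeError(f"no convergence within eps={eps} and max_iter={max_iter}")

# Hypothetical toy step: Newton's method for min (1/2)||f(y)||^2 with
# f(y) = (y - 1, 0.1*y^2): gradient y - 1 + 0.02*y^3, Hessian 1 + 0.06*y^2.
def toy_step(y):
    return y - (y - 1.0 + 0.02 * y**3) / (1.0 + 0.06 * y**2)

y_star, sizes = run_algorithm1([1.0], toy_step)
for s in sizes:
    print(f"{s:.3e}")   # step sizes should shrink roughly quadratically
```

Printing the step sizes in this way is a simple means of observing the quadratic convergence reported in the tables below.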
We present three examples to illustrate our algorithm for computing the least squares solution of overdetermined separable equations. Our examples have large $N$ and small $\ell$, approximating the theoretical setting in which $\ell/N\to0$ as $N\to\infty$. Similar results hold if we use much larger $N$.
Consider the least squares solution of the overdetermined system
and (thus ), and where is the -th standard unit vector in . The selected value of is the value of the exact solution . Here, .
The problem can be regarded as a constrained generalized eigenvalue problem. We choose , , and . For the eigenvalues of matrices of the form of , see . Beginning with and using Algorithm 1, we obtain the results in Table 1 for , which shows quadratic convergence of the method.
After is obtained, we compute using (21), where , . We find that , and the 2-norm of the residual of is ≈0.06.
Consider the least squares solution of the overdetermined system
Here, we use , thus , and .
We choose . The least squares solution of (23) is . Beginning with and using Algorithm 1, we obtain the results in Table 2 for . These results show the expected quadratic convergence. After is obtained, we compute using (21). Here, , , and the exact residual is . We find that , while the 2-norm of the residual of is .
Our last example arises from a discretization, using the finite element method, of the one-dimensional elliptic interface problem , for , where for and for . The boundary conditions are and , and the interface conditions are and . For more on interface problems, see . In this discretization, the slopes of the basis tent functions on either side of the interface are modified from and to , and , on the respective line segments. In our setting, and are given, and represent the nonlinear variables.
Consider the least squares solution of the overdetermined system
where the matrix is defined as
where and , , and the square matrices F are tridiagonal of the form
the vectors E have all entries one, are the jth standard unit vectors, and
The exact least squares solution is .
We use , , , , , , and . Using Algorithm 1 with , we obtain the results in Table 3 for , which shows quadratic convergence of the method. After is obtained, we obtain using (21). Here, and , and the exact residual is . We find that and the 2-norm of the residual of is .
We have shown how LU factorization can be used to solve separable nonlinear least squares problems with nonzero residuals. Our Gauss–Newton type method retains the second derivative terms and is quadratically convergent. The technique is most efficient when there are few nonlinear terms and variables, and in that context we have shown how the second derivative terms can be computed relatively cheaply. Our numerical examples demonstrate the effectiveness of the technique in some areas of application.
Publication costs were provided by Western Washington University, Bellingham, WA 98225, USA.
Yunqiu Shen conceived the key ideas and drafted the paper; Tjalling Ypma refined the details and the presentation.
Conflicts of Interest
The authors declare no conflict of interest.
Golub, G.H.; Pereyra, V. Separable nonlinear least squares: the variable projection method and its applications. Topic Review. Inverse Probl. 2003, 19, R1–R26.
Mullen, K.M.; van Stokkum, I.M. The variable projection algorithm in time-resolved spectroscopy, microscopy and mass spectrometry applications. Numer. Algorithms 2009, 51, 319–340.
Shen, Y.-Q.; Ypma, T.J. Solving separable nonlinear equations using LU factorization. ISRN Math. Anal. 2013, 2013, 258072.
Golub, G.H.; Pereyra, V. The differentiation of pseudo-inverses and nonlinear least squares problems whose variables separate. SIAM J. Numer. Anal. 1973, 10, 413–432.
Shen, Y.-Q.; Ypma, T.J. Solving nonlinear systems of equations with only one nonlinear variable. J. Comput. Appl. Math. 1990, 30, 235–246.
Ypma, T.J.; Shen, Y.-Q. Solving N+m nonlinear equations with only m nonlinear variables. Computing 1990, 44, 259–271.
Lukeman, G.G. Separable Overdetermined Nonlinear Systems: An Application of the Shen-Ypma Algorithm; VDM Verlag: Saarbrücken, Germany, 2009.
Shen, Y.; Ypma, T.J. Solving Separable Least Squares Problems using QR factorization. J. Comp. Appl. Math., submitted.
Dennis, J.E., Jr.; Schnabel, R.B. Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Corrected Reprint of the 1983 Original; SIAM: Philadelphia, PA, USA, 1996.
Deuflhard, P.; Hohmann, A. Numerical Analysis in Modern Scientific Computing, 2nd ed.; Springer: New York, NY, USA, 2003.
Ortega, J.M.; Rheinboldt, W.C. Iterative Solution of Nonlinear Equations in Several Variables; Academic Press: New York, NY, USA, 1970.
Golub, G.H.; Van Loan, C.F. Matrix Computations, 3rd ed.; Johns Hopkins: Baltimore, MD, USA, 1996.
Shen, Y.-Q.; Ypma, T.J. Newton’s method for singular nonlinear equations using approximate left and right nullspaces of the Jacobian. Appl. Numer. Math. 2005, 54, 256–265.