1. Introduction
Many applications [1,2,3] lead to the need to find a solution of the separable nonlinear overdetermined least squares problem
$$\min_{y,z}\ \|A(y)z - b(y)\|^2, \qquad (1)$$
where the full-rank matrix $A(y) \in \mathbb{R}^{(N+\ell)\times N}$ and the vector $b(y) \in \mathbb{R}^{N+\ell}$ are twice Lipschitz continuously differentiable functions of $y \in \mathbb{R}^n$, with $z \in \mathbb{R}^N$ and $\ell \geq n$. Here, we are using the Euclidean norm. The classic Golub–Pereyra variable projection method [4] replaces the problem (1) by the simpler least squares problem
$$\min_{y}\ \|(I - A(y)A(y)^{+})\, b(y)\|^2. \qquad (2)$$
Once $y^*$ has been obtained by solving (2), $z^*$ can be found by solving the resulting linear least squares problem $\min_z \|A(y^*)z - b(y^*)\|$; formally,
$$z^* = A(y^*)^{+}\, b(y^*), \qquad (3)$$
where $A(y^*)^{+}$ is the pseudo-inverse of $A(y^*)$.
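For concreteness, the following minimal NumPy sketch evaluates the variable projection residual of (2) at a given $y$ and recovers $z$ as in (3); the callables `A` and `b`, and all dimensions, are hypothetical stand-ins, not data from this paper.

```python
import numpy as np

def vp_residual_and_z(A, b, y):
    """Evaluate the variable projection residual (I - A(y)A(y)^+) b(y) of (2)
    and recover z = A(y)^+ b(y) as in (3), via a stable least squares solve."""
    Ay, by = A(y), b(y)
    z, *_ = np.linalg.lstsq(Ay, by, rcond=None)  # z = A(y)^+ b(y)
    return by - Ay @ z, z

# Hypothetical separable problem with N = 6, ell = 2, n = 1.
rng = np.random.default_rng(0)
C0, C1 = rng.standard_normal((8, 6)), rng.standard_normal((8, 6))
b0 = rng.standard_normal(8)
A = lambda y: C0 + y[0] * C1      # A(y) depends smoothly on the nonlinear variable y
b = lambda y: np.cos(y[0]) * b0   # so does b(y)

r, z = vp_residual_and_z(A, b, np.array([0.3]))
```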
An alternative method that reduces (1) to a smaller least squares problem was proposed in [5,6] for the case $\ell = n$, and the associated iterative technique was then applied to the case $\ell > n$ in [7], but without a complete analytical justification. The reduced system is
$$\min_{y}\ \|f(y)\|^2, \qquad f(y) \equiv W(y)^T b(y), \qquad (5)$$
where $W(y)$ is an $(N+\ell)\times\ell$ matrix whose columns form an orthonormal basis for the nullspace of $A(y)^T$.
In [8], we presented a quadratically convergent Gauss–Newton type iterative method, incorporating second derivative terms, to solve this problem, and provided a complete theoretical justification for our technique. In particular, we showed in [8] that our Gauss–Newton type method, and also the standard Gauss–Newton method [9,10,11], which omits the second derivative terms and is not quadratically convergent, are both invariant with respect to the specific basis matrix $W(y_k)$ that is used at any particular point in the iteration. This makes it possible to substitute, at each point of the iteration, any easily computed local orthonormal basis for the nullspace of $A(y_k)^T$. This observation also makes it possible to compute both the first and the second derivatives involved in the computation very cheaply.
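As a sketch of how the reduced residual of (5) can be formed from any convenient orthonormal nullspace basis, the following uses the full QR factorization of $A(y)$ (the route taken in [8]): the last $\ell$ columns of the square factor $Q$ span the nullspace of $A(y)^T$. The data are again hypothetical.

```python
import numpy as np
from scipy.linalg import qr

def reduced_residual(Ay, by):
    """f(y) = W^T b(y), where the columns of W form an orthonormal basis
    for the nullspace of A(y)^T, here taken from a full QR factorization."""
    Q, _ = qr(Ay)                  # Q is (N + ell) x (N + ell) orthogonal
    W = Q[:, Ay.shape[1]:]         # last ell columns span null(A(y)^T)
    return W.T @ by

rng = np.random.default_rng(1)
Ay, by = rng.standard_normal((8, 6)), rng.standard_normal(8)
f = reduced_residual(Ay, by)       # a vector of length ell = 2
```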
Many instances of (1), such as those arising from the discretization of linear differential equations with only a few nonlinear parameters, involve very large $N$, while ℓ and n are fixed and small. In this case, the main computational cost in each step of our quadratically convergent Gauss–Newton type method of [8] is the QR factorization of $A(y_k)$, costing approximately $2N^2(N + \ell - N/3) \approx \tfrac{4}{3}N^3$ flops [12] per iteration. In this note, we show how the relevant computations can instead be performed by using one LU factorization of $A(y_k)$ per iteration, so that the computational cost is essentially halved, to approximately $N^2(N + \ell - N/3) \approx \tfrac{2}{3}N^3$ flops [12]. We note that, in the case of discretizations of differential equations, the matrix $A(y)$ is typically very sparse and strongly patterned. Since LU factorization more easily exploits and preserves such sparsity and patterns than does QR factorization, this technique also opens up opportunities for further reductions in the computational cost in such applications.
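The factor-of-two gap can be observed directly by timing the two factorizations of the same tall matrix; the sizes below are arbitrary choices for illustration.

```python
import time
import numpy as np
from scipy.linalg import lu, qr

N, ell = 1200, 4
A = np.random.default_rng(2).standard_normal((N + ell, N))

t0 = time.perf_counter(); lu(A)
t1 = time.perf_counter(); qr(A)
t2 = time.perf_counter()
print(f"LU: {t1 - t0:.3f} s, QR: {t2 - t1:.3f} s")  # LU typically takes about half the QR time
```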
In a previous paper [3], we presented the use of LU factorizations in the context of solving separable systems of nonlinear equations and zero-residual separable nonlinear least squares problems. We mentioned there that extending that approach to nonzero residual problems remained an open question. The current paper resolves that question, while also improving on the technique of [3] in other ways. In particular, our new method does not involve the singular value decomposition, and in contrast to the use of the standard Gauss–Newton method in [3], it retains quadratic convergence even in the case of nonzero residual problems. The latter is achieved by retaining the second derivative terms, and we show here how those terms can be computed very efficiently. This is in contrast to the usual use of the Gauss–Newton method, which sacrifices quadratic convergence for the convenience of not computing the second derivatives.
In the next section, we present the relevant results from [8] relating to our technique for solving (5) by our Gauss–Newton type method using QR factorization, and show how those results and methods can be modified to use LU factorization. The resulting algorithm and some numerical examples to illustrate the method are given in the following two sections. The numerical results exhibit the quadratic convergence expected from our Gauss–Newton type method.
2. Analysis
We begin by presenting the relevant background results from [8]. Assume that $A(y)$ and $b(y)$ are twice Lipschitz continuously differentiable in a neighborhood of the least squares solution $y^*$ of (1). As shown in [5,6,7,8], there exists a smoothly differentiable $(N+\ell)\times\ell$ matrix $W(y)$ whose columns form an orthonormal basis of the nullspace of $A(y)^T$ in a neighborhood of $y^*$. Then, finding the least squares solution of (1) can be reduced to finding the least squares solution of (5). Our Gauss–Newton type method for solving (5) takes the form
$$y_{k+1} = y_k - \Big[f'(y_k)^T f'(y_k) + \sum_{i=1}^{\ell} f_i(y_k)\,\nabla^2 f_i(y_k)\Big]^{-1} f'(y_k)^T f(y_k), \qquad (6)$$
where $f(y) = W(y)^T b(y)$, and $f'(y)$, $f_i(y)$ and $\nabla^2 f_i(y)$ denote the derivative of $f$, the $i$-th component of $f$, and the Hessian matrix of $f_i$, respectively.
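In outline, one step of (6) looks as follows. This sketch treats $f$ as a black box and approximates $f'$ and the Hessians $\nabla^2 f_i$ by finite differences; the whole point of the method of [8], and of this note, is that these derivatives can instead be computed analytically and cheaply.

```python
import numpy as np

def gauss_newton_type_step(f, y, h=1e-5):
    """One step of (6): a Newton step for the gradient of 0.5 * ||f(y)||^2,
    keeping the second derivative terms sum_i f_i(y) * Hess(f_i)(y)."""
    n, fy = len(y), f(y)
    I = np.eye(n)
    J = np.column_stack([(f(y + h * I[j]) - f(y - h * I[j])) / (2 * h)
                         for j in range(n)])
    H = J.T @ J
    for j in range(n):
        for k in range(n):
            d2 = (f(y + h * (I[j] + I[k])) - f(y + h * (I[j] - I[k]))
                  - f(y - h * (I[j] - I[k])) + f(y - h * (I[j] + I[k]))) / (4 * h * h)
            H[j, k] += fy @ d2     # the (j,k) second derivative term in (6)
    return y - np.linalg.solve(H, J.T @ fy)
```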
In [8], we proved that, at any point $y_k$ in the iteration (6), the particular matrix $W(y_k)$ whose columns form an orthonormal basis for the nullspace of $A(y_k)^T$ can be replaced by any other orthonormal basis $\widetilde{W}$ for this space, that is, any matrix satisfying
$$A(y_k)^T \widetilde{W} = 0, \qquad \widetilde{W}^T \widetilde{W} = I_{\ell}, \qquad (7)$$
without changing the iterates. Thus, instead of using the function $f$ in (6), we may use a different function $\widetilde{f}(y) = \widetilde{W}(y)^T b(y)$ at each iteration point $y_k$. This freedom to use any orthonormal basis for the nullspace of $A(y_k)^T$ at the point $y_k$ is key to our numerical technique. It also permits us to write simply $f$ and $f'$ instead of $\widetilde{f}$ and $\widetilde{f}'$ in the rest of this note, regardless of the particular basis matrix being used, and for simplicity we write $W$ instead of $\widetilde{W}$. The analysis below centers on the efficient computation of the terms required to use (6) at the particular point $y_k$.
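This invariance is easy to confirm numerically: any two orthonormal bases of the same nullspace differ by an orthogonal factor, so, for instance, $\|f(y)\|$ does not depend on the basis chosen. A small check with arbitrary data:

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(3)
Ay, by = rng.standard_normal((9, 6)), rng.standard_normal(9)

Q, _ = qr(Ay)
W1 = Q[:, 6:]                            # one orthonormal basis of null(A(y)^T)
Z, _ = qr(rng.standard_normal((3, 3)))   # a random 3 x 3 orthogonal matrix
W2 = W1 @ Z                              # another orthonormal basis of the same space
print(np.linalg.norm(W1.T @ by) - np.linalg.norm(W2.T @ by))  # ~ 0
```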
The following results are derived and used in [5,6,7,8] and are required for our analysis below. With the terms as defined above, and writing $w_i$ for the $i$-th column of the matrix $W$, the derivative $f'(y)$ is given by (8), and the $(j,k)$-th entry of the $n \times n$ Hessian matrix $\nabla^2 f_i(y)$ is given by (9). Here, we use $'$ and $''$ to denote the termwise first derivative w.r.t. $y$ and second derivative w.r.t. $y$, respectively, of the quantities involved, for $i = 1, \ldots, \ell$.
Define the $(N+\ell)\times(N+\ell)$ nonsingular matrix
$$D(y) = [\,A(y)\ \ W(y)\,]. \qquad (10)$$
Then, we showed in [8] that the identities (11) and (12) hold, where the $(i,j)$-entry of the upper triangular matrix appearing in (11) and (12) is given explicitly in [8].
Throughout the rest of this paper, we assume not only that $A(y)$ has full rank $N$, but also that it is sufficiently well-conditioned that LU factorization with partial pivoting can be performed safely. For separable nonlinear systems with a rank-deficient $A(y)$, the technique of bordered matrices [13,14,15,16] may be applied to produce a full rank matrix.
The following theorem lays the foundation for our approach to using the LU factorization.
Theorem 1. Let the LU factorization of the rectangular matrix $A(y_k)$ with some permutation matrix $P$ be
$$P A(y_k) = L \begin{bmatrix} U_1 \\ 0 \end{bmatrix}, \qquad (13)$$
where $A(y_k)$ is $(N+\ell)\times N$, $P$ is an $(N+\ell)\times(N+\ell)$ permutation matrix, $L$ is an $(N+\ell)\times(N+\ell)$ unit lower triangular matrix and $U_1$ is an $N \times N$ nonsingular upper triangular matrix.
- (a) The matrix
$$B = [\,A(y_k)\ \ P^T L E\,], \qquad (14)$$
where $E$ consists of the last $\ell$ columns of the $(N+\ell)\times(N+\ell)$ identity matrix, is nonsingular.
- (b) Define
$$V = P^T L^{-T} E. \qquad (15)$$
Then, the thin positive QR factorization
$$V = W R_V \qquad (16)$$
produces a thin orthonormal matrix $W$ and a small nonsingular upper triangular matrix $R_V$, and
$$A(y_k)^T W = 0, \qquad W^T W = I_{\ell}. \qquad (17)$$
Proof. (a) By direct computation, $B$ is a product of nonsingular matrices:
$$B = P^T L \begin{bmatrix} U_1 & 0 \\ 0 & I_{\ell} \end{bmatrix}. \qquad (18)$$
(b) By (15) and (16),
$$A(y_k)^T W = A(y_k)^T V R_V^{-1} = \begin{bmatrix} U_1^T & 0 \end{bmatrix} L^T L^{-T} E\, R_V^{-1} = \begin{bmatrix} U_1^T & 0 \end{bmatrix} E\, R_V^{-1} = 0,$$
while $W^T W = I_{\ell}$ by the orthonormality of the thin QR factor. ☐
From (17), we see that the matrix $W$ constructed in the theorem satisfies (7) at $y = y_k$, and thus, as shown in [8], Equations (11) and (12) hold for this matrix at $y_k$.
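A sketch of this construction, as reconstructed above, is given below: the LU factorization of $A(y_k)$ yields the thin matrix $V$ of (15) through a triangular solve with $\ell$ right-hand sides, and a thin QR factorization then orthonormalizes $V$; the final line checks the conclusion (17). All data are illustrative.

```python
import numpy as np
from scipy.linalg import lu, qr, solve_triangular

rng = np.random.default_rng(4)
N, ell = 40, 3
A = rng.standard_normal((N + ell, N))

P, L, U1 = lu(A)   # scipy returns A = P L U1, i.e., P^T A = L U1 in the notation of (13)
# V of (15): the top block X1 solves L11^T X1 = -L21^T, the bottom block is I_ell
X1 = solve_triangular(L[:N, :].T, -L[N:, :].T, lower=False, unit_diagonal=True)
V = P @ np.vstack([X1, np.eye(ell)])
W, R_V = qr(V, mode="economic")    # thin QR (16): W orthonormal, R_V small upper triangular
print(np.linalg.norm(A.T @ W),                  # ~ 0, i.e., A(y_k)^T W = 0 as in (17)
      np.linalg.norm(W.T @ W - np.eye(ell)))    # ~ 0, i.e., W^T W = I
```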
To determine the computational cost, recall that we assume that $N$ is very large, while ℓ and n are fixed and small. Then, the thin QR factorization in (16) is cheap, since the matrix $V$ has only ℓ columns. More specifically, the LU factorization of the $(N+\ell)\times N$ dimensional $A(y_k)$ in (13) costs approximately $\tfrac{2}{3}N^3$ flops, with an additional $O(N^2)$ comparisons to produce $P$ [12], while computing $V$ only costs $O(N^2)$ flops and the thin QR factorization of the $(N+\ell)\times\ell$ dimensional matrix $V$ takes only $O(N\ell^2)$ flops [12].
Notice that solving (11) and (12) involves solving several equations of the form $D(y_k)x = c$. However, the matrix $D(y_k)$ defined in (10), which is used in (11) and (12), differs from the matrix $B$ defined in (14), whose LU factors are given in (18). To reduce the cost of solving (11) and (12) from $O(N^3)$ to $O(N^2)$ operations, we can exploit the factorization (18) of the matrix $B$, as below. This result is essentially an application of the Sherman–Morrison–Woodbury formula.
Theorem 2. Let $P A(y_k) = L \begin{bmatrix} U_1 \\ 0 \end{bmatrix}$ be the LU factorization of $A(y_k)$ with a permutation matrix $P$ as in (13), and let the matrices $B$, $V$ and $W$ be defined as in (14)–(16), while $D = D(y_k)$ is defined by (10). Denote
$$K = B^{-1}\big(W - P^T L E\big), \qquad S = I_{\ell} + E^T K. \qquad (19)$$
Then,
- (a) $D = B\,(I + K E^T)$, and the $\ell\times\ell$ matrix $S$ is nonsingular;
- (b) For $x, c \in \mathbb{R}^{N+\ell}$, $Dx = c$ if and only if
$$x = g - K S^{-1} E^T g, \qquad \text{where } B g = c. \qquad (20)$$
Proof. Part (a) follows by direct computation, since $D$ and $B$ differ only in their last $\ell$ columns. Part (b) follows directly from (a). ☐
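Continuing the sketch given after Theorem 1, and following the reconstruction of (19) and (20) above, a solve with $D = [\,A(y_k)\ \ W\,]$ reduces to triangular solves with the factors (18) of $B$ plus a small $\ell\times\ell$ correction:

```python
import numpy as np
from scipy.linalg import solve_triangular

def B_solve(P, L, U1, c):
    """Solve B x = c using (18): the permuted B equals L_ext * blkdiag(U1, I_ell)."""
    N = U1.shape[1]
    d = P.T @ c                    # apply the permutation
    v1 = solve_triangular(L[:N, :], d[:N], lower=True, unit_diagonal=True)
    v2 = d[N:] - L[N:, :] @ v1     # forward substitution through the trapezoidal part
    return np.concatenate([solve_triangular(U1, v1, lower=False), v2])

def D_solve(P, L, U1, W, c):
    """Solve D x = c with D = [A, W], via B-solves and the SMW correction (20)."""
    N, ell = U1.shape[1], W.shape[1]
    K = np.column_stack([B_solve(P, L, U1, col)
                         for col in (W - P[:, N:]).T])  # K = B^{-1}(W - B E), cf. (19)
    g = B_solve(P, L, U1, c)
    S = np.eye(ell) + K[N:, :]                          # S = I + E^T K, cf. (19)
    return g - K @ np.linalg.solve(S, g[N:])            # x of (20)

c = rng.standard_normal(N + ell)
x = D_solve(P, L, U1, W, c)
print(np.linalg.norm(np.column_stack([A, W]) @ x - c))  # ~ 0
```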
Finally, we show how the value $z^*$ defined by (3) can be computed efficiently using the LU factorization of $A(y^*)$.
Theorem 3. $z = z^*$ if and only if there exists $w \in \mathbb{R}^{\ell}$ such that
$$D(y^*) \begin{bmatrix} z \\ w \end{bmatrix} = b(y^*). \qquad (21)$$
Proof. Let the QR factorization of $A(y^*)$ be
$$A(y^*) = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R_1 \\ 0 \end{bmatrix}.$$
Since $A(y^*)$ is full rank, we see that $R_1$ is nonsingular and $z^* = R_1^{-1} Q_1^T b(y^*)$. Since $W(y^*)$ is an orthonormal matrix whose columns, like those of $Q_2$, span the nullspace of $A(y^*)^T$, there exists an $\ell \times \ell$ orthogonal matrix $Z$ such that $W(y^*) = Q_2 Z$. Hence, (21) reads $Q_1 R_1 z + Q_2 Z w = b(y^*)$, which holds for some $w$ if and only if $R_1 z = Q_1^T b(y^*)$, that is, $z = z^*$; the conclusion follows. ☐
Computing $z^*$ via (21) involves the matrix $D(y^*)$ of (10), but the technique of Theorem 2 may again be used to replace this by the known LU factorization (18) of $B$ obtained from the LU factorization of $A(y^*)$. Thus, the cost of this step is also only $O(N^2)$ flops.
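With the machinery above, the recovery of $z^*$ in Theorem 3 becomes a single bordered solve with $D(y^*)$: the first $N$ components of the solution of (21) give $z^*$, which can be checked against a dense least squares solve. Continuing the sketch above:

```python
b_star = rng.standard_normal(N + ell)                 # stands in for b(y*)
x = D_solve(P, L, U1, W, b_star)
z_star = x[:N]                                        # z of (21), i.e., z* = A^+ b(y*)
z_ref, *_ = np.linalg.lstsq(A, b_star, rcond=None)
print(np.linalg.norm(z_star - z_ref))                 # ~ 0
```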
4. Examples
We present three examples to illustrate our algorithm for computing the least squares solution of overdetermined separable equations. Our examples have large $N$ and small ℓ and n, approximating the theoretical regime in which $N \to \infty$ while ℓ and n remain fixed. Similar results hold if we use much larger $N$.
Example 1. Consider the least squares solution of the overdetermined system $A(y)z = b(y)$, where $e_j$ denotes the $j$-th standard unit vector. The selected value of $y$ is the value of the exact solution $y^*$. The problem can be regarded as a constrained generalized eigenvalue problem. We choose a large $N$ and small values of ℓ and n. For the eigenvalues of matrices of the form of $A(y)$, see [17]. Beginning with the starting point $y_0$ and using Algorithm 1, we obtain the results in Table 1 for the iterates $y_k$, which show quadratic convergence of the method. After $y^*$ is obtained, we compute $z^*$ using (21). We find that the 2-norm of the residual is ≈0.06.
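Quadratic convergence of the kind reported in Table 1 can be read off from the error sequence itself: the ratios $e_{k+1}/e_k^2$ of successive errors stay roughly constant. A generic check (the error values below are hypothetical placeholders, not the values of Table 1):

```python
import numpy as np

errors = np.array([1e-1, 6e-3, 3e-5, 8e-10])  # hypothetical ||y_k - y*|| values
print(errors[1:] / errors[:-1] ** 2)          # roughly constant for quadratic convergence
```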
Example 2. Consider the least squares solution of the overdetermined system (23), $A(y)z = b(y)$, with the dimensions $N$, ℓ and n chosen accordingly. The least squares solution of (23) is $y^*$. Beginning with the starting point $y_0$ and using Algorithm 1, we obtain the results in Table 2 for the iterates $y_k$. These results show the expected quadratic convergence. After $y^*$ is obtained, we compute $z^*$ using (21), and compare the 2-norm of the resulting residual of (23) with the exact residual.
Our last example arises from a discretization, using the finite element method, of a one-dimensional elliptic interface problem in which the coefficient takes a different constant value on each side of an interior interface point. Boundary conditions are imposed at the two endpoints, and interface conditions are imposed at the interface. For more on interface problems, see [18]. In this discretization, the slopes of the basis tent functions on either side of the interface are modified on the respective line segments adjacent to the interface. In our setting, the boundary and interface data are given, and the coefficient values represent the nonlinear variables.
Example 3. Consider the least squares solution of the overdetermined system $A(y)z = b(y)$ arising from this discretization, where the matrix $A(y)$ is assembled blockwise from square tridiagonal matrices $F$, from vectors $E$ whose entries are all one, and from the $j$-th standard unit vectors $e_j$. The exact least squares solution is $y^*$.
We use fixed values of the problem parameters and dimensions. Using Algorithm 1 with a starting point $y_0$, we obtain the results in Table 3 for the iterates $y_k$, which show quadratic convergence of the method. After $y^*$ is obtained, we obtain $z^*$ using (21), and compare the 2-norm of the computed residual with the exact residual.