Improved Convergence Analysis of Gauss-Newton-Secant Method for Solving Nonlinear Least Squares Problems

Abstract: We study an iterative differential-difference method for solving nonlinear least squares problems that uses, instead of the Jacobian, the sum of the derivative of the differentiable part of the operator and a divided difference of the nondifferentiable part. We also introduce a method that uses only the derivative of the differentiable part instead of the Jacobian. We present results establishing the convergence conditions, the convergence radius, and the convergence order of the proposed methods, improving on earlier work. Numerical examples illustrate the theoretical results.


Introduction
Nonlinear least squares problems often arise in solving overdetermined systems of nonlinear equations, in estimating parameters of physical processes from measurement data, in constructing nonlinear regression models for engineering problems, etc.
The nonlinear least squares problem has the form

min_{x ∈ R^p} (1/2) ||F(x)||^2,

where the residual function F : R^p → R^m (m ≥ p) is nonlinear in x and continuously differentiable. An effective method for solving nonlinear least squares problems is the Gauss-Newton method [1-3]. In practice, however, the calculation of derivatives often causes difficulties. In that case one can use iterative-difference methods. These methods do not require the calculation of derivatives, and they perform no worse than the Gauss-Newton method in terms of convergence rate and number of iterations. In some cases, the nonlinear function consists of differentiable and nondifferentiable parts. Then it is possible to use iterative-difference methods [4-7]

x_{n+1} = x_n − (A_n^T A_n)^{−1} A_n^T F(x_n), n = 0, 1, ...,

where A_n = F(x_n, x_{n−1}) is a divided difference of order one. It is desirable to build iterative methods that take the properties of the problem into account. In particular, one can use only the derivative of the differentiable part of the operator instead of the full Jacobian, which in fact does not exist. Methods obtained with this approach converge slowly. More efficient methods use, instead of the Jacobian, the sum of the derivative of the differentiable part and a divided difference of the nondifferentiable part of the operator. This approach gives very good results in the case of solving nonlinear equations.
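The classical Gauss-Newton iteration described above can be sketched as follows. This is an illustrative implementation only, not code from the paper; the exponential-fit test problem and all names are our own choices:

```python
import numpy as np

def gauss_newton(F, J, x0, tol=1e-10, max_iter=50):
    """Gauss-Newton: x_{n+1} = x_n - (A^T A)^{-1} A^T F(x_n) with A = F'(x_n)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        A = J(x)
        # Solve the normal equations (A^T A) h = A^T F(x) for the step h.
        h = np.linalg.solve(A.T @ A, A.T @ F(x))
        x = x - h
        if np.linalg.norm(h) < tol:
            break
    return x

# Overdetermined toy problem (m = 3 > p = 1): fit y = exp(b*t) to exact data,
# so the residual at the solution is zero and b* = 0.5.
t = np.array([0.0, 1.0, 2.0])
y = np.exp(0.5 * t)
F = lambda x: np.exp(x[0] * t) - y                  # residual, R^1 -> R^3
J = lambda x: (t * np.exp(x[0] * t)).reshape(3, 1)  # Jacobian, 3 x 1
b = gauss_newton(F, J, [0.0])
```

Because the residual at the solution is zero here, the iteration exhibits the quadratic convergence typical of Gauss-Newton on zero-residual problems.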
In this work we study a combined method for solving the nonlinear least squares problem, based on the Gauss-Newton and secant methods. We also consider a method requiring only the derivative of the differentiable part of the operator. We prove local convergence and show efficiency on test cases in comparison with secant-type methods [5,6]. The convergence region of iterative methods is small in general, which limits the choice of initial approximations. It is therefore important to extend this region without requiring additional hypotheses. The new approach [8] leads to a larger convergence radius than before [9]. We achieve this goal by locating a region, at least as small as before, that contains the iterates; the new Lipschitz constants are then at least as tight as the old ones. Moreover, using more precise estimates on the distances involved, under weaker hypotheses, and at the same computational cost, we provide an analysis of the Gauss-Newton-Secant method with the following advantages over the corresponding results in [9]: a larger convergence region, finer error estimates on the distances involved, and at least as precise information on the location of the solution.
The rest of the paper is organized as follows. Section 2 contains the statement of the problem; Sections 3 and 4 present the local convergence analysis of the first and second method, respectively; Section 5 provides the numerical examples. The article ends with some conclusions.

Description of the Problem
Consider the nonlinear least squares problem

min_{x ∈ R^p} (1/2) ||F(x) + G(x)||^2,    (4)

where the residual function F + G : R^p → R^m (m ≥ p) is nonlinear in x; F is a continuously differentiable function; G is a continuous function whose differentiability, in general, is not required. We propose a modification of the Gauss-Newton method to find a solution of problem (4):

x_{n+1} = x_n − (A_n^T A_n)^{−1} A_n^T (F(x_n) + G(x_n)), A_n = F'(x_n) + G(x_n, x_{n−1}), n = 0, 1, ....    (5)

Here, F'(x_n) is the Fréchet derivative of F(x); G(x_n, x_{n−1}) is a divided difference of order one for the function G(x) [10]; x_0, x_{−1} are given initial approximations. Setting A_n = F'(x_n) in (5), we get a Gauss-Newton type iterative method for solving problem (4):

x_{n+1} = x_n − (A_n^T A_n)^{−1} A_n^T (F(x_n) + G(x_n)), A_n = F'(x_n), n = 0, 1, ....    (6)

In the case m = p, problem (4) turns into a system of nonlinear equations

F(x) + G(x) = 0.    (7)

Then, it is well known ([3], p. 267) that techniques for minimizing problem (4) are techniques for finding a solution x* of Equation (7). In this case (5) transforms into the combined Newton-Secant method [11,12], and method (6) into a Newton-type method for solving the nonlinear Equation (7) [13]. We assume from now on that the function G is differentiable at x = x*.
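Method (5) can be sketched in code as follows. This is our own illustrative implementation under stated assumptions, not the paper's code: the coordinate-wise divided-difference formula follows the standard construction, and the scalar test problem F(x) = x^2 − 2, G(x) = |x − 1| is our own choice (G is nondifferentiable at 1 but differentiable at the solution):

```python
import numpy as np

def divided_difference(G, x, y):
    """First-order divided difference G(x, y): column j differences G in the
    j-th coordinate, mixing coordinates of x and y (assumes x[j] != y[j])."""
    p = x.size
    m = np.atleast_1d(G(x)).size
    A = np.zeros((m, p))
    for j in range(p):
        u = np.concatenate([x[:j + 1], y[j + 1:]])
        v = np.concatenate([x[:j], y[j:]])
        A[:, j] = (np.atleast_1d(G(u)) - np.atleast_1d(G(v))) / (x[j] - y[j])
    return A

def gn_secant(F, dF, G, x0, xm1, tol=1e-12, max_iter=50):
    """Method (5): A_n = F'(x_n) + G(x_n, x_{n-1}),
    x_{n+1} = x_n - (A_n^T A_n)^{-1} A_n^T (F(x_n) + G(x_n))."""
    x_prev = np.atleast_1d(np.asarray(xm1, dtype=float))
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    for _ in range(max_iter):
        A = dF(x) + divided_difference(G, x, x_prev)
        r = F(x) + G(x)
        h = np.linalg.solve(A.T @ A, A.T @ r)
        x_prev, x = x, x - h
        if np.linalg.norm(h) < tol:
            break
    return x

# Illustrative scalar problem: F(x) = x^2 - 2 (smooth), G(x) = |x - 1|
# (nondifferentiable at 1); for x > 1 the root solves x^2 + x - 3 = 0.
F = lambda x: x**2 - 2
dF = lambda x: np.array([[2.0 * x[0]]])
G = lambda x: np.abs(x - 1.0)
sol = gn_secant(F, dF, G, 2.0, 1.5)
```

Only the derivative of the smooth part F is ever computed; the nondifferentiable part G enters solely through divided differences, which is exactly the structural idea behind method (5).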

Local Convergence Analysis of Method (5)
Sufficient conditions and the convergence order of the iterative process (5) are presented. First, however, we need some crucial definitions, which clarify the relationship between the Lipschitz constants appearing in the local convergence analysis.

Definition 1. The Fréchet derivative F' satisfies the center-Lipschitz condition on D if there exists L_0 > 0 such that for each x ∈ D

||F'(x) − F'(x*)|| ≤ L_0 ||x − x*||.

Definition 2. The divided difference G(x, y) satisfies the center-Lipschitz condition on D × D if there exists M_0 > 0 such that for each x, y ∈ D

||G(x, y) − G(x*, x*)|| ≤ M_0 (||x − x*|| + ||y − x*||).

Let B > 0 and α > 0, and define the function ϕ : [0, +∞) → [0, +∞). Let U(x*, r*) = {x : ||x − x*|| ≤ r*}, r* > 0. Suppose that the equation ϕ(r) = 1 has at least one positive solution, and denote by γ the smallest such solution. Define D_0 = D ∩ U(x*, γ).

Definition 3. The Fréchet derivative F' satisfies the restricted Lipschitz condition on D_0 if there exists L > 0 such that for each x, y ∈ D_0

||F'(x) − F'(y)|| ≤ L ||x − y||.

Definition 4. The first order divided difference G(x, y) satisfies the restricted Lipschitz condition on D_0 × D_0 if there exists M > 0 such that for each x, y, u, v ∈ D_0

||G(x, y) − G(u, v)|| ≤ M (||x − u|| + ||y − v||).

Next, we also state the definitions given in [9], so that we can compare them with the preceding ones.
Definition 5. The Fréchet derivative F' satisfies the Lipschitz condition on D if there exists L_1 > 0 such that for each x, y ∈ D

||F'(x) − F'(y)|| ≤ L_1 ||x − y||.

Definition 6. The first order divided difference G(x, y) satisfies the Lipschitz condition on D × D if there exists M_1 > 0 such that for each x, y, u, v ∈ D

||G(x, y) − G(u, v)|| ≤ M_1 (||x − u|| + ||y − v||).

Since D_0 ⊆ D, we have

L_0 ≤ L_1, L ≤ L_1, M_0 ≤ M_1, M ≤ M_1.    (17)-(20)

If any of (17)-(20) are strict inequalities, then the following advantages are obtained over the work in [9], which uses L_1 and M_1 instead of the new constants:

(a_1) At least as large a convergence domain, leading to at least as many initial choices.
(a_2) At least as tight upper bounds on the distances ||x_n − x*||, so at most as many iterations are needed to obtain a desired error tolerance.
It is always true that D_0 is at least as small as D and included in it by (12). Herein lies the new idea and the reason for the advantages. Notice that these advantages are obtained under the same computational cost as in [9], since the new constants L_0, M_0, L and M are special cases of the constants L_1 and M_1. This technique of using the center-Lipschitz condition in combination with the restricted convergence region has been used on Newton's, Secant, and Newton-like methods [14], and can be used on other methods in order to extend their applicability.
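A tiny numerical illustration of why the restricted region tightens the constants (our own example, not one from the paper): take f'(x) = exp(x), x* = 0, D = [−1, 1], and a smaller D_0 = [−1/2, 1/2]. Since the Lipschitz constant of f' on an interval is sup |f''| there, shrinking the region shrinks the constant:

```python
import numpy as np

# f'(x) = exp(x), x* = 0; Lipschitz constants of f' via sup |f''| = sup exp.
L1 = np.exp(1.0)    # classical constant on D = [-1, 1]
L = np.exp(0.5)     # restricted constant on D0 = [-1/2, 1/2]

# Center-Lipschitz constant: sup over D of |exp(x) - 1| / |x| (attained at x = 1).
xs = np.linspace(-1.0, 1.0, 20001)
xs = xs[np.abs(xs) > 1e-12]
L0 = np.max(np.abs(np.exp(xs) - 1.0) / np.abs(xs))
# Here L < L0 < L1, so radii computed from (L0, L) exceed those from L1 alone.
```

In this example L = e^{1/2} ≈ 1.649 and L_0 = e − 1 ≈ 1.718, both strictly smaller than L_1 = e ≈ 2.718, which is exactly the situation in which the advantages (a_1), (a_2) are strict.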
The Euclidean norm and the corresponding matrix norm are used throughout this study; this choice has the advantage that ||A^T|| = ||A||.
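For the induced matrix norm, ||A|| is the largest singular value of A, and A and A^T share singular values, hence the stated property. A quick numerical check (our own illustration):

```python
import numpy as np

# The operator 2-norm equals the largest singular value, and A and A^T
# share singular values, hence ||A^T|| = ||A||.
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
n1 = np.linalg.norm(A, 2)
n2 = np.linalg.norm(A.T, 2)
```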
The proof of the next result follows the corresponding one in [9], but there are crucial differences, since we use (L_0, L) instead of L_1 and (M_0, M) instead of M_1.
Theorem 1. Let F + G : R^p → R^m be continuous on a set D ⊆ R^p, let F be continuously differentiable on this set, and let G(·, ·) : D × D → L(R^p, R^m) be a divided difference of order one. Suppose that problem (4) has a solution x* on the set D, that the required inverse operator exists, that conditions (10), (13), (14) hold, and that γ defined in (11) exists. Moreover, suppose that (22) holds and U(x*, r*) ⊆ D, where r* is the unique positive zero of the function q defined below. Then, for x_0, x_{−1} ∈ U(x*, r*), method (5) is well defined, generates a sequence {x_n}, n = 0, 1, ..., that remains in U(x*, r*), and converges to the solution x*. Moreover, the error bounds (24) hold.

Proof. By the intermediate value theorem on [0, r], for sufficiently large r and in view of (22), the function q has at least one positive zero. Denote by r* the least such positive zero. Moreover, we have q'(r) ≥ 0 for r ≥ 0, so this zero is unique on [0, r].
As in the derivation of (28), using (9), (21) and the definition of the function ϕ, we get in turn that the required inverse exists; therefore, iteration x_{k+1} is well defined and the corresponding estimate holds. This proves x_{k+1} ∈ U(x*, r*) and estimate (24) for n = k. Thus, method (5) is well defined, x_n ∈ U(x*, r*) for all n ≥ 0, and estimate (24) holds for all n ≥ 0. It remains to prove that x_n → x* as n → ∞.
Define the functions a and b on [0, r*]. By the definition of r*, we get the corresponding bounds. Using estimate (24), the definitions of the constants C_i, i = 1, 2, ..., 7, and the functions a and b, we get for n ≥ 0 the following estimates. As was shown in [1], under conditions (29)-(32) the sequence {x_n} converges to x* as n → ∞.
Let G(x) ≡ 0 in (4), which corresponds to the residual function being differentiable. Then, from Theorem 1, we obtain the following corollary.
Corollary 1. Suppose that U(x*, r*) ⊆ D, where r* is the unique positive zero of the function q defined below. Then, for x_0 ∈ U(x*, r*), method (6) is well defined, generates a sequence {x_n}, n = 0, 1, ..., that remains in U(x*, r*), and converges to the solution x*. Moreover, the error bounds (41) hold.

Proof. By the intermediate value theorem on [0, r], for sufficiently large r and in view of (39), the function q has a least positive zero, denoted by r*, and q'(r) ≥ 0 for r ≥ 0, so this zero is unique on [0, r]. The rest of the proof is analogous to the one given in Theorem 1.
Let A_n = F'(x_n) and set n = 0, assuming x_0, x_{−1} ∈ U(x*, r*). By analogy to (26) in Theorem 1, we get (43). Taking this into account, from inequality (43) and the definition of r* given in (40), we get (45). From the Banach lemma on invertible operators [3] and (45), A_0^T A_0 is invertible. Then, from (43)-(45), we get (46). Hence, iteration x_1 is well defined. Next, we show that x_1 ∈ U(x*, r*); we have the corresponding estimate. Continuing by induction, the required inverse exists, iteration x_{k+1} is well defined, and we get in turn that x_{k+1} ∈ U(x*, r*) and estimate (41) holds for n = k. Thus, the iterative process (6) is well defined, x_n ∈ U(x*, r*) for all n ≥ 0, and estimate (41) holds for all n ≥ 0.

Define the function a on [0, r*]. Using estimate (41), the definitions of the constants C_i, i = 1, 2, 3, and the function a, we get for n ≥ 0 the estimate (47). For any r* > 0 and initial point x_0 ∈ U(x*, r*), there exists r', 0 < r' < r*, such that x_0 ∈ U(x*, r'). Similarly to the proof that all iterates stay in U(x*, r*), we can show that all iterates stay in U(x*, r'). So, estimate (47) holds with r* replaced by r'. In particular, from (47) for n ≥ 0, we get the bound with a' = a(r'). Obviously, 0 ≤ a' < a(r*) = 1. Therefore, (a')^{n+1} → 0 as n → ∞. Hence, the sequence {x_n} converges to x* as n → ∞ at the rate of a geometric progression.
The same type of improvements as in Theorem 1 are obtained for Theorem 2 (see Remark 2).
Remark 3. As we can see from estimates (41) and (42), the convergence of method (6) depends on α, L_0, L and M. For problems with weak nonlinearity (α, L_0, L and M "small"), the convergence rate of the iterative process is linear. In the case of strongly nonlinear problems (α, L_0, L and/or M "large"), method (6) may not converge at all.

Numerical Experiments
Let us compare the convergence rates of the combined method (5), the Gauss-Newton type method (6), and the secant-type method (48) for solving nonlinear least squares problems [5,6], in which A_n is a divided difference of order one of the whole operator F + G, on some test cases. Testing is carried out on nonlinear systems with a nondifferentiable operator, with zero and nonzero residual. The classic Gauss-Newton and Newton methods cannot be used for solving such problems. Results are sought with a prescribed accuracy ε.
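Such a comparison can be sketched on a scalar problem. The test problem F(x) = x^2 − 2, G(x) = |x − 1| and all names below are our own illustrative choices, not the paper's test cases; in the scalar case the operator (A^T A)^{−1} A^T reduces to division by A:

```python
def dd(H, x, y):
    """First-order scalar divided difference H(x, y) (assumes x != y)."""
    return (H(x) - H(y)) / (x - y)

def iterate(A_of, x0, xm1, tol=1e-10, max_iter=100):
    """Scalar iteration x_{n+1} = x_n - (F(x_n) + G(x_n)) / A_n; returns
    the approximate solution and the number of iterations used."""
    xp, x = xm1, x0
    for n in range(1, max_iter + 1):
        A = A_of(x, xp)
        xp, x = x, x - (F(x) + G(x)) / A
        if abs(x - xp) < tol:
            return x, n
    return x, max_iter

F = lambda x: x**2 - 2          # differentiable part
dF = lambda x: 2.0 * x
G = lambda x: abs(x - 1.0)      # nondifferentiable part
H = lambda x: F(x) + G(x)

x5, n5 = iterate(lambda x, xp: dF(x) + dd(G, x, xp), 2.0, 1.5)   # method (5)
x6, n6 = iterate(lambda x, xp: dF(x), 2.0, 1.5)                  # method (6)
x48, n48 = iterate(lambda x, xp: dd(H, x, xp), 2.0, 1.5)         # secant type
# Method (5) and the secant-type method converge superlinearly on this
# problem, while method (6) converges only linearly and needs noticeably
# more iterations.
```

All three iterations find the same root (1 + √13 − 2)/2... more precisely, the positive root of x^2 + x − 3 = 0, and counting n reproduces the qualitative picture of Table 1: the combined method needs the fewest iterations, the pure Gauss-Newton type method the most.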

Conclusions
Based on the theoretical studies, the numerical experiments, and the comparison of the obtained results, we can argue that the combined differential-difference method (5) converges faster than the Gauss-Newton type method (6) and the secant-type method (48). Moreover, the method has the high convergence order (1 + √5)/2 in the case of zero residual and does not require the calculation of derivatives of the nondifferentiable part of the operator. Therefore, the proposed method (5) solves the problem efficiently and fast.

Table 1. Number of iterations required to solve the test problems.