Gauss–Newton–Secant Method for Solving Nonlinear Least Squares Problems under Generalized Lipschitz Conditions

Abstract: We develop a local convergence analysis of an iterative method for solving nonlinear least squares problems with operator decomposition under classical and generalized Lipschitz conditions. We consider both zero and nonzero residuals and determine the corresponding convergence orders. We use two types of Lipschitz conditions (center and restricted-region conditions) to study the convergence of the method. Moreover, we obtain a larger radius of convergence and tighter error estimates than in previous works, and hence extend the applicability of this method under the same computational effort.


Introduction
Nonlinear least squares problems often arise when solving overdetermined systems of nonlinear equations, estimating parameters of physical processes from measurement results, constructing nonlinear regression models for engineering problems, etc. The most widely used method for solving nonlinear least squares problems is the Gauss-Newton method [1]. In the case when the derivative cannot be calculated, difference methods are used [2,3].
Some nonlinear functions have a differentiable part and a nondifferentiable part. In this case, a good idea is to use the sum of the derivative of the differentiable part of the operator and the divided difference of the nondifferentiable part instead of the Jacobian [4-6]. Numerical studies show that such methods converge faster than Gauss-Newton-type methods or difference methods.
In this paper, we study the local convergence of the Gauss-Newton-Secant method under classical and generalized Lipschitz conditions for the first-order Fréchet derivative and divided differences.
Let us consider the nonlinear least squares problem:
min_{x ∈ R^p} (1/2) ||F(x) + G(x)||^2, (1)
where the residual function F + G : R^p → R^m (m ≥ p) is nonlinear in x, F is a continuously differentiable function, and G is a continuous function whose differentiability, in general, is not required. We propose the following modification of the Gauss-Newton method combined with a Secant-type method [4,6] for finding the solution to problem (1):
x_{n+1} = x_n − (A_n^T A_n)^{−1} A_n^T (F(x_n) + G(x_n)), n = 0, 1, . . . , (2)
where A_n = F′(x_n) + G(x_n, x_{n−1}); F′(x_n) is the Fréchet derivative of F(x); G(x_n, x_{n−1}) is a first-order divided difference of the function G [7] at the points x_n, x_{n−1}; and x_0, x_{−1} are given. Setting A_n = F′(x_n) in (2), we obtain an iterative Gauss-Newton-type method for solving problem (1):
x_{n+1} = x_n − (F′(x_n)^T F′(x_n))^{−1} F′(x_n)^T (F(x_n) + G(x_n)), n = 0, 1, . . . . (3)
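To make the iteration concrete, here is a minimal numerical sketch of method (2) in Python. The coordinatewise divided-difference formula used for G(x, y) is one common concrete realization (the analysis only assumes such an operator exists), and the zero-residual test problem with F(x) = (e^{x_1} − 1, x_2, x_1 + x_2) and G(x) = 0.05(|x_2|, |x_1|, 0) is hypothetical, not taken from the paper.

```python
import numpy as np

def divided_difference(G, x, y):
    """First-order divided difference G(x, y) as an m x p matrix, built
    column by column with the standard coordinatewise formula.  This is
    one common concrete choice; the theory only assumes such an
    operator exists."""
    p = x.size
    A = np.zeros((G(x).size, p))
    z = y.astype(float).copy()
    for j in range(p):
        z_prev = z.copy()
        z[j] = x[j]
        if x[j] != y[j]:
            A[:, j] = (G(z) - G(z_prev)) / (x[j] - y[j])
        # if x[j] == y[j], the column would need a derivative; left as 0 here
    return A

def gauss_newton_secant(F, dF, G, x0, x_minus1, tol=1e-12, maxit=50):
    """Method (2): x_{n+1} = x_n - (A_n^T A_n)^{-1} A_n^T (F(x_n)+G(x_n)),
    with A_n = F'(x_n) + G(x_n, x_{n-1})."""
    x_prev, x = x_minus1.astype(float), x0.astype(float)
    for _ in range(maxit):
        A = dF(x) + divided_difference(G, x, x_prev)
        residual = F(x) + G(x)
        step = np.linalg.solve(A.T @ A, A.T @ residual)
        x_prev, x = x, x - step
        if np.linalg.norm(x - x_prev) <= tol:
            break
    return x

# Hypothetical zero-residual test problem (m = 3, p = 2), solution x* = 0:
F = lambda x: np.array([np.exp(x[0]) - 1.0, x[1], x[0] + x[1]])
dF = lambda x: np.array([[np.exp(x[0]), 0.0], [0.0, 1.0], [1.0, 1.0]])
G = lambda x: 0.05 * np.array([abs(x[1]), abs(x[0]), 0.0])

x0 = np.array([0.3, 0.2])
sol = gauss_newton_secant(F, dF, G, x0, x0 + 1e-4)
```

Since the residual is zero at x* = 0, the iterates settle quickly near the origin; for nonzero residuals the same code applies, only the attainable accuracy and convergence order change, as analyzed below.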
For m = p, problem (1) turns into a system of nonlinear equations:
F(x) + G(x) = 0.
In this case, method (2) is transformed into the combined Newton-Secant method [8-10]:
x_{n+1} = x_n − (F′(x_n) + G(x_n, x_{n−1}))^{−1} (F(x_n) + G(x_n)), n = 0, 1, . . . ,
and method (3) into a Newton-type method for solving nonlinear equations [11]:
x_{n+1} = x_n − F′(x_n)^{−1} (F(x_n) + G(x_n)), n = 0, 1, . . . .
The convergence domain of such methods is small in general, and the error estimates are pessimistic. These problems restrict the applicability of the methods. The novelty of our work is the claim that these problems can be addressed without additional hypotheses. In particular, our idea is to use center and restricted-radius Lipschitz conditions. Such an approach to the study of the convergence of methods allows for extending the convergence ball of the method and improving the error estimates.
The remainder of the paper is organized as follows: Section 2 deals with the local convergence analysis. The numerical experiments appear in Section 3. Section 4 contains the concluding remarks and ideas about future works.

Local Convergence Analysis
Let us consider, at first, some auxiliary lemmas needed to obtain the main results. For an integrable, positive, nondecreasing function L, let h(t) = (1/t) ∫_0^t L(u) du for t > 0; additionally, h(t) at t = 0 is defined as h(0) = lim_{t→0+} h(t).
Definition 1. The Fréchet derivative F′ satisfies the center Lipschitz condition on D with L_0 average if
||F′(x) − F′(x*)|| ≤ ∫_0^{ρ(x)} L_0(u) du for each x ∈ D,
where ρ(x) = ||x − x*||, x* ∈ D is a solution of problem (1), and L_0 is an integrable, positive, and nondecreasing function on [0, T].
The functions M_0, L, M, L_1, and M_1 introduced next are, as the function L_0, integrable, positive, and nondecreasing functions defined on [0, 2R].
Definition 2. The first order divided difference G(x, y) satisfies the center Lipschitz condition on D with M_0 average if
||G(x, y) − G(x*, x*)|| ≤ ∫_0^{ρ(x)+ρ(y)} M_0(u) du for each x, y ∈ D.
Let B > 0 and α > 0. We define the function ϕ on [0, +∞) and suppose that the equation ϕ(t) = 0 has at least one positive solution. Denote by γ the minimal such solution. Then, we can define the set Ω_0 = D ∩ Ω(x*, γ).
Definition 3. The Fréchet derivative F′ satisfies the restricted radius Lipschitz condition on Ω_0 with L average if
||F′(x) − F′(x^τ)|| ≤ ∫_{τρ(x)}^{ρ(x)} L(u) du for each x ∈ Ω_0 and τ ∈ [0, 1], where x^τ = x* + τ(x − x*).
Definition 4. The first order divided difference G(x, y) satisfies the restricted radius Lipschitz condition on Ω_0 with M average.
Definition 5. The Fréchet derivative F′ satisfies the radius Lipschitz condition on D with L_1 average.
Definition 6. The first order divided difference G(x, y) satisfies the radius Lipschitz condition on D with M_1 average.
Since Ω_0 ⊆ D, by L(L_0, M_0) we mean that L (or M) depends on L_0 and M_0 through the definition of Ω_0.
Remark 1. In case any of (15)-(17) are strict inequalities, the following benefits are obtained over the work in [4] using L_1, M_1 instead of the new functions: (a1) an at least as large convergence region, leading to at least as many initial choices; (a2) at least as tight upper bounds on the distances ||x_n − x*||, so at least as few iterations are needed to obtain a desired error tolerance.
These benefits are obtained under the same computational effort as in [4], since the new functions L_0, M_0, L, and M are special cases of the functions L_1 and M_1. This technique of using the center Lipschitz condition in combination with the restricted convergence region has been used by us for Newton's, Secant, and Newton-like methods [14,15], and can be applied to other methods, too, with the same benefits.
The proof of the next result follows as the corresponding one in [4], but there are crucial differences: we use (L_0, L) instead of L_1 and (M_0, M) instead of M_1 used in [4].
We use the Euclidean norm.
Theorem 1. Let F + G : R^p → R^m be continuous on an open convex subset D ⊂ R^p, F be a continuously differentiable function, and G be a continuous function. Suppose that problem (1) has a solution x* ∈ D; the inverse operation (A_*^T A_*)^{−1}, with A_* = F′(x*) + G(x*, x*), exists and satisfies ||(A_*^T A_*)^{−1}|| ≤ B; conditions (11) and (12) hold; and γ given in (10) exists. Furthermore, suppose that Ω = Ω(x*, r*) ⊆ D, where r* is the unique positive zero of the function q. Then, for x_0, x_{−1} ∈ Ω, the iterative sequence {x_n}, n = 0, 1, . . . , generated by (2) is well defined, remains in Ω, and converges to x*. Moreover, the error estimates (22) and (23) hold for each n = 0, 1, 2, . . . .
Proof. The graph of the function q(r) crosses the positive r-axis only once on (0, R). Finally, from the monotonicity of q and since q(γ) > 0, we obtain r* < γ, so Ω(x*, r*) ⊂ Ω_0. We denote A_n = F′(x_n) + G(x_n, x_{n−1}). Let n = 0. By the assumption x_0, x_{−1} ∈ Ω, using conditions (11) and (12), we obtain inequality (29), where ρ_k = ρ(x_k). Then, from inequality (29) and the equation q(r) = 0, we obtain by (10) the bounds (30) and (31). Next, from (29)-(31) and the Banach lemma [16], it follows that (A_0^T A_0)^{−1} exists, and the bound (32) holds. Hence, x_1 is correctly defined. Next, we will show that x_1 ∈ Ω(x*, r*).
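For reference, the Banach lemma on invertible operators invoked here can be stated explicitly (this statement is standard and not reproduced from the paper): if a bounded linear operator T on a Banach space satisfies ||T|| < 1, then I − T is invertible and

```latex
\bigl\| (I - T)^{-1} \bigr\| \;\le\; \frac{1}{1 - \|T\|}.
```

Applied, schematically, to the perturbation of A_*^T A_* by A_0^T A_0 measured through (29)-(31), it yields the existence of (A_0^T A_0)^{−1} together with a bound of the type (32).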
Consequently, (A_k^T A_k)^{−1} exists, and x_{k+1} is correctly defined; moreover, the corresponding estimate holds. This proves that x_{k+1} ∈ Ω(x*, r*) and establishes estimate (22) for n = k. Thus, by induction, method (2) is correctly defined, x_n ∈ Ω(x*, r*), and estimate (22) holds for each n = 0, 1, 2, . . .. It remains to prove that x_n → x* as n → ∞.
According to the choice of r*, using estimate (22) and the definitions of the functions a, b and the constants C_i (i = 1, 2, 3, 4), we obtain conditions (42)-(45). According to the proof in [17], under conditions (42)-(45), the sequence {x_n} converges to x* as n → ∞.
If η = 0, we have a nonlinear least squares problem with zero residual. Then, the constants C_1 = 0 and C_2 = 0, and estimate (22) reduces to a bound of the form ρ_{n+1} ≤ C ρ_n (ρ_n + ρ_{n−1}). Then, we can write an equation for determining the convergence order as follows: t^2 − t − 1 = 0. Therefore, the positive root t* = (1 + √5)/2 of the latter equation is the order of convergence of method (2). In the case G(x) ≡ 0 in (1), we obtain the following consequences.
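The convergence-order calculation can be made explicit. Assuming an error recurrence dominated by the mixed term, ρ_{n+1} ≤ C ρ_n ρ_{n−1}, a sketch of the standard exponent-matching argument:

```latex
% Suppose \rho_n \approx K \rho_{n-1}^{t} for some order t > 1.
% Substituting into the recurrence \rho_{n+1} \le C \rho_n \rho_{n-1}:
\rho_{n+1} \;\le\; C \rho_n \rho_{n-1}
           \;\approx\; C \rho_{n-1}^{t} \, \rho_{n-1}
           \;=\; C \rho_{n-1}^{\,t+1},
% while the ansatz applied twice gives
\rho_{n+1} \;\approx\; K \rho_n^{t} \;\approx\; K' \rho_{n-1}^{\,t^{2}}.
% Matching exponents: t^{2} = t + 1, i.e. t^{2} - t - 1 = 0,
% whose positive root is the golden ratio
t^{*} \;=\; \frac{1 + \sqrt{5}}{2} \;\approx\; 1.618.
```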
Indeed, if G(x) ≡ 0, then C_3 = 0, and estimate (22) takes the form of a bound ρ_{n+1} ≤ C ρ_n^2, which indicates the quadratic convergence rate of method (2).

Remark 2.
If L_0 = L = L_1 and M_0 = M = M_1, our results specialize to the corresponding ones in [4]. Otherwise, they constitute an improvement, as already noted in Remark 1. As an example, let q_1, g_1, C_1^1, C_2^1, C_3^1, C_4^1, r_*^1 denote the functions and parameters obtained when L_0, L, M_0, M are replaced by L_1, L_1, M_1, M_1, respectively. Then, in view of (15)-(17), we have q(r) ≤ q_1(r) and r_*^1 ≤ r_*. Hence, the new error bounds (22) are tighter than the corresponding bounds (6) in [4], and the rest of the advantages (already mentioned in Remark 1) hold true.
Next, we study the convergence of method (2) if L 0 , L, M 0 , M are constants, as a consequence of Theorem 1.

Corollary 3.
Let F + G : R^p → R^m be continuous on an open convex subset D ⊂ R^p, F be continuously differentiable, and G be a continuous function on D. Suppose that problem (1) has a solution x* ∈ D and the inverse operation (A_*^T A_*)^{−1} exists, such that ||(A_*^T A_*)^{−1}|| ≤ B. Suppose that the Fréchet derivative F′ satisfies the classical Lipschitz conditions with constants L_0 and L for each x, y ∈ Ω_0, and that the function G has a first order divided difference G(x, y) satisfying the corresponding Lipschitz conditions with constants M_0 and M. Furthermore, suppose that Ω = Ω(x*, r*) ⊆ D. Then, for each x_0, x_{−1} ∈ Ω, the iterative sequence {x_n}, n = 0, 1, . . . , generated by (2) is well defined, remains in Ω, and converges to x*, and the error estimate (64) holds for each n = 0, 1, 2, . . ., where
g(r) = B[1 − B(2α + (L_0 + 2M_0)r)(L_0 + 2M_0)r]^{−1}. (65)
The proof of Corollary 3 is analogous to the proof of Theorem 1.

Numerical Examples
In this section, we give examples to show the applicability of method (2) and to confirm Remark 2. We use the Euclidean norm.
Example 1. Here x = (u, v). The solution of this problem is x* ≈ (0.917889, 0.288314), and η ≈ 0.079411.
Let us give the number of iterations needed to obtain an approximate solution of this problem. We test method (2) for different initial points x_0 = δ(1.1, 0.5)^T, where δ ∈ R, and use the stopping criterion ||x_{n+1} − x_n|| ≤ ε. The additional point is x_{−1} = x_0 + 10^{−4}. The numerical results are shown in Table 1.
Table 1. Results for Example 1, ε = 10^{−8}.
Number of iterations (for the different values of δ): 12, 8, 15, 17, 25.
In Table 2, we give the values of x_{n+1}, ||x_{n+1} − x_n||, and the norm of the residual at each iteration.
Example 2. Let the function F + G : D ⊆ R → R^3 be defined as in [5], where λ, µ ∈ R are two parameters. Here x* = 0 and η = √2 |µ|. Thus, if µ = 0, then we have a problem with zero residual.
Let us consider Example 2 and show that r_*^1 ≤ r_* and that the new error estimates (64) are tighter than the corresponding ones in [4]. We consider the case of the classical Lipschitz conditions (Corollary 3). The error estimates from [4] are given by (73); they can be obtained from (64) by replacing r_*, L_0, L, M_0, M in g(r), C_1, C_2, C_3, C_4 by r_*^1, L_1, L_1, M_1, M_1, respectively. Let us choose D = (−0.5, 0.5). The resulting radii are written in Table 3. Tables 4 and 5 report the left- and right-hand sides of error estimates (64) and (73). We obtained these results for ε = 10^{−8} and starting approximations x_{−1} = 0.2001, x_0 = 0.2. We see that the new error bounds (64) are tighter than the corresponding bounds (73) from [4].
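To complement these tables, the superlinear behavior predicted by the t* = (1 + √5)/2 analysis can also be observed numerically. The sketch below applies the m = p = 1 specialization of method (2) (the combined Newton-Secant iteration) to a hypothetical scalar zero-residual problem with F(x) = e^x − 1 and G(x) = 0.1 x|x|, solution x* = 0; this is an illustrative toy problem, not Example 2 from the paper.

```python
import math

def newton_secant(F, dF, G, x0, x_m1, tol=1e-14, maxit=30):
    """m = p = 1 case of method (2): the combined Newton-Secant iteration
    x_{n+1} = x_n - (F'(x_n) + G[x_n, x_{n-1}])^{-1} (F(x_n) + G(x_n))."""
    errors = []          # |x_n - x*|, with x* = 0 for the toy problem below
    x_prev, x = x_m1, x0
    for _ in range(maxit):
        dd = (G(x) - G(x_prev)) / (x - x_prev)   # scalar divided difference
        step = (F(x) + G(x)) / (dF(x) + dd)
        x_prev, x = x, x - step
        errors.append(abs(x))
        if abs(x - x_prev) <= tol:
            break
    return x, errors

# Hypothetical zero-residual problem with solution x* = 0:
F = lambda x: math.exp(x) - 1.0
dF = lambda x: math.exp(x)
G = lambda x: 0.1 * x * abs(x)   # continuous, not twice differentiable at 0

root, errors = newton_secant(F, dF, G, 0.5, 0.5001)
# Successive errors shrink roughly like errors[n+1] ~ C * errors[n] * errors[n-1],
# consistent with the golden-ratio order derived above.
```

The recorded error sequence drops to round-off level within a handful of iterations, matching the superlinear rate of the zero-residual analysis.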

Conclusions
We developed an improved local convergence analysis of the Gauss-Newton-Secant method for solving nonlinear least squares problems with a nondifferentiable operator. We used center and restricted-radius Lipschitz conditions to study the method. As a consequence, we obtained a larger radius of convergence and tighter error estimates under the same computational effort as in earlier papers. This idea can be used to extend the applicability of other methods involving inverses, such as Newton-type, Secant-type, single-step, or multi-step methods, to mention a few; this is planned as future work. Finally, it is worth mentioning that, besides the methods used in this paper, some of the most representative computational intelligence algorithms can be applied to such problems, for example monarch butterfly optimization (MBO) [18], the earthworm optimization algorithm (EWA) [19], elephant herding optimization (EHO) [20], the moth search (MS) algorithm [21], the slime mould algorithm (SMA), and Harris hawks optimization (HHO) [22].