A Class of Sparse Direct Broyden Method for Solving Sparse Nonlinear Equations

Abstract: In this paper, we present a sparse quasi-Newton method, called the sparse direct Broyden method, for solving sparse nonlinear equations. The method can be viewed as a Broyden-like method and is a least change update satisfying the sparsity condition and the direct tangent condition simultaneously. Local and q-superlinear convergence is established on the basis of the bounded deterioration property and the Dennis–Moré condition. By adopting a nonmonotone line search, we establish global and superlinear convergence. Moreover, the unit step length is eventually accepted. Numerical results demonstrate that the sparse direct Broyden method is effective and competitive for large-scale nonlinear equations.


Introduction
We consider the nonlinear equation

$$F(x) = 0, \tag{1}$$

where $F : \mathbb{R}^n \to \mathbb{R}^n$ is a continuously differentiable mapping. We denote by $F'(x)$ the Jacobian matrix of $F$ at $x$ and pay attention to the case where $F'(x)$ has a sparse or special structure. Specifically, one has $F(x) = (F_1(x), F_2(x), \cdots, F_n(x))^T$ and $F'(x) = (\nabla F_1(x), \nabla F_2(x), \cdots, \nabla F_n(x))^T$.
Nonlinear equations arise in many scientific and engineering problems and have applications in physics, biology, and many other fields [1]. The linearization of nonlinear Equation (1) at an iterate $x_k$ is

$$F(x_k) + F'(x_k)(x - x_k) = 0.$$

When $F'(x_k)$ is nonsingular, we obtain the Newton–Raphson iteration

$$x_{k+1} = x_k - F'(x_k)^{-1} F(x_k).$$

Newton's method is theoretically attractive because it is locally quadratically convergent when the Jacobian matrix is nonsingular at the solution of (1) and Lipschitz continuous [2]. However, at each iteration, Newton's method must compute the exact Jacobian matrix. Quasi-Newton methods avoid this cost by replacing $F'(x_k)$ with an approximation $B_k$ and iterating

$$x_{k+1} = x_k - B_k^{-1} F(x_k).$$

The quasi-Newton matrix $B_k$ can be updated by various quasi-Newton update formulae, such as Broyden's method, Powell's symmetric Broyden method, the BFGS method, and the DFP method [3,4]. Quasi-Newton methods are popular for small and medium-scale problems, since they possess local and superlinear convergence without computing the Jacobian [5–7]. However, when the dimension of the nonlinear equations is large, the matrix $B_k$ is in general dense, so the storage and computational cost per iteration become high. Two considerations motivate us to study sparse quasi-Newton methods for solving sparse nonlinear equations in this paper. One is the fact that many nonlinear equations have a sparse or specially structured Jacobian. The other is that sparse quasi-Newton methods for solving (1) have the good property that they can maintain the sparse structure of the Jacobian approximations. Thus, in this paper, we are interested in constructing a sparse quasi-Newton method for solving sparse nonlinear equations, where the Jacobian matrix $F'(x_k)$ has a sparse or special structure. Earlier work on sparse quasi-Newton methods was carried out by Schubert [8] and Toint [9]: Schubert modified Broyden's method by updating $B_k$ row by row so that sparsity is maintained, and Toint studied sparse and symmetric quasi-Newton methods. There have also been many other methods for solving large-scale nonlinear systems, such as limited-memory quasi-Newton methods [10,11], partitioned quasi-Newton methods [12–14], diagonal quasi-Newton methods [15,16], and the column updating method [17].
However, the global convergence of quasi-Newton methods for nonlinear equations is a relatively difficult topic, even in the dense case. This mainly results from the fact that the quasi-Newton direction may not be a descent direction of the merit function

$$h(x) = \frac{1}{2} \| F(x) \|^2.$$

Griewank [18] and Li and Fukushima [19] have proposed line search techniques to establish the global convergence of quasi-Newton methods. The purpose of our paper is to develop a sparse quasi-Newton method and study its local and global convergence. We consider Broyden's method

$$B_{k+1} = B_k + \frac{(y_k - B_k s_k) s_k^T}{s_k^T s_k},$$

where $s_k = x_{k+1} - x_k$ and $y_k = F(x_{k+1}) - F(x_k)$. If we replace $y_k$ with $F'(x_{k+1}) s_k$, we can obtain the following update:

$$B_{k+1} = B_k + \frac{\big( F'(x_{k+1}) s_k - B_k s_k \big) s_k^T}{s_k^T s_k}, \tag{2}$$

which fulfills the direct tangent condition [20,21]

$$B_{k+1} s_k = F'(x_{k+1}) s_k.$$
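To make the distinction concrete, the following Python sketch (our illustration, not code from the paper; the function names and the toy test problem are our own choices) performs one step of Broyden's update and of the direct Broyden update:

```python
import numpy as np

def broyden_update(B, s, y):
    """Classical Broyden update: B+ = B + (y - B s) s^T / (s^T s)."""
    return B + np.outer(y - B @ s, s) / (s @ s)

def direct_broyden_update(B, s, Jv):
    """Direct Broyden update: y is replaced by the Jacobian-vector
    product Jv = F'(x_{k+1}) s, so that B+ s = F'(x_{k+1}) s."""
    return B + np.outer(Jv - B @ s, s) / (s @ s)

# Tiny usage example on F(x) = (x1^2 - 1, x1 + x2)^T (a made-up problem).
F = lambda x: np.array([x[0]**2 - 1.0, x[0] + x[1]])
J = lambda x: np.array([[2 * x[0], 0.0], [1.0, 1.0]])  # exact Jacobian, for checking

x0 = np.array([2.0, 0.0])
B = np.eye(2)
x1 = x0 - np.linalg.solve(B, F(x0))
s = x1 - x0
B_direct = direct_broyden_update(B, s, J(x1) @ s)
assert np.allclose(B_direct @ s, J(x1) @ s)  # direct tangent condition holds
```

The only difference between the two updates is the vector paired with $s_k$: the secant vector $y_k$ versus the Jacobian-vector product $F'(x_{k+1}) s_k$.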
We call the corresponding method the direct Broyden method. We then develop a sparse direct Broyden method, which enjoys the following nice properties: (a) the new sparse quasi-Newton update is a least change update satisfying the direct tangent condition; (b) the proposed method preserves the sparsity pattern of the original Jacobian matrix $F'(x)$ exactly; and (c) the sparse direct Broyden method is globally and superlinearly convergent. The limited numerical results presented demonstrate that our algorithm performs better than Schubert's method and the direct Broyden method in iteration counts, function evaluation counts, and Broyden's mean convergence rate. The paper is organized as follows: in Section 2, we propose the sparse direct Broyden method and list its nice properties. For the full step sparse direct Broyden method, local and superlinear convergence is also given. In Section 3, by adopting a nonmonotone line search, we prove the global and superlinear convergence of the method proposed in Section 2. Moreover, after finitely many iterations, the unit step length is always accepted. In Section 4, we report some preliminary numerical experiments to test the efficiency of the proposed method. The last section gives the conclusion.

A New Sparse Quasi-Newton Update and Local Convergence
We pay attention to nonlinear Equation (1), whose Jacobian matrix is sparse or has a special structure. Firstly, we introduce some notation to describe the sparsity structure of the Jacobian, as in [22]. For $i = 1, 2, \ldots, n$, let $N_i = \{ j : e_i^T F'(x) e_j = 0 \ \text{for all} \ x \}$ collect the structurally zero entries of the $i$th row of $F'(x)$, where $e_j$ is the $j$th column of the identity matrix, and define the sparsity features of the $i$th row by

$$V_i = \{ v \in \mathbb{R}^n : e_j^T v = 0, \ j \in N_i \}.$$

Then, we can obtain the set of matrices $V$ that preserve the sparsity pattern of $F'(x)$:

$$V = \{ A \in \mathbb{R}^{n \times n} : A^T e_i \in V_i, \ i = 1, 2, \ldots, n \}.$$

Define a projection operator $S_i$, $i = 1, 2, \ldots, n$, which maps $\mathbb{R}^n$ onto $V_i$:

$$(S_i v)_j = \begin{cases} 0, & j \in N_i, \\ v_j, & j \notin N_i, \end{cases} \qquad j = 1, 2, \ldots, n.$$

Similar to the derivation of Schubert's method [8], we consider the sparse extension of the direct Broyden update (2), which updates each row of $B_k$ within its sparsity pattern and fulfills the direct tangent condition

$$B_{k+1} s_k = F'(x_{k+1}) s_k.$$

Then, we can obtain a compact representation of the new sparse quasi-Newton update as

$$B_{k+1} = B_k + \sum_{i=1}^n \big( (s_k^{(i)})^T s_k^{(i)} \big)^+ \, e_i \, e_i^T \big( F'(x_{k+1}) - B_k \big) s_k \, (s_k^{(i)})^T, \tag{3}$$

where $s_k = x_{k+1} - x_k$, $s_k^{(i)} = S_i s_k$, and the pseudo-inverse of $\alpha \in \mathbb{R}$ is defined by

$$\alpha^+ = \begin{cases} 1/\alpha, & \alpha \neq 0, \\ 0, & \alpha = 0. \end{cases}$$

The new sparse quasi-Newton update (3) modifies the quasi-Newton matrix row by row to preserve the zero and nonzero structure of the Jacobian. Then, we can obtain a quasi-Newton method as

$$x_{k+1} = x_k + \alpha_k d_k,$$

where $d_k$ is obtained by solving the linear system

$$B_k d = -F(x_k),$$

and $B_k$ is updated by the sparse direct Broyden update (3). We call the corresponding method the sparse direct Broyden method. When $\alpha_k \equiv 1$, we refer to it as the full step sparse direct Broyden method.
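The row-by-row structure of update (3) is easy to implement. The following Python sketch (our illustration; the names and the boolean `pattern` encoding of the sets $N_i$ are our assumptions) applies the update and checks the two properties claimed above, the direct tangent condition and sparsity preservation:

```python
import numpy as np

def sparse_direct_broyden_update(B, s, Jv, pattern):
    """Row-by-row sparse direct Broyden update, a sketch of update (3).

    B       : current approximation, assumed to respect `pattern`
    s       : step s_k = x_{k+1} - x_k
    Jv      : Jacobian-vector product F'(x_{k+1}) s_k
    pattern : boolean (n, n) array; pattern[i, j] is True where the
              Jacobian entry (i, j) may be nonzero (j not in N_i)
    """
    B_new = B.copy()
    n = B.shape[0]
    for i in range(n):
        s_i = np.where(pattern[i], s, 0.0)   # s_k^{(i)} = S_i s_k
        denom = s_i @ s_i
        if denom > 0.0:                      # pseudo-inverse: skip if s_i = 0
            B_new[i] += (Jv[i] - B[i] @ s) / denom * s_i
    return B_new

# Usage on a tridiagonal pattern (an illustrative choice).
n = 5
pattern = (np.eye(n, dtype=bool) | np.eye(n, k=1, dtype=bool)
           | np.eye(n, k=-1, dtype=bool))
rng = np.random.default_rng(0)
J = np.where(pattern, rng.standard_normal((n, n)), 0.0)  # stands in for F'(x_{k+1})
B = np.eye(n)
s = rng.standard_normal(n)
B1 = sparse_direct_broyden_update(B, s, J @ s, pattern)
assert np.allclose(B1 @ s, J @ s)   # direct tangent condition
assert not np.any(B1[~pattern])     # sparsity pattern preserved
```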

Lemma 1. The $B_{k+1}$ defined by (3) is the unique solution to the following minimization problem:

$$\min \big\{ \| B - B_k \|_F : B \in V \cap Q(F'(x_{k+1}), s_k) \big\}, \tag{4}$$

where $Q(A, s_k) = \{ B \in \mathbb{R}^{n \times n} : B s_k = A s_k \}$.

Proof. Firstly, we will prove that $B_{k+1} \in V \cap Q(F'(x_{k+1}), s_k)$. For $i = 1, 2, \cdots, n$, multiply both sides of (3) by $e_i^T$ to obtain

$$e_i^T B_{k+1} = e_i^T B_k + \big( (s_k^{(i)})^T s_k^{(i)} \big)^+ \, e_i^T \big( F'(x_{k+1}) - B_k \big) s_k \, (s_k^{(i)})^T. \tag{5}$$

According to the definition of the operator $S_i$, we have $s_k^{(i)} \in V_i$ and, for any $B \in V$,

$$e_i^T B s_k = e_i^T B \, s_k^{(i)}, \qquad (s_k^{(i)})^T s_k = (s_k^{(i)})^T s_k^{(i)}.$$

Then, (5) can be written as a correction of the $i$th row of $B_k$ along $(s_k^{(i)})^T$ with $s_k^{(i)} \in V_i$, so $B_{k+1} \in V$; moreover, multiplying (5) by $s_k$ and noting that $F'(x_{k+1}) \in V$ shows $e_i^T B_{k+1} s_k = e_i^T F'(x_{k+1}) s_k$ for every $i$ (both sides vanish when $s_k^{(i)} = 0$). Hence, $B_{k+1} \in V \cap Q(F'(x_{k+1}), s_k)$.

Then, we will prove the minimality and uniqueness. Suppose that $\bar{B}_{k+1} \in V \cap Q(F'(x_{k+1}), s_k)$. Since $\bar{B}_{k+1} s_k = F'(x_{k+1}) s_k$ and $\bar{B}_{k+1}, B_k \in V$,

$$e_i^T ( B_{k+1} - B_k ) = \big( (s_k^{(i)})^T s_k^{(i)} \big)^+ \, e_i^T \big( \bar{B}_{k+1} - B_k \big) s_k^{(i)} \, (s_k^{(i)})^T, \quad i = 1, 2, \cdots, n.$$

Taking the Frobenius norm, where the Cauchy–Schwarz inequality is applied row by row,

$$\| B_{k+1} - B_k \|_F \leq \| \bar{B}_{k+1} - B_k \|_F,$$

so $B_{k+1}$ solves (4). Since the function $f(B) = \| B - B_k \|_F^2$ is strictly convex and the constraint set of (4) is convex, we can obtain the uniqueness.
To analyze the local convergence of the full step sparse direct Broyden method, we first show that the bounded deterioration property

$$\| B_{k+1} - F'(x_*) \|_F \leq (1 + \alpha_1 \sigma_k) \| B_k - F'(x_*) \|_F + \alpha_2 \sigma_k, \qquad \sigma_k = \max\{ \| x_{k+1} - x_* \|_2, \| x_k - x_* \|_2 \}, \tag{6}$$

is satisfied with some constants $\alpha_1, \alpha_2 \geq 0$.

Lemma 2. Suppose that $F : \mathbb{R}^n \to \mathbb{R}^n$ is continuously differentiable in $D_0$, which is an open and convex set. Let $x_* \in D_0$ be a solution of (1) at which $F'(x_*)$ is nonsingular. Suppose that there exists $K = (k_1, k_2, \cdots, k_n)^T \in \mathbb{R}^n$ with $k_i \geq 0$ for $i = 1, 2, \ldots, n$ such that

$$\| e_i^T \big( F'(x) - F'(y) \big) \|_2 \leq k_i \| x - y \|_2 \quad \text{for all } x, y \in D_0, \ i = 1, 2, \ldots, n.$$

Then, one has the estimation

$$\| B_{k+1} - F'(x_*) \|_F \leq \| B_k - F'(x_*) \|_F + L \, \| x_{k+1} - x_* \|_2, \qquad L = \| K \|_2. \tag{8}$$

Proof. For the case $s_k = 0$, it is obvious that $F(x_k) = 0$, $x_k = x_*$, and $B_{k+1} = B_k$, so (8) holds trivially. For the case $s_k \neq 0$, subtracting $F'(x_*)$ from both sides of the update formula (3) and multiplying by $e_i^T$, $i = 1, 2, \cdots, n$, one has

$$e_i^T \big( B_{k+1} - F'(x_*) \big) = e_i^T \big( B_k - F'(x_*) \big) \Big( I - \big( (s_k^{(i)})^T s_k^{(i)} \big)^+ s_k^{(i)} (s_k^{(i)})^T \Big) + \big( (s_k^{(i)})^T s_k^{(i)} \big)^+ \, e_i^T \big( F'(x_{k+1}) - F'(x_*) \big) s_k^{(i)} \, (s_k^{(i)})^T.$$

Taking norms, and noting that $I - \big( (s_k^{(i)})^T s_k^{(i)} \big)^+ s_k^{(i)} (s_k^{(i)})^T$ is an orthogonal projection, yields

$$\| e_i^T ( B_{k+1} - F'(x_*) ) \|_2 \leq \| e_i^T ( B_k - F'(x_*) ) \|_2 + k_i \| x_{k+1} - x_* \|_2. \tag{7}$$

If $s_k^{(i)} = 0$, then we have $\big( (s_k^{(i)})^T s_k^{(i)} \big)^+ = 0$. It is obvious that

$$e_i^T ( B_{k+1} - F'(x_*) ) = e_i^T ( B_k - F'(x_*) ),$$

so (7) still holds. Make a summation over $i = 1, 2, \ldots, n$, applying the Minkowski inequality to the vectors of row norms, to obtain (8).

Based on the classical framework of Dennis and Moré, we give the following local convergence result, which can be proved similarly to the case of Broyden's method [6,7].

Theorem 1. Let the conditions in Lemma 2 hold. Then, there exist constants $\epsilon, \delta > 0$ such that, if $\| x_0 - x_* \|_2 < \epsilon$ and $\| B_0 - F'(x_*) \|_F < \delta$, the sequence $\{x_k\}$ generated by the full step sparse direct Broyden method is well defined and converges to $x_*$. Furthermore, the convergence rate is superlinear.
Proof. According to Lemma 2, one has

$$\| B_{k+1} - F'(x_*) \|_F \leq \| B_k - F'(x_*) \|_F + L \sigma_k,$$

which means that the estimation (6) is satisfied with $\alpha_1 = 0$ and $\alpha_2 = L$. Then, we obtain the local and linear convergence of $\{x_k\}$ from the classical bounded deterioration theory [6,7].

Next, we show that the Dennis–Moré condition [7],

$$\lim_{k \to \infty} \frac{\| ( B_k - F'(x_*) ) s_k \|_2}{\| s_k \|_2} = 0,$$

is satisfied. According to (8), the sequence $\{ \| B_k - F'(x_*) \|_F \}$ remains bounded along the linearly convergent sequence $\{x_k\}$; then, the result can be proved similarly to that in [7].

Algorithm and Global Convergence
In this section, by the use of the LF line search [19], we propose a global sparse direct Broyden method, whose specific steps are listed in the following Algorithm 1.

Algorithm 1 (SDBroyden method).

Step 0. Choose $x_0 \in \mathbb{R}^n$, a nonsingular $B_0 \in V$, constants $\rho \in (0, 1)$, $\sigma_1, \sigma_2 > 0$, $r \in (0, 1)$, and a positive sequence $\{\eta_k\}$ satisfying

$$\sum_{k=0}^{\infty} \eta_k \leq \eta < \infty. \tag{10}$$

Set $k := 0$.

Step 1. If $F(x_k) = 0$, stop.

Step 2. Obtain $d_k$ by solving the subproblem

$$B_k d = -F(x_k). \tag{11}$$
Step 3. If

$$\| F(x_k + d_k) \| \leq \rho \| F(x_k) \| - \sigma_2 \| d_k \|^2, \tag{12}$$

then let $\alpha_k := 1$ and go to Step 5. Else, go to Step 4.
Step 4. Set $\alpha_k = r^{i_k}$, where $i_k$ is the smallest nonnegative integer $i$ satisfying

$$\| F(x_k + r^i d_k) \| \leq \| F(x_k) \| - \sigma_1 \| r^i d_k \|^2 + \eta_k \| F(x_k) \|, \tag{13}$$

where $\eta_k$ is defined as in (10).

Step 5. Set $x_{k+1} := x_k + \alpha_k d_k$ and $s_k := \alpha_k d_k$.
Step 6. Update $B_k$ to obtain $B_{k+1}$ by the sparse direct Broyden update

$$B_{k+1} = B_k + \sum_{i=1}^n \big( (s_k^{(i)})^T s_k^{(i)} \big)^+ \, e_i \, e_i^T \big( F'(x_{k+1}) - B_k \big) s_k \, (s_k^{(i)})^T. \tag{14}$$

Set $k := k + 1$. Go to Step 1.
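A compact Python sketch of Algorithm 1 is given below (our illustration; it reuses `sparse_direct_broyden_update` from the sketch in Section 2, and the parameter defaults and the choice $\eta_k = (k+1)^{-2}$ are illustrative assumptions, not the settings used in the experiments):

```python
import numpy as np

def sdbroyden(F, Jv, x0, B0, pattern, rho=0.9, sigma1=1e-4, sigma2=1e-4,
              r=0.5, tol=1e-8, max_iter=200):
    """Sketch of Algorithm 1 with the LF-type line search.
    F : residual function; Jv(x, s) : Jacobian-vector product F'(x) s."""
    x, B = np.asarray(x0, float), np.asarray(B0, float)
    eta = lambda k: 1.0 / (k + 1) ** 2              # summable sequence, see (10)
    for k in range(max_iter):
        Fx = F(x)
        nFx = np.linalg.norm(Fx)
        if nFx <= tol:                              # Step 1
            return x, k
        d = np.linalg.solve(B, -Fx)                 # Step 2: subproblem (11)
        if np.linalg.norm(F(x + d)) <= rho * nFx - sigma2 * (d @ d):
            alpha = 1.0                             # Step 3: full step, (12)
        else:
            alpha = 1.0                             # Step 4: backtrack under (13)
            while np.linalg.norm(F(x + alpha * d)) > \
                    nFx - sigma1 * alpha**2 * (d @ d) + eta(k) * nFx:
                alpha *= r
        s = alpha * d                               # Step 5
        x = x + s
        B = sparse_direct_broyden_update(B, s, Jv(x, s), pattern)  # Step 6: (14)
    return x, max_iter

# Usage on Problem 2 of Appendix A: F_i(x) = exp(x_i) - 1, diagonal Jacobian.
n = 100
F = lambda x: np.exp(x) - 1.0
Jv = lambda x, s: np.exp(x) * s
x_sol, iters = sdbroyden(F, Jv, 0.5 * np.ones(n), np.eye(n),
                         np.eye(n, dtype=bool))
```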

Remark 1.
It is noticed that the matrix $B_{k+1}$ produced by the update formula (14) may be singular even when $B_k$ is nonsingular.
In this case, we use a technique similar to that in [22,23] and give the following discussion about the nonsingular sparse direct Broyden update. Set $H_0 = B_k$, and for $i = 1, 2, \ldots, n$, let

$$H_i = H_{i-1} + \theta_k^i \big( (s_k^{(i)})^T s_k^{(i)} \big)^+ \, e_i \, e_i^T \big( F'(x_{k+1}) - B_k \big) s_k \, (s_k^{(i)})^T,$$

so that $H_n$ coincides with $B_{k+1}$ in (14) when $\theta_k^i \equiv 1$. Since $e_i^T H_0 = e_i^T H_1 = \cdots = e_i^T H_{i-1}$, the matrix determinant lemma $\det(A + u v^T) = (1 + v^T A^{-1} u) \det(A)$ gives

$$\det(H_i) = \Big( 1 + \theta_k^i \big( (s_k^{(i)})^T s_k^{(i)} \big)^+ \, e_i^T \big( F'(x_{k+1}) - B_k \big) s_k \, (s_k^{(i)})^T H_{i-1}^{-1} e_i \Big) \det(H_{i-1}).$$

For a scalar $\alpha \in (0, 1)$, $\theta_k^i$ can be chosen such that

$$| \det(H_i) | \geq \alpha^{1/n} \, | \det(H_{i-1}) |.$$

Thus, $| \det B_{k+1} | \geq \alpha \, | \det B_k |$, and $\theta_k^i$ can be chosen so that $B_{k+1}$ is nonsingular and $| \theta_k^i - 1 | \leq \bar{\theta} < 1$.
Thus, we can define the sparse direct Broyden-like update formula as

$$B_{k+1} = B_k + \sum_{i=1}^n \theta_k^i \big( (s_k^{(i)})^T s_k^{(i)} \big)^+ \, e_i \, e_i^T \big( F'(x_{k+1}) - B_k \big) s_k \, (s_k^{(i)})^T.$$
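A quick numerical check of the determinant recursion behind this safeguard (our own sketch; the test matrix and the values of $\theta$ are arbitrary choices) illustrates the matrix determinant lemma on a rank-one row update:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
H = np.eye(n) + 0.1 * rng.standard_normal((n, n))  # current H_{i-1}, nonsingular
u = np.zeros(n); u[2] = 1.0                        # e_i for row i = 2
w = rng.standard_normal(n)                         # row correction direction

for theta in (1.0, 0.5):
    H_new = H + theta * np.outer(u, w)             # rank-one update of row i
    # det(H + theta u w^T) = (1 + theta w^T H^{-1} u) det(H)
    lemma = (1.0 + theta * (w @ np.linalg.solve(H, u))) * np.linalg.det(H)
    assert np.isclose(np.linalg.det(H_new), lemma)
```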

Remark 2.
It can be seen that the update formula (14) involves $F'(x_{k+1})$, but the full Jacobian need not be computed in practice: only the product $F'(x_{k+1}) s_k$ is required. Automatic differentiation is a chain-rule-based technique for evaluating the derivatives of functions defined by a high-level computer program with respect to their input variables. Automatic differentiation has two basic modes of operation, the forward mode and the reverse mode. In the forward mode, the derivatives are propagated through the computation using the chain rule, while in the reverse mode, the adjoint derivatives are propagated backwards. The forward and reverse modes of automatic differentiation make it possible to compute $F'(x) s$ and $\sigma^T F'(x)$ exactly, to within machine accuracy, for given vectors $x$, $s$, and $\sigma$.
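For illustration (our own sketch, using the open-source JAX library rather than the TOMLAB tool employed in our experiments), a single forward-mode call yields $F'(x) s$ and a single reverse-mode call yields $\sigma^T F'(x)$, here on Problem 2 of Appendix A:

```python
import jax
import jax.numpy as jnp

# Problem 2 from Appendix A: F(x) is the gradient of sum(exp(x_i) - x_i),
# i.e., F_i(x) = exp(x_i) - 1, with a diagonal Jacobian.
def F(x):
    return jnp.exp(x) - 1.0

x = jnp.array([0.5, -1.0, 2.0])
s = jnp.array([1.0, 0.0, -1.0])

# Forward mode: one jvp call gives (F(x), F'(x) s) without forming F'(x).
Fx, Js = jax.jvp(F, (x,), (s,))

# Reverse mode: one vjp call gives sigma^T F'(x).
sigma = jnp.array([1.0, 1.0, 1.0])
Fx2, vjp_fun = jax.vjp(F, x)
(sigma_J,) = vjp_fun(sigma)

# For this diagonal Jacobian, F'(x) s = exp(x) * s, which jvp reproduces.
assert jnp.allclose(Js, jnp.exp(x) * s)
```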
To establish the global convergence, we need the following conditions.

Assumption 1.
(1) $F$ is continuously differentiable on an open convex set containing the level set

$$\Omega = \{ x \in \mathbb{R}^n : \| F(x) \| \leq e^{\eta} \| F(x_0) \| \},$$

which is bounded.
(2) $F'(x)$ is Lipschitz continuous on $\Omega$ with a common Lipschitz constant $L > 0$.
(3) $F'(x)$ is nonsingular for any $x \in \Omega$.
First, we give the following important lemmas.
Lemma 3. Suppose that the sequence $\{x_k\}$ is generated by Algorithm 1. Then, $\{x_k\} \subset \Omega$, the sequence $\{ \| F(x_k) \| \}$ converges, and

$$\sum_{k=0}^{\infty} \| s_k \|^2 = \sum_{k=0}^{\infty} \alpha_k^2 \| d_k \|^2 < \infty. \tag{15}$$

Proof. According to the line search (12) and (13), one has, for any $k$,

$$\| F(x_{k+1}) \| \leq (1 + \eta_k) \| F(x_k) \| \leq \prod_{j=0}^{k} (1 + \eta_j) \, \| F(x_0) \| \leq e^{\eta} \| F(x_0) \|,$$

so that $\{x_k\} \subset \Omega$. On the basis of (12) and (13), we have for each $k$ that

$$\min\{ \sigma_1, \sigma_2 \} \, \| \alpha_k d_k \|^2 \leq \| F(x_k) \| - \| F(x_{k+1}) \| + \eta_k \| F(x_k) \|.$$

We can obtain (15) by taking the summation on both sides for $k$ from 0 to $\infty$, since $\{ \| F(x_k) \| \}$ is bounded and $\sum_{k=0}^{\infty} \eta_k < \infty$.
Since $\| F(x_{k+1}) \| \leq (1 + \eta_k) \| F(x_k) \|$ with $\sum_{k=0}^{\infty} \eta_k < \infty$, we then obtain the convergence of $\{ \| F(x_k) \| \}$. Denote

$$\delta_k = \frac{\| ( B_k - F'(x_{k+1}) ) s_k \|}{\| s_k \|}.$$

Lemma 4. Suppose that the sequence $\{x_k\}$ is generated by Algorithm 1, and $F'(x)$ is Lipschitz continuous with a common Lipschitz constant $L > 0$. If

$$\sum_{k=0}^{\infty} \| s_k \|^2 < \infty,$$

then

$$\lim_{t \to \infty} \frac{1}{t} \sum_{k=0}^{t-1} \delta_k^2 = 0. \tag{16}$$

In addition, there exists a subsequence of $\{\delta_k\}$ tending to zero. If

$$\sum_{k=0}^{\infty} \| s_k \| < \infty,$$

then

$$\sum_{k=0}^{\infty} \delta_k^2 < \infty. \tag{17}$$

In addition, the whole sequence $\{\delta_k\}$ converges to zero.
Proof. According to the update (14), for each $i$, the $i$th row of $B_{k+1} - F'(x_{k+1})$ is the orthogonal projection of the $i$th row of $B_k - F'(x_{k+1})$ onto the orthogonal complement of $\mathrm{span}\{ s_k^{(i)} \}$. Taking norms yields

$$\| B_{k+1} - F'(x_{k+1}) \|_F^2 \leq \| B_k - F'(x_{k+1}) \|_F^2 - \delta_k^2.$$

Then, writing $a_k = \| B_k - F'(x_k) \|_F$ and $L_1 = \sqrt{n} L$, one has that, for $k \geq 1$, by the Lipschitz continuity of $F'(x)$,

$$\delta_k^2 \leq a_k^2 - a_{k+1}^2 + 2 L_1 a_k \| s_k \| + L_1^2 \| s_k \|^2. \tag{18}$$

Making the summation on both sides from $k = p$ to $t - 1$, we have, for $1 \leq p < t$,

$$\sum_{k=p}^{t-1} \delta_k^2 \leq a_p^2 + 2 L_1 \Big( \max_{p \leq k < t} a_k \Big) \sum_{k=p}^{t-1} \| s_k \| + L_1^2 \sum_{k=p}^{t-1} \| s_k \|^2.$$

Dividing both sides by $t$ and letting $t \to \infty$ and then $p \to \infty$: if $\sum_{k=0}^{\infty} \| s_k \|^2 < \infty$, then the Lipschitz continuity of $F'(x)$, together with the last inequality and the Cauchy–Schwarz bound $\sum_{k=p}^{t-1} \| s_k \| \leq \sqrt{t} \big( \sum_{k=p}^{t-1} \| s_k \|^2 \big)^{1/2}$, implies (16). Then, there is a subsequence of $\{\delta_k\}$ tending to zero. If $\sum_{k=0}^{\infty} \| s_k \| < \infty$, then $\{a_k\}$ is bounded and (17) comes from (18) by summation. Moreover, the whole sequence $\{\delta_k\}$ converges to zero. This completes the proof.
Theorem 2. Let the conditions in Assumption 1 hold. Then, the sequence $\{x_k\}$ generated by Algorithm 1 converges to the unique solution $x_*$ of (1).

Proof. We first verify that $\lim_{k \to \infty} \| F(x_k) \| = 0$. According to Lemma 3, the sequence $\{ \| F(x_k) \| \}$ converges. Thus, we only need to prove that there is an accumulation point of $\{x_k\}$ which is the unique solution of (1). If there are infinitely many $\alpha_k$ obtained by the line search condition (12), then

$$\| F(x_{k+1}) \| \leq \rho \| F(x_k) \|$$

holds for infinitely many $k$. This indicates $\liminf_{k \to \infty} \| F(x_k) \| = 0$ and hence, by the convergence of $\{ \| F(x_k) \| \}$, $\lim_{k \to \infty} \| F(x_k) \| = 0$.

Now suppose that there are only finitely many $\alpha_k$ obtained by the line search condition (12). By (15) and Lemma 4, there is a subsequence $\{ \delta_k \}_{k \in K}$ converging to zero. Since $\{ x_k \}_K$ is bounded, we may assume that $\{ x_k \}_K \to x_*$ without loss of generality. Hence, $\{ F'(x_{k+1}) \}_K$ tends to $F'(x_*)$, and there exists a constant $C_1$ such that $\| F'(x_{k+1}) \| \leq C_1$ for all sufficiently large $k \in K$. According to the subproblem (11) and the definition of $\delta_k$, one has

$$\| F(x_k) + F'(x_{k+1}) d_k \| = \| ( F'(x_{k+1}) - B_k ) d_k \| = \delta_k \| d_k \|,$$

which, by the nonsingularity of $F'$ on $\Omega$, indicates that there exists a constant $M_1$ such that

$$\| d_k \| \leq M_1$$

holds for all sufficiently large $k \in K$. Thus, the subsequence $\{ d_k \}_K$ is bounded, and we can assume that $\{ d_k \}_K \to d_*$. Taking the limit in the subproblem (11) as $k \to \infty$, $k \in K$, one has

$$F(x_*) + F'(x_*) d_* = 0. \tag{20}$$

Denote by $\bar{\alpha}$ the limit of $\{ \alpha_k \}_K$ (taking a further subsequence if necessary). According to the line search rule, when $k \in K$ is sufficiently large, $\alpha_k < 1$, and hence the trial step length $\alpha_k / r$ violates (13):

$$\| F(x_k + (\alpha_k / r) d_k) \| > \| F(x_k) \| - \sigma_1 \| (\alpha_k / r) d_k \|^2 + \eta_k \| F(x_k) \|. \tag{21}$$

If $\bar{\alpha} > 0$, then (15) yields $d_* = 0$, and (20) gives $F(x_*) = 0$. If $\bar{\alpha} = 0$, multiplying both sides of (21) by $r / \alpha_k$ after subtracting $\| F(x_k) \|$ and taking the limit as $k \to \infty$, $k \in K$, we obtain, provided $F(x_*) \neq 0$,

$$\frac{F(x_*)^T F'(x_*) d_*}{\| F(x_*) \|} \geq 0.$$

Combined with (20), this gives $-\| F(x_*) \| \geq 0$, a contradiction; hence, we have $F(x_*) = 0$. Then, the convergence of $\{ \| F(x_k) \| \}$ implies $\lim_{k \to \infty} \| F(x_k) \| = 0$, and the nonsingularity of $F'$ on the bounded level set $\Omega$ ensures that the whole sequence $\{ x_k \}$ converges to the unique solution $x_*$ of (1). This completes the proof.
In what follows, we will show that, when $k$ is sufficiently large, the unit step length $\alpha_k = 1$ is always accepted.

Theorem 3. Suppose Assumption 1 holds and $\{x_k\}$ is generated by Algorithm 1. Then, there exist a constant $\bar{\delta} > 0$ and an index $\bar{k}$ such that $\alpha_k = 1$ whenever $\delta_k \leq \bar{\delta}$ and $k \geq \bar{k}$. Furthermore, the inequality (12) holds for all $k \geq \bar{k}$ satisfying $\delta_k \leq \bar{\delta}$.
Proof. According to Theorem 2, $\{x_k\}$ converges to the solution $x_*$ of (1). Then, there exists a constant $M_2 > 0$ such that $\| F'(x_{k+1})^{-1} \| \leq M_2$ for all $k$ sufficiently large. Moreover, it can be deduced similarly that there exist constants $\hat{\delta} > 0$ and $M_3 > 0$ such that, when $\delta_k \leq \hat{\delta}$ and $k$ is large enough,

$$\| B_k^{-1} \| \leq M_3.$$

By the subproblem (11), one has

$$\| d_k \| = \| B_k^{-1} F(x_k) \| \leq M_3 \| F(x_k) \|.$$

This implies

$$\| F(x_k + d_k) \| \leq \| ( F'(x_{k+1}) - B_k ) d_k \| + \Big\| \int_0^1 \big( F'(x_k + t d_k) - F'(x_{k+1}) \big) d_k \, dt \Big\| \leq \delta_k \| d_k \| + 2 L \| d_k \|^2,$$

where we have used $B_k d_k = -F(x_k)$ and the Lipschitz continuity of $F'(x)$, and $M_4$ denotes an upper bound of $\| F'(x) \|$ on the level set $\Omega$. Then, by the last inequality, we have

$$\| F(x_k + d_k) \| \leq \big( \delta_k + 2 L M_3 \| F(x_k) \| \big) M_3 \| F(x_k) \|. \tag{22}$$

On the other hand, by the nonsingularity of $F'(x_*)$ and the convergence of $\{x_k\}$, there is a constant $m > 0$ such that the inequality

$$\| F(x_k) \| \geq m \| d_k \| \tag{23}$$

holds for all $k$ sufficiently large. Thus, we deduce from (22) and (23) that, when $\delta_k \leq \bar{\delta}$ and $k$ is sufficiently large,

$$\| F(x_k + d_k) \| \leq \rho \| F(x_k) \| - \sigma_2 \| d_k \|^2,$$

that is, the line search condition (12) is satisfied with $\alpha_k = 1$. Let $\bar{\delta} = \min\{ \hat{\delta}, \frac{1}{2} \rho m ( M_2 M_3 M_4^2 )^{-1} \}$; then, we complete the proof.
The following theorem shows that Algorithm 1 is superlinearly convergent.

Theorem 4.
Let Assumption 1 hold. Then, the sequence $\{x_k\}$ generated by Algorithm 1 converges to the unique solution $x_*$ of (1) superlinearly.
Proof. Let $\bar{\delta}$ and $\bar{k}$ be as defined in Theorem 3. Then, according to Lemma 4 and (15), we have that

$$\frac{1}{k} \sum_{j=0}^{k-1} \delta_j^2 \leq \frac{\bar{\delta}^2}{2}$$

holds for all $k \geq \hat{k}$ for some index $\hat{k}$, which implies that, in this case, there are at least $k/2$ indices $j \leq k$ satisfying $\delta_j \leq \bar{\delta}$. Let $k' = \max\{ \hat{k}, \bar{k} \}$. Moreover, on the basis of Theorem 3, for any $k \geq 2 k'$, there are at least $k/2 - k'$ indices $j \leq k$ which make $\alpha_j = 1$ and

$$\| F(x_{j+1}) \| \leq \rho \| F(x_j) \|. \tag{24}$$

Define $J_k = \{ j \leq k : \text{(24) holds} \}$ and denote by $|J_k|$ the number of elements in $J_k$. Then, $|J_k| \geq k/2 - k' - 1$. On the other side, for each $j \notin J_k$, we have

$$\| F(x_{j+1}) \| \leq (1 + \eta_j) \| F(x_j) \|. \tag{25}$$

Multiplying the inequalities (24) with $j \in J_k$ and (25) with $j \notin J_k$ from $j = k'$ to $k$ yields

$$\| F(x_{k+1}) \| \leq \rho^{|J_k|} \prod_{j \notin J_k} (1 + \eta_j) \, \| F(x_{k'}) \| \leq e^{\eta} \rho^{k/2 - k' - 1} \| F(x_{k'}) \|.$$

Thus, we obtain that $\{ \| F(x_k) \| \}$ converges to zero R-linearly, and hence $\sum_{k=0}^{\infty} \| s_k \| < \infty$. Then, following from Lemma 4, one has $\lim_{k \to \infty} \delta_k = 0$. Consequently, the Dennis–Moré condition is satisfied, and $\{x_k\}$ converges to $x_*$ superlinearly.

Numerical Experiments
In this section, we compare the SDBroyden method (Algorithm 1) with Schubert's method [8]. We also compare the SDBroyden method with the direct Broyden method and Newton's method. All the methods are implemented in MATLAB R2018a and run on an iMac with 16 GB of memory. The product $F'(x)s$ is computed by the automatic differentiation tool of TOMLAB [24].
The test problems are listed in Appendix A. The Jacobian matrices of the tested problems have different structures: diagonal (Problems 1 and 2), tridiagonal (Problems 3–8), block-diagonal (Problems 9–11), and a special structure (Problem 12). The parameters in Algorithm 1 are specified as in [19]. For all the methods, we also stop the iteration if the number of iterations exceeds 200. We report the numerical performance of the above four methods in Tables 1–7 and Figures 1 and 2, where a dash indicates that the stopping criterion was not satisfied.
(1) In the first set of our numerical experiments, we test the performance of the SDBroyden method and Schubert's method. When $B_0$ is chosen as the identity matrix $I$, the results are listed in Tables 1 and 2, respectively. For the SDBroyden method and Schubert's method, we compute the problems with dimensions $n$ = 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10,000, 20,000, and 50,000, but we report only a subset of the dimensions ($n$ = 10, 100, 1000, 2000, 10,000, 20,000, 50,000) to improve the readability of the corresponding tables. The two methods fail on two problems (3, 8). Considering the iteration counts, the SDBroyden method is more efficient than Schubert's method on seven problems (1, 2, 4, 5, 10, 11, 12) and equivalent to Schubert's method on three problems (6, 7, 9). For the total number of function evaluations, the SDBroyden method has better performance on seven problems (1, 2, 4, 9, 10, 11, 12), while Schubert's method needs fewer function evaluations on one problem (5), and both methods are equivalent on two problems (6, 7). As for Broyden's mean convergence rate, SDBroyden works well on seven problems (1, 2, 4, 6, 10, 11, 12) and equals Schubert's method on three problems (5, 7, 9). It can be seen that the SDBroyden method outperforms Schubert's method in iteration counts, function evaluation counts, and Broyden's mean convergence rate.

When $B_0$ is chosen as the exact Jacobian matrix $F'(x_0)$, the results are given in Tables 3 and 4, respectively. The two methods solve the 12 problems successfully. The SDBroyden method needs fewer iterations than Schubert's method on seven problems (1, 2, 4, 5, 8, 10, 11) and takes equally many iterations on five problems (3, 6, 7, 9, 12). For the total number of function evaluations, the SDBroyden method is more efficient than Schubert's method on six problems (1, 2, 4, 5, 8, 11) and equivalent to Schubert's method on six problems (3, 6, 7, 9, 10, 12). As for Broyden's mean convergence rate, SDBroyden has better performance on nine problems (1, 2, 3, 4, 5, 8, 10, 11, 12) and equals Schubert's method on two problems (7, 9); the two methods are competitive on one problem (6). It can again be seen that the SDBroyden method outperforms Schubert's method in terms of the number of iterations, the number of function evaluations, and Broyden's mean convergence rate. Meanwhile, the CPU time of the SDBroyden method is mostly higher than that of Schubert's method.

The performance profile [25] is used to compare the numerical performance. For a given solver set $S$ and problem set $P$, let $t_{p,s}$ be the number of iterations, the number of function evaluations, or another performance measure required to solve problem $p$ by solver $s$. Then, define the performance ratio as

$$r_{p,s} = \frac{t_{p,s}}{\min\{ t_{p,q} : q \in S \}},$$

whose distribution function is defined as

$$\rho_s(t) = \frac{1}{N_p} \, \big| \{ p \in P : r_{p,s} \leq t \} \big|,$$

where $N_p$ is the number of problems in the set $P$. Thus, $\rho_s : \mathbb{R} \to [0, 1]$ is the probability for solver $s \in S$ that the performance ratio $r_{p,s}$ is within a factor $t \in \mathbb{R}$ of the best possible ratio (a small script for computing these profiles is sketched at the end of this section). According to the definition of performance profiles, the top curve corresponds to the best solver. In Figure 1, the performance of the two methods, the SDBroyden method and Schubert's method, is evaluated with respect to the number of iterations and the number of function evaluations. Figure 1 indicates that SDBroyden performs better than Schubert's method on both measures.

(2) In the second set of numerical experiments, we compare the SDBroyden method with the direct Broyden quasi-Newton method (DBQN).
We give the results of the DBQN method with $B_0 = I$ in Table 5. The DBQN method fails on four problems (3, 5, 8, 9). For the number of iterations and the number of function evaluations, the SDBroyden method needs fewer iterations on five problems (2, 4, 6, 7, 11) and equals DBQN on three problems (1, 10, 12). For Broyden's mean convergence rate, the SDBroyden method performs better on five problems (2, 4, 6, 7, 11), equals DBQN on two problems (1, 10), and performs worse on one problem (12). The results of the DBQN method with $B_0 = F'(x_0)$ are listed in Table 6. The DBQN method fails on one problem (5). For the number of iterations, SDBroyden is better than the DBQN method on seven problems (2, 4, 6, 8, 10, 11, 12) and equivalent to the DBQN method on three problems (1, 3, 9), while DBQN performs well on one problem (7). For the number of function evaluations and Broyden's mean convergence rate, SDBroyden is excellent on six problems (2, 4, 6, 8, 11, 12), while the DBQN method works well on one problem (10); the two methods coincide on three problems (3, 9, 10). In Figure 2, we also give the comparison of the SDBroyden method and the DBQN method with respect to the number of iterations and the number of function evaluations. It can be seen that the top curve corresponds to the SDBroyden method. This means that the SDBroyden method has satisfactory performance in terms of the number of iterations and the number of function evaluations when compared with its dense counterpart.

(3) In the third set of our numerical experiments, we compare the SDBroyden method with Newton's method; the results are listed in Table 7. Newton's method fails on three problems (5, 8, 10). One can see that the SDBroyden method requires slightly more iterations than Newton's method in most tests and has no significant advantage in the number of iterations, the number of function evaluations, or Broyden's mean convergence rate. However, the CPU time of Newton's method is much higher than that of the SDBroyden method, and it increases significantly faster than that of the quasi-Newton methods. Thus, the SDBroyden method is well suited to solving large-scale nonlinear equations.
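For completeness, a short script computing performance profiles of the kind shown in Figures 1 and 2 can look as follows (our illustration; the data matrix is a made-up placeholder, not the paper's results):

```python
import numpy as np
import matplotlib.pyplot as plt

def performance_profile(T):
    """T[p, s] = measure (e.g., iterations) of solver s on problem p;
    np.inf marks a failure. Returns thresholds and the rho_s curves."""
    ratios = T / T.min(axis=1, keepdims=True)          # r_{p,s}
    taus = np.unique(ratios[np.isfinite(ratios)])
    rho = np.array([[np.mean(ratios[:, s] <= t) for t in taus]
                    for s in range(T.shape[1])])
    return taus, rho

# Placeholder data: 5 problems x 2 solvers (not the paper's numbers).
T = np.array([[10, 12], [8, 8], [15, np.inf], [30, 41], [7, 9]], float)
taus, rho = performance_profile(T)
for s, label in enumerate(["SDBroyden", "Schubert"]):
    plt.step(taus, rho[s], where="post", label=label)
plt.xlabel("t"); plt.ylabel(r"$\rho_s(t)$"); plt.legend()
```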

Conclusions
We have developed a sparse direct Broyden quasi-Newton method for solving large-scale nonlinear equations, which is the sparse version of the direct Broyden method and an extension of Broyden's method. The method approximates the Jacobian matrix by a least change update and satisfies the sparsity condition and the direct tangent condition simultaneously. We show that the method is locally and superlinearly convergent. Combined with a nonmonotone line search, we also establish global and superlinear convergence. In particular, the unit step length is eventually accepted. Our numerical results show that the proposed method is effective and competitive for sparse nonlinear equations.

Data Availability Statement:
The data used to support the research and all the computer codes used in this study are available from the corresponding author upon request.

Acknowledgments:
The authors would like to thank the four anonymous referees for their careful reading of this paper and for their comments, which improved its quality. The authors would also like to thank the editor for providing insightful suggestions.

Conflicts of Interest:
The authors declare no conflict of interest.
Appendix A

Problem 2. Strictly convex function [27]. $F(x)$ is the gradient of $h(x) = \sum_{i=1}^{n} (e^{x_i} - x_i)$, that is, $F_i(x) = e^{x_i} - 1$, $i = 1, 2, \ldots, n$.