Two Modified Single-Parameter Scaling Broyden–Fletcher–Goldfarb–Shanno Algorithms for Solving Nonlinear Systems of Symmetric Equations

Abstract: In this paper, we develop two algorithms for solving nonlinear systems of symmetric equations. The first algorithm is based on modifying two Broyden–Fletcher–Goldfarb–Shanno (BFGS) methods; one of its advantages is that it is well suited to solving small-scale systems of nonlinear symmetric equations efficiently. The second algorithm chooses new search directions by incorporating an approximation method for computing the gradients and their differences into the determination of the search directions of the first algorithm. In essence, the second algorithm can be viewed as an extension of the conjugate gradient method recently proposed by Lv et al. for solving unconstrained optimization problems. We prove that these search directions are sufficiently descending for the approximate squared residual of the equations, independently of the line search rules used. Global convergence of the two algorithms is established under mild assumptions. To test the algorithms, we apply them to a number of benchmark test problems. Numerical results indicate that the algorithms developed in this paper outperform similar algorithms available in the literature.


Introduction
In this paper, we study solution methods for the following nonlinear system of symmetric equations:

F(x) = 0, x ∈ R^n, (1)

where F : R^n → R^n is a continuously differentiable function whose Jacobian J(x) = ∇F(x) is symmetric, i.e., J(x) = J(x)^T. Such a problem is closely related to many scientific problems, such as unconstrained optimization problems, equality constrained mathematical programming problems, discretized two-point boundary value problems, and discretized elliptic boundary value problems (see Chapter 1 in [1]). For example, when F is the gradient mapping of an objective function f : R^n → R, (1) is exactly the first-order necessary condition for a local minimizer of the unconstrained problem min_{x ∈ R^n} f(x). Symmetric systems of this type also arise from the first-order optimality conditions of equality constrained mathematical programming problems.

Development of Algorithm
In this section, we first briefly recall the single-parameter scaling BFGS method [21] for solving the unconstrained optimization problem

min_{x ∈ R^n} f(x), (5)

where f : R^n → R is continuously differentiable and its gradient is available. This method generates a sequence {x_k} satisfying

x_{k+1} = x_k + α_k d_k, k ≥ 0,

where x_0 is the initial point, α_k > 0 is a step length obtained by some line search rule, and d_k is a search direction determined from the scaled BFGS matrix B_k, with s_k = x_{k+1} − x_k and y_k = ∇f_{k+1} − ∇f_k. By minimizing the measure function introduced by Byrd and Nocedal [23],

ψ(B) = tr(B) − ln(det(B)), (8)

Lv et al. [21] obtained the scaling parameter γ_k. In 2006, ref. [22] proposed a modification of the BFGS algorithm for unconstrained nonconvex optimization. The matrix B_k in [22] was updated by the formula

B_{k+1} = B_k − (B_k s_k s_k^T B_k)/(s_k^T B_k s_k) + (y_{ks} y_{ks}^T)/(y_{ks}^T s_k), (10)

where y_{ks} = y_k + t_k‖∇f_k‖^r s_k, with t_k > 0 and r > 0. Owing to their impressive numerical efficiency, we now attempt to modify the aforementioned methods to solve the symmetric system of nonlinear Equations (1).
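To make the role of y_ks concrete, here is a small numerical sketch (in Python rather than the paper's MATLAB; all vectors and the values of t_k and r below are made-up illustrations, not from the paper): in a nonconvex region the plain curvature y_k^T s_k may be negative, while the regularized y_ks^T s_k becomes positive once the correction term t_k‖∇f_k‖^r s_k dominates.

```python
import numpy as np

# Sketch of the regularized gradient difference y_ks = y_k + t_k * ||grad_fk||^r * s_k
# used in the update (10). Vectors and parameters below are illustrative only.
def regularized_diff(y_k, s_k, grad_fk, t_k, r):
    return y_k + t_k * np.linalg.norm(grad_fk) ** r * s_k

s = np.array([1.0, 0.0])
y = np.array([-0.1, 0.3])      # plain curvature y^T s = -0.1 < 0 (nonconvex behavior)
g = np.array([2.0, 1.0])       # stand-in for grad f_k, with ||g|| = sqrt(5)
y_ks = regularized_diff(y, s, g, t_k=1.0, r=1.0)
print(y @ s)                   # -0.1
print(y_ks @ s)                # -0.1 + sqrt(5), which is positive
```

This positivity of y_ks^T s_k is exactly what keeps the BFGS update well defined without a convexity assumption.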
If we define f in (5) as f(x) = ½‖F(x)‖², then any global minimizer of Problem (5) at which f vanishes is a solution of Problem (1). Hence, if an algorithm stops at a global minimizer x_k, i.e., f(x_k) = 0, then the algorithm has found a solution of (1).
By the symmetry of J, it holds that ∇f(x) = J(x)^T F(x) = J(x)F(x). In [3], Li and Fukushima suggested approximating ∇f(x) by

g(x, α) = (F(x + αF(x)) − F(x))/α, (13)

where α > 0, and it can be proved that

lim_{α→0} g(x, α) = J(x)^T F(x) = ∇f(x). (14)

In other words, when αF(x) is sufficiently small, the vector g(x, α) defined by (13) is a good approximation to ∇f(x).
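A quick sketch of this approximation, assuming the Li–Fukushima construction above (the mapping F below is a made-up symmetric example, the gradient mapping of h(x) = Σ x_i⁴/4, not one of the paper's test problems):

```python
import numpy as np

def F(x):
    return x ** 3                  # gradient mapping of h(x) = sum(x_i^4)/4; J(x) = diag(3 x_i^2) is symmetric

def g_approx(x, a):
    # Derivative-free approximation (13): g(x, a) = (F(x + a F(x)) - F(x)) / a
    return (F(x + a * F(x)) - F(x)) / a

x = np.array([0.5, -0.3, 1.0])
true_grad = 3 * x ** 2 * F(x)      # J(x) F(x) = grad f(x) for f = ||F||^2 / 2
approx = g_approx(x, 1e-6)
print(np.max(np.abs(approx - true_grad)))   # small, shrinking with a
```

Only evaluations of F are needed, which is what makes the resulting algorithms derivative-free.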
In the actual calculation, [3] computed g_k by

g_k = g(x_k, α_{k−1}), (15)

where α_{k−1} is the step size at the last iterate point x_{k−1}. In general, the convergence of the algorithms ensures that α_{k−1}F(x_k) tends to zero, so g_k becomes asymptotically accurate. Based on the work done by Li and Fukushima [3], Zhou [6] proposed a modified BFGS method to solve (1); the modified BFGS update formula is given in (16), with the associated vectors defined in (17) and (18). Based on the ideas of [6,21,22], we now propose a modified single-parameter scaling BFGS method to solve (1). The modified BFGS update formula is given by (19), where γ_k > 0 and δ_k is defined piecewise in (20); in the general ("otherwise") case,

δ_k = δ̃_k − ((δ̃_k^T s_k)/(s_k^T s_k)) s_k + t‖F_k‖^r s_k. (20)

It is clear that γ_k also minimizes (8), where B_k is defined by (19), if we compute γ_k by (21). Moreover, we obtain an approximate quasi-Newton direction d_k by solving

B_k d_k = −g_k, (22)

where g_k is the approximate gradient defined by (15). Since γ_k in (21) minimizes (8) with B_k defined by (19), the condition number of B_k, i.e., the quotient of its maximum and minimum eigenvalues, is also minimized. Clearly, a smaller condition number of the search direction matrices can theoretically ensure the stability of algorithms [21]. Numerical experiments will also show that B_k in (19) with γ_k defined by (21) is more efficient and robust than that in (16).

Remark 1. The vector δ_k in (20) is slightly different from that in (17), where r = 1.
Since nonmonotone line search rules can play a critical role in solving complicated nonconvex optimization problems, we use the nonmonotone line search in [3,6] to determine a step size α_k along the direction d_k. Specifically, let σ_1, σ_2, ρ, ρ_1 ∈ (0, 1) and η > 0 be five given constants, and let {η_k} be a given positive sequence such that

Σ_{k=0}^∞ η_k ≤ η < ∞. (23)

We search for a step size α_k satisfying

f(x_k + α_k d_k) ≤ f(x_k) − σ_1‖α_k d_k‖² − σ_2‖α_k F(x_k)‖² + η_k f(x_k). (24)

With the above preparation, we are in a position to develop an algorithm to solve Problem (1). We now present its computational procedure as follows.
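A minimal backtracking sketch of this line search (the constants, the toy system F, and the exact form of the acceptance test are illustrative choices in the style of the rule above, not the paper's implementation):

```python
import numpy as np

def nonmonotone_step(F, x, d, s1=1e-4, s2=1e-4, rho=0.5, eta_k=0.01, max_back=50):
    """Backtrack over a in {1, rho, rho^2, ...} until the nonmonotone condition holds."""
    Fx = F(x)
    fx = 0.5 * np.dot(Fx, Fx)          # merit function f(x) = ||F(x)||^2 / 2
    nF2 = np.dot(Fx, Fx)
    a = 1.0
    for _ in range(max_back):
        Fnew = F(x + a * d)
        lhs = 0.5 * np.dot(Fnew, Fnew)
        rhs = fx - s1 * a * a * np.dot(d, d) - s2 * a * a * nF2 + eta_k * fx
        if lhs <= rhs:                 # accept the first step length satisfying (24)
            return a
        a *= rho
    return a

F = lambda x: x ** 3                   # toy symmetric system: J(x) = diag(3 x_i^2)
x = np.array([1.0, -0.5])
d = -F(x)                              # a descent-like direction for f
a = nonmonotone_step(F, x, d)
print(a)                               # 1.0 is accepted at the first trial here
```

The η_k f(x_k) relaxation is what makes the rule nonmonotone: f may increase slightly at some iterations, which helps on nonconvex problems.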
Remark 2. In fact, for all k > 0, if B_0 is symmetric and positive definite, then B_k in (19) is also symmetric and positive definite, since δ_k^T s_k > 0. Therefore, the algorithm is well defined. From the definition of δ_k in (20), we can also obtain δ_k^T s_k ≥ t‖F_k‖^r‖s_k‖² > 0.

Since Algorithm 1 cannot efficiently solve large-scale nonlinear symmetric equations, based on the work done by [3], we will develop another algorithm that involves neither matrix operations nor matrix inversion. When we set B_k = I, the inverse matrix of B_{k+1} in (19) can be written explicitly as in (27), where γ_k is the same as in (21) and ξ_k is defined in (28). In fact, δ_k and ξ_k are completely the same as those in [3,15]. Moreover, in order to guarantee that our proposed method generates descent directions, and to further increase its computational efficiency and robustness, we compute the direction by (29), where g_k is defined in (15) and β_k and θ_k are given in (30).

Remark 3. Note that the nonmonotone line search (31) is a variant of (24) with σ_1 = 0.
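Remark 2 can be checked numerically. The sketch below applies a scaled BFGS-type update in the spirit of (19) (the exact placement of γ_k in (19) is not reproduced here; γ, B, s and δ are made-up values) and verifies that symmetry and positive definiteness survive whenever δ^T s > 0:

```python
import numpy as np

# Scaled BFGS-type update: B+ = gamma * (B - B s s^T B / (s^T B s)) + delta delta^T / (delta^T s).
# gamma plays the role of the scaling parameter gamma_k; all inputs are illustrative.
def scaled_bfgs_update(B, s, delta, gamma):
    Bs = B @ s
    return gamma * (B - np.outer(Bs, Bs) / (s @ Bs)) + np.outer(delta, delta) / (delta @ s)

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = A @ A.T + 4 * np.eye(4)                   # symmetric positive definite start
s = rng.standard_normal(4)
delta = rng.standard_normal(4)
delta += (1.0 - (delta @ s)) / (s @ s) * s    # shift delta so that delta^T s = 1 > 0
B_new = scaled_bfgs_update(B, s, delta, gamma=0.8)
print(np.allclose(B_new, B_new.T))            # symmetry is preserved
print(np.linalg.eigvalsh(B_new).min() > 0)    # positive definiteness is preserved
```

This is precisely why the curvature guarantee δ_k^T s_k ≥ t‖F_k‖^r‖s_k‖² > 0 from (20) keeps Algorithm 1 well defined.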

Remark 4.
Since the search direction of Algorithm 1 at each iteration is an approximate quasi-Newton direction, which involves the solution of a linear system of equations, Algorithm 1 can only efficiently solve small- and medium-scale instances of Problem (1). In contrast, the search directions needed in Algorithm 2 are only associated with evaluating the function F, without the requirement of computing or storing its Jacobian matrix. Thus, compared with Algorithm 1, Algorithm 2 is more applicable to solving large-scale systems of nonlinear equations. In addition, two different approximation methods are used to compute the difference of gradients (see (18) and (28)).

Remark 6.
Very recently, Liu et al. [15] developed an algorithm to solve (1), in which β_k and θ_k in (29) were replaced by the quantities in (32), respectively. Since (30) and (32) are two similar choices, it is interesting to compare their numerical performance for solving Problem (1).

Convergence of the Algorithms
In this section, we establish the global convergence of Algorithms 1 and 2. For this purpose, we first define the level set

Ω = {x ∈ R^n : f(x) ≤ e^η f(x_0)}. (33)

Clearly, it follows from Step 3 of Algorithm 1 that f(x_{k+1}) ≤ (1 + η_k) f(x_k). Thus, any sequence {x_k} generated by Algorithm 1 belongs to Ω, i.e., x_k ∈ Ω for all k. In other words, there exists a constant ∆ > 0 such that

‖F_k‖ ≤ ∆, for all k. (34)

Moreover, since η_k satisfies (23), from Lemma 3.3 in [24], we know that the sequence {‖F_k‖} generated by Algorithm 1 converges.
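The boundedness argument behind (34) rests on a simple fact: assuming the line search only allows f(x_{k+1}) ≤ (1 + η_k) f(x_k), summability of η_k as in (23) caps the cumulative growth by exp(Σ η_k). A numeric check (the sequence η_k = 1/(k+1)² is an illustrative summable choice):

```python
import math

# f_k <= f_0 * prod(1 + eta_j) <= f_0 * exp(sum eta_j): the product of growth
# factors stays finite whenever the eta_j are summable.
etas = [1.0 / (k + 1) ** 2 for k in range(1000)]
prod = 1.0
for e in etas:
    prod *= 1.0 + e                 # worst-case cumulative growth factor
bound = math.exp(sum(etas))         # the exponential cap from (23)
print(prod <= bound)                # True: the level set bound holds
```

So even though individual steps may increase f, the whole sequence stays trapped in the level set Ω.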
Likewise, from the line search rule of Algorithm 2, we know that the sequence of iterate points {x k } generated by Algorithm 2 also belongs to Ω and { F k } generated by Algorithm 2 also satisfies (34).
As in the existing results [13,15,25], we also suppose that F in (1) satisfies the following conditions:

Assumption 1. The solution set of Problem (1) is nonempty.
Assumption 2. The level set Ω is bounded.

Assumption 3. F is continuously differentiable on an open convex set V ⊆ R^n containing the level set Ω, and its Jacobian matrix is symmetric and bounded on V, i.e., there exists a positive constant M such that ‖J(x)‖ ≤ M for all x ∈ V.

Assumption 4. J(x) is uniformly nonsingular on V, i.e., there exists a positive constant m such that m‖p‖ ≤ ‖J(x)p‖ for all x ∈ V and p ∈ R^n.

Clearly, Assumptions 2–4 imply that there exist positive constants M ≥ m > 0 such that the following statements are true: (1) for any x ∈ V and p ∈ R^n, m‖p‖ ≤ ‖J(x)p‖ ≤ M‖p‖; (2) for any x, y ∈ V, m‖x − y‖ ≤ ‖F(x) − F(y)‖ ≤ M‖x − y‖, where the mean-value form F(x) − F(y) = J(y + θ(x − y))(x − y) holds with some θ ∈ (0, 1).
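In the linear toy case F(x) = Ax with A symmetric and nonsingular, the constants m and M of statements (1) and (2) can be taken as the smallest and largest absolute eigenvalues of A. The example matrix below is made up for illustration:

```python
import numpy as np

A = np.array([[2.0, 0.5, 0.0],
              [0.5, 3.0, 0.2],
              [0.0, 0.2, 1.5]])          # symmetric, nonsingular: plays the role of J
eigs = np.abs(np.linalg.eigvalsh(A))
m, M = eigs.min(), eigs.max()            # uniform nonsingularity / boundedness constants

rng = np.random.default_rng(1)
for _ in range(100):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    gap = np.linalg.norm(A @ x - A @ y)  # ||F(x) - F(y)|| for the linear F
    dist = np.linalg.norm(x - y)
    assert m * dist - 1e-12 <= gap <= M * dist + 1e-12
print("bounds verified for 100 random pairs")
```

For nonlinear F, the same sandwich holds locally via the mean-value form quoted above.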
Under Assumptions 2–4, we can prove that Algorithm 1 has the following nice properties.

Lemma 1. Let {B_k} be generated by the BFGS formula (19), where B_0 is symmetric and positive definite and γ_k is defined by (21). If there exists a positive constant m_0 > 0 such that (40) holds, then for any p ∈ (0, 1) and k > 1, there exist positive constants β_i, i = 1, 2, 3, 4, such that (41) holds for at least ⌈pk⌉ values of j ∈ [1, k], where ⌈t⌉ denotes the smallest integer larger than or equal to t.
Proof. From (8) and (19), we obtain (42). Taking

cos θ_k = (s_k^T B_k s_k)/(‖B_k s_k‖‖s_k‖) and q_k = (s_k^T B_k s_k)/(s_k^T s_k),

(42) can be rewritten as (43). On the other hand, from (2.11) in [6], we know (44), where C_1 > 0 is a constant. Hence, it follows from (25), (26), (34), (40) and (44) that (45) and (46) hold. From (43), (45) and (46), we have (47). Take η_j ≥ 0 as in (48). It is clear that ψ(B_{k+1}) > 0, since B_{k+1} is symmetric and positive definite. Hence, from (47), we have (49). Define J_k as the set consisting of the ⌈pk⌉ indices corresponding to the ⌈pk⌉ smallest values of η_j for j ≤ k, and let η_{m_k} denote the largest of the η_j for j ∈ J_k. Then (50) holds. Thus, from (48)-(50) and the fact (51), we obtain (52). It follows from (51) that (53). On the other hand, since ln cos²θ_j ≤ 0, we have (54). Let µ(t) = 1 − t + ln t; then, by simple analysis, we obtain (55). Therefore, there exist positive constants β̃_3 and β̃_4 such that

β̃_3 ≤ q_j cos²θ_j ≤ β̃_4, for all j ∈ J_k,

which completes the proof.

Remark 7.
From the proof of Lemma 1, we know that if γ_k is not defined by (21), Lemma 1 still holds whenever there exist constants m_3 > 0 and M_3 > 0 such that m_3 ≤ γ_k ≤ M_3.
By Lemma 1, since the definition of δ̃_k and the line search rule are completely the same as those in [6], we can obtain the same convergence result for Algorithm 1 as in [6] without proof.
To establish the global convergence of Algorithm 2, we first prove the following results.
Lemma 2. Let {x_k} be a sequence generated by Algorithm 2. If Assumptions 1–4 hold, then

lim_{k→∞} ‖s_k‖ = lim_{k→∞} α_k‖d_k‖ = 0. (58)

Proof. Similar to the proof of Lemma 3.1 in [15], we can prove (58).

Lemma 3.
Let {x_k} be a sequence generated by Algorithm 2. If Assumptions 1–4 hold, then (59) holds. Additionally, if s_k → 0, then there exists a constant m̃ > 0 such that, for all sufficiently large k, (60) holds.

Proof. On the one hand, it follows from (28) and (37) that (61) holds, where θ_1, θ_2 ∈ (0, 1). On the other hand, by the mean-value theorem, we have (62). From Lemma 2, we have s_k → 0; hence ξ_k = F(x_{k+1}) − F(x_k) → 0. By the continuity of J, we obtain (60).

Lemma 4.
Suppose that Assumptions 1–4 hold. If there exists a constant r > 0 such that ‖F_k‖ ≥ r for all k ∈ N, then there exist constants M̃ ≥ m̃ > 0 such that (65) and (66) hold, where m̃ = rm.

Lemma 5.
Suppose that Assumptions 1–4 hold. Then (71) holds, where the relevant constant is given in (72).

Proof. If α_k = 1, it is easy to see that (71) holds. If α_k ≠ 1, then α'_k = ρ^{−1}α_k does not satisfy (31); that is to say, α'_k satisfies (73). On the other hand, from (38), it follows that (74). Combined with (73), we obtain (75), where the first inequality follows from (65), the second equality follows from (13) and the differentiability of F, and the third inequality follows from the Cauchy–Schwarz inequality.
By (75), we obtain the desired result.

Lemma 6.
Suppose that Assumptions 1-4 hold. Let {x k } and {d k } be two sequences generated by Algorithm 2. Then, the line search rule (31) by Step 3 in Algorithm 2 is well defined.
Proof. Our aim is to show that the line search rule (31) terminates finitely with a positive step length α_k. Suppose, by contradiction, that for some iteration index k*, the condition (31) does not hold for any trial step. As a result, for all m ∈ N, (76) holds, which can be written as (77). By taking the limit as m → ∞ on both sides of (77), we have (78). However, from Assumption 3, Lemma 4, (34) and the stopping rule of Algorithm 2, we obtain (79). Clearly, (79) contradicts (78). That is to say, the line search rule terminates within a finite number of trials with a positive step length α_k, i.e., Step 3 of Algorithm 2 is well defined.
With the above preparation, we now state the convergence result of Algorithm 2.
Theorem 2. Suppose that Assumptions 1–4 hold. Let {x_k} be a sequence generated by Algorithm 2. Then

lim_{k→∞} ‖F_k‖ = 0. (80)

Proof. For the sake of contradiction, suppose that the conclusion is not true. Then there exists a constant ε_0 > 0 such that ‖F_k‖ ≥ ε_0 for all k ∈ N, and hence (66) holds. From (58), we then have the limit relation (81), and it follows from (81) and (72) that (82). From (39) and Lemmas 4 and 5, we know that the inequality (83) holds for all sufficiently large k. Therefore, taking the limit as k → ∞ on both sides of (83), we obtain (84), which is a contradiction. This completes the proof of Theorem 2.

Numerical Tests
In this section, by numerical tests, we study the effectiveness and robustness of Algorithm 1 when it is used to solve nonlinear systems of symmetric equations.
Problem 2. In Reference [22], the elements of F(x) are given by

Problem 4. Unconstrained optimization problem:
with the Engval function [28] f : R^n → R. The related symmetric nonlinear system of equations is F(x) = ∇f(x) = 0.

Problem 5. The discretized two-point boundary value problem, similar to the problem in [1]:

Problem 6. In Reference [6], the elements of F(x) are given by

Problem 7.
In Reference [6], the elements of F(x) are given by:

All the algorithms are coded in MATLAB R2021a and run on a desktop computer (at Peking University) with a 3.6 GHz CPU, 16 GB of memory and the Windows 7 operating system. The relevant parameters are specified as in [6]; in particular, µ in the MBFGS method (Algorithm 2.1 in [6]) is the same as in [6], i.e., µ = 10^{−4}. In fact, all the above parameters are the same as those in [6]. Similarly to [6], we use the matrix left-division command d = −B\g to directly solve the linear subproblem (22). The termination condition of all the algorithms is ‖F_k‖ ≤ 10^{−6}, or the number of iterations exceeds 10^4, or MATLAB crashes, or the CPU time exceeds 100 s. In order to choose optimal values for the parameters t and r in Algorithm 1, we first take r = 0.5 and choose t from the interval [1, 1.1] with a step size of 0.01. We present the total number of iterations (Iter) in Figure 1a as Algorithm 1 is used to solve all seven test problems with different sizes n (10, 50, 100 and 500) and different initial guesses (x_1–x_6). From Figure 1a, we know that Iter changes little when t ∈ [1.02, 1.03], and it is the smallest when t = 1.03.
We then take t = 1.03 and choose r from the interval [0.5, 0.6] with a step size of 0.01. We present the total number of iterations (Iter) in Figure 1b as Algorithm 1 is used to solve all seven test problems with different sizes n (10, 50, 100 and 500) and different initial guesses (x_1–x_6). Figure 1b shows that Iter changes little when r ∈ [0.52, 0.54] and that Algorithm 1 with r = 0.5 performs the best. According to the above results, we take t = 1.03 and r = 0.5, and compare Algorithm 1 (MSBFGS) with two similar, recently proposed algorithms on all seven test problems with different sizes n and different initial guesses. One is the Gauss–Newton-based BFGS method (GNBFGS for short) in [3], and the other is MBFGS in [6]; both have been reported to be more efficient than the state-of-the-art methods.
In Table A1, we report the numerical performance of the three algorithms. For simplicity of presentation, we use the following notations in Table A1. P: the problem; Dim: the dimension of the test problem; CPU: the CPU time in seconds; Ni: the number of iterations; Nf: the number of function evaluations; Norm (F): the norm of F_k at the stopping point; F: a notation indicating that an algorithm fails, i.e., it does not achieve the given tolerance within 10^4 iterations, or MATLAB crashes, or the CPU time exceeds 100 s.
The underlined data in Table A1 indicate the superiority of Algorithm 1 in comparison with the others.
To further show the efficiency of the proposed method, we calculated the number of wins for the three algorithms in terms of the elapsed CPU time (CPU wins), the number of iterations (Iter wins) and the number of function evaluations (Nf wins) and we also calculated the failures (Fails) of the three algorithms. The results are recorded in Table 1.
In addition, we adopted the performance profiles introduced by Dolan and Moré [29] to evaluate the required number of iterations and the number of function evaluations.
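The Dolan–Moré profile reported in the figures can be sketched as follows (the cost matrix is made up for illustration): for solver s on problem p with cost t[p, s], the ratio is r[p, s] = t[p, s]/min_s t[p, s], and ρ_s(τ) is the fraction of problems on which solver s is within a factor τ of the best.

```python
import numpy as np

# Dolan-More performance profile: rho_s(tau) = fraction of problems with
# t[p, s] / min_s t[p, s] <= tau. Rows are problems, columns are solvers.
def performance_profile(t, tau):
    ratios = t / t.min(axis=1, keepdims=True)   # r[p, s]
    return (ratios <= tau).mean(axis=0)          # rho_s(tau), one value per solver

t = np.array([[10.0, 12.0, 30.0],
              [ 5.0,  4.0,  6.0],
              [20.0, 25.0, 22.0],
              [ 8.0,  8.0, 16.0]])               # illustrative iteration counts
print(performance_profile(t, 1.0).tolist())      # [0.75, 0.5, 0.0]: wins (ties count)
print(performance_profile(t, 2.0).tolist())      # [1.0, 1.0, 0.75]
```

Higher curves (larger ρ_s(τ) for each τ) therefore indicate the more efficient and robust solver.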
It follows from the results in Table 1 and Figure 2 that our algorithm (MSBFGS) performs the best among the three algorithms, either with respect to the number of iterations, or with respect to the elapsed CPU time.
In order to test the efficiency of Algorithm 2 (MSBFGS2), we compared its performance for solving large-scale nonlinear symmetric equations with Algorithm 2.1 (NDDF) in [15] and Algorithm 2.1 (DFMPRP) in [13]. For the sake of fairness, we chose seven test problems, all from [15], where the relevant parameters of Algorithm 2 are the same as those of NDDF in [15]. The values of the parameters in DFMPRP are from [13]. The termination condition of all three algorithms is ‖F_k‖ ≤ 10^{−4}, or the number of iterations exceeds 10^4, or MATLAB crashes, or the CPU time exceeds 100 s. The numerical performance of all the algorithms is reported in Tables A2 and A3. Table A2 shows the numerical performance of all three algorithms with the fixed initial points x_1–x_6. Table A3 demonstrates the numerical performance of all three algorithms with the initial points x_7 and x_8 randomly generated by MATLAB's command rand(n,1). Furthermore, we calculated the "CPU wins", the "Iter wins", the "Nf wins" and the "Fails" of the three algorithms. The results are recorded in Table 2. We also adopted the performance profiles introduced by Dolan and Moré [29] to evaluate the required number of iterations and the required number of function evaluations of the three algorithms.
All the numerical results in Figure 3 and Tables 2, A2 and A3 demonstrate that our algorithm (MSBFGS2) performs better than the other two algorithms. MSBFGS2 is more efficient and robust than the others, since its number of failures is the smallest among the three algorithms when different initial guesses are chosen.

Conclusions and Future Research
In this paper, we presented two derivative-free methods for solving nonlinear systems of symmetric equations. For the first method, the direction is an approximate quasi-Newton direction, and the method can solve small-scale problems efficiently. The second method involves neither the computation nor the storage of any matrix, so it is applicable to solving large-scale systems of nonlinear equations.
Global convergence theories of the developed algorithms were established. Compared with similar algorithms, numerical tests demonstrated that our algorithms outperform the others, requiring fewer iterations or less CPU time to find a solution with the same tolerance.
In future research, it would be valuable to study the local convergence of the developed algorithms in depth, in addition to the global convergence analysis conducted in this paper. Additionally, our algorithms were designed only for systems of equations that are symmetric and satisfy some relatively restrictive assumptions. Thus, it is interesting to study how to modify our algorithms to solve more general systems of equations.

Data Availability Statement:
The data used to support the findings of this study are available from the corresponding author upon request. All the computer codes used in this study are available from the corresponding author upon request.

Conflicts of Interest:
The authors declare no conflict of interest.