1. Introduction
In this paper, we study solution methods of the following nonlinear system of symmetric equations:
where
is a continuously differentiable function, and its Jacobian
is symmetric, i.e.,
. Such a problem is closely related to many scientific problems, such as unconstrained optimization problems, equality constrained mathematical programming problems, discretized two-point boundary value problems, and discretized elliptic boundary value problems (see Chapter 1 in [
1]). For example, when
F is the gradient mapping of an objective function
, (
1) is just the first order necessary condition for a local minimizer of the following problem:
For the equality constrained mathematical programming problems:
where
is a vector-valued function. The Karush–Kuhn–Tucker (KKT) conditions (see Chapter 8 in [
2]) for Problem (
3) are also the system (
1) with
, and:
Among various methods for solving (
1), the Newton method needs to compute the Jacobian matrix
, and requires that
is nonsingular at each iterate. Due to this stringent condition, the Newton method is not applicable in the general case.
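For reference, the classical Newton iteration solves the linear system J(x_k) d_k = −F(x_k) at each step. The following minimal Python sketch illustrates this; the 2×2 symmetric system below is our own illustration, not one of the paper's test problems:

```python
import numpy as np

def newton(F, J, x0, tol=1e-10, max_iter=50):
    """Classical Newton iteration for F(x) = 0: solve J(x_k) d_k = -F(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) <= tol:
            break
        # This solve fails whenever J(x) is singular -- the stringent
        # requirement mentioned above.
        d = np.linalg.solve(J(x), -Fx)
        x = x + d
    return x

# Illustrative system: F is the gradient of f(x) = x1^2 + x1*x2 + x2^2,
# so its Jacobian (the Hessian of f) is symmetric.
F = lambda x: np.array([2.0 * x[0] + x[1], x[0] + 2.0 * x[1]])
J = lambda x: np.array([[2.0, 1.0], [1.0, 2.0]])
```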
Li and Fukushima [
3] proposed a Gauss–Newton method to solve the symmetric system of equations, which can ensure that an approximate residual of
is descent. Gu et al. [
4] modified the method in [
3] such that the residual
is descent. As a generalization of the method in [
5] for solving smooth unconstrained optimization, Zhou [
6] presented an inexact modified BFGS method to solve the symmetric system of equations by approximately computing the gradient of the residual square. In [
7], Wang and Zhu proposed an inexact-Newton via GMRES (generalized minimal residual) subspace method without line search technique for solving symmetric nonlinear equations. The iterative direction was obtained by solving the Newton equation of the system of nonlinear equations with the GMRES algorithm. Yuan and Yao [
8] also proposed a BFGS method for solving symmetric nonlinear equations; this method possesses the desirable property that the generated quasi-Newton matrices are positive definite. However, since the search direction in all of these methods is generated by solving a system of linear equations, none of them is applicable to large-scale problems.
For large-scale symmetric nonlinear equations, Li and Wang [
9] proposed a modified Fletcher–Reeves derivative-free method, as an extension of the conjugate gradient method [
10]. Similarly, as an extension of descent conjugate gradient methods in [
11] for unconstrained optimization, Xiao et al. [
12] presented a family of derivative-free methods for symmetric equations, established their global convergence under appropriate conditions, and demonstrated their effectiveness through numerical experiments. Zhou and Shen [
13] presented an efficient iterative method for solving large-scale symmetric equations, as an extension of the three-term PRP conjugate gradient method in [
14] for solving unconstrained optimization problems. Liu and Feng [
15] proposed a norm descent derivative-free algorithm for solving large-scale nonlinear symmetric equations, as an extension of the three-term conjugate gradient method in [
16] for solving unconstrained optimization problems. More details can be seen in [
17,
18,
19,
20].
Our motivation in this paper is to develop two algorithms to solve Problem (
1). Firstly, based on the single-parameter scaling memoryless BFGS method proposed by Lv et al. [
21] and the modification of the BFGS method in [
22], we intend to develop an efficient algorithm (MSBFGS) that incorporates the approximation method for computing gradients in [
3] and their difference in [
6] such that it can solve the system of nonlinear Equations (
1) more efficiently. Secondly, since MSBFGS involves the computation and storage of matrices, it is not suitable for large-scale systems of nonlinear equations. Therefore, by giving an inverse formula for the update matrix in MSBFGS, we develop another method (MSBFGS2) that can solve large-scale systems of nonlinear Equations (
1). Finally, in addition to establishing the convergence of the two algorithms, we shall demonstrate their strong numerical performance on benchmark test problems from the literature.
The rest of this paper is organized as follows. In
Section 2, we first state the idea to propose two methods for solving the nonlinear symmetric equations. Then, two new algorithms are developed. Global convergence of algorithms is established in
Section 3.
Section 4 is devoted to numerical tests. Some conclusions are drawn in
Section 5.
Some words about our notation: throughout the paper, the space is equipped with the Euclidean norm , the transpose of any matrix is denoted by , and the and are abbreviated as and , respectively.
2. Development of Algorithm
In this section, we first briefly recall a single-parameter scaling BFGS method [
21] for solving the following unconstrained optimization problem:
where
is continuously differentiable such that its gradient is available. This method generates a sequence
satisfying:
where
,
is the initial point,
is called a step length obtained by some line search rule, and
is a search direction defined as
where
.
By minimizing the measure function introduced by Byrd and Nocedal [
23]:
Lv et al. [
21] obtained that:
In 2006, ref. [
22] proposed a modification of the BFGS algorithm for unconstrained nonconvex optimization. The matrix
in [
22] was updated by the formula:
where
is the sum of
and
, and
.
Due to their impressive numerical efficiency, we now attempt to modify the aforementioned methods to solve the symmetric system of nonlinear Equations (
1).
Then, for this objective function, any global minimizer of Problem (
5) at which
f vanishes is a solution of Problem (
1). If an algorithm stops at a global minimizer
, i.e.,
, then the algorithm finds a solution of (
1).
By a symmetry of
J, it holds that
In [
3], Li and Fukushima suggested that
is approximately computed by
where
, and it can be proved that:
In other words, when
is sufficiently small, it is true that the vector
defined by (
13) is a nice approximation to
.
In the actual calculation, [
3] computed
by
where
is the step size at the last iterate point
. In general, the convergence of algorithms can ensure that
.
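A small sketch of this approximation (our own illustration): by symmetry of the Jacobian, the gradient of the residual square equals J(x)F(x), and this matrix-vector product can be approximated by a forward difference along F(x), using only evaluations of F. Here `delta` plays the role of the small parameter above:

```python
import numpy as np

def approx_gradient(F, x, delta=1e-6):
    """Approximate the gradient of f(x) = 0.5*||F(x)||^2 without the Jacobian.

    By symmetry of J, grad f(x) = J(x)^T F(x) = J(x) F(x), and the
    matrix-vector product J(x) F(x) is approximated by a forward
    difference along the direction F(x).
    """
    Fx = F(x)
    return (F(x + delta * Fx) - Fx) / delta

# Illustrative symmetric map (gradient of a quadratic, so J = A is constant).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
F = lambda x: A @ x
x = np.array([1.0, -1.0])
g = approx_gradient(F, x)
exact = A @ (A @ x)  # J(x)^T F(x) = A^2 x, since A is symmetric
```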
Based on the work done by Li and Fukushima [
3], Zhou [
6] proposed a modified BFGS method to solve (
1). The modified BFGS update formula is given by
where:
and:
Based on the ideas of [
6,
21,
22], we now attempt to propose a modified single-parameter scaling BFGS method to solve (
1). The modified BFGS update formula is given by
where
and:
It is clear that
also minimizes (
8) where
is defined by (
19) if we compute
by
Moreover, we obtain an approximate quasi-Newton direction:
where
is an approximate gradient defined by (
15).
Remark 1. in (20) is slightly different from that in (17), where . Since in (21) minimizes (8), where is defined by (19), its condition number, i.e., the quotient of the maximum and minimum eigenvalues of , is also minimized. Clearly, a smaller condition number of the search direction matrices can theoretically ensure the stability of algorithms [21]. Numerical experiments will also show that in (19), with defined by (21), is more efficient and robust than that in (16).

Since nonmonotone line search rules can play a critical role in solving complicated nonconvex optimization problems, we use the nonmonotone line search in [
3,
6] to determine a step-size
along the direction
. Specifically, let
and
be five given constants, and let
be a given positive sequence such that:
We search for a step size
satisfying:
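A backtracking implementation of a Li–Fukushima-type nonmonotone acceptance condition can be sketched as follows. The exact constants and exponents in the paper's condition (24) may differ; the variant below, with a summable nonmonotonicity sequence, is an assumption of ours for illustration:

```python
import numpy as np

def nonmonotone_step(F, x, d, eta_k, sigma1=1e-4, sigma2=1e-4,
                     rho=0.5, max_backtracks=50):
    """Backtracking nonmonotone line search of Li-Fukushima type.

    Accepts the largest alpha = rho^i satisfying
      ||F(x + a d)|| <= (1 + eta_k)*||F(x)|| - s1*||a d||^2 - s2*||a F(x)||^2,
    where sum_k eta_k < +infinity.  (One common variant; the exact form
    of condition (24) may differ.)
    """
    Fx = F(x)
    nFx = np.linalg.norm(Fx)
    alpha = 1.0
    for _ in range(max_backtracks):
        lhs = np.linalg.norm(F(x + alpha * d))
        rhs = ((1.0 + eta_k) * nFx
               - sigma1 * np.linalg.norm(alpha * d) ** 2
               - sigma2 * np.linalg.norm(alpha * Fx) ** 2)
        if lhs <= rhs:
            return alpha
        alpha *= rho
    return alpha
```

Because the right-hand side is inflated by the factor (1 + eta_k), a temporary increase of the residual norm is tolerated, which is what makes the rule nonmonotone.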
With the above preparation, we are in a position to develop an algorithm to solve Problem (
1). We now present its computer procedure as follows.
Remark 2. In fact, for all , if is symmetric and positive definite, then in (19) is also symmetric and positive definite, since . Therefore, the algorithm is well defined. From the definition of in (20), we can also obtain:

Since Algorithm
1 cannot efficiently solve large-scale nonlinear symmetric equations, based on the work done by [
3], we will develop another algorithm that is not involved with matrix operation and inverse operation. When we set
, the inverse matrix of
in (
19) can be written as
where
is the same as (
21), and:
In fact,
and
are completely the same as those in [
3,
15].
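For the reader's convenience, the standard algebraic tool behind such inverse formulas is the Sherman–Morrison identity (a classical result, stated here for reference; the scaled update (19) uses a variant of it):

```latex
% Sherman--Morrison: for a nonsingular matrix $A \in \mathbb{R}^{n \times n}$
% and vectors $u, v \in \mathbb{R}^{n}$ with $1 + v^{\top} A^{-1} u \neq 0$,
(A + u v^{\top})^{-1} = A^{-1} - \frac{A^{-1} u \, v^{\top} A^{-1}}{1 + v^{\top} A^{-1} u}.
```

Applying the identity twice to a rank-two BFGS-type update yields a closed-form expression for the inverse, which is what makes an implementation free of matrix factorizations possible.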
Moreover, in order to guarantee that our proposed method generates descent directions and to further increase its computational efficiency and robustness, we can compute the direction by
where
is defined in (
15) and:
Remark 3. Note that the nonmonotone line search (31) is a variant of (24) with .

Remark 4. Since the search direction of Algorithm 1 at each iteration is an approximate quasi-Newton direction, which involves the solution of a linear system of equations, Algorithm 1 can only efficiently solve small- to medium-scale instances of Problem (1). Instead, the search directions needed in Algorithm 2 are only associated with evaluating the function F, without the requirement of computing or storing its Jacobian matrix. Thus, compared with Algorithm 1, Algorithm 2 is more applicable to solving large-scale systems of nonlinear equations. In addition, two different approximation methods are used to compute the difference of gradients (see (18) and (28)).

Algorithm 1 (Modified Single-Parameter Scaling BFGS Algorithm (MSBFGS))
Step 0. Choose three constants . Take a sequence satisfying (23). Arbitrarily choose an initial iterate point , a symmetric and positive definite matrix . Set .
Step 1. If is satisfied, then the algorithm stops.
Step 2. Compute by (22) and (19).
Step 3. Determine a step length satisfying (24).
Step 4. Set .
Step 5. Set , return to Step 1.
Algorithm 2 (Modified Single-Parameter Scaling BFGS Algorithm 2 (MSBFGS2))
Step 0. Choose three constants . Take a sequence satisfying (23). Arbitrarily choose an initial iterate point . Set .
Step 1. If is satisfied, then the algorithm stops.
Step 2. Compute by (29).
Step 3. Determine a step length satisfying:
Step 4. Set .
Step 5. Set , return to Step 1.
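To make the overall structure concrete, the following Python sketch mirrors the skeleton of Algorithm 2 (Steps 0–5), but it is not the method itself: the direction (29) is replaced by the simple residual direction d_k = −F(x_k), and the nonmonotone line search (31) by a plain monotone backtracking rule — both placeholders of our own. Only evaluations of F are used; no matrix is formed or stored:

```python
import numpy as np

def derivative_free_solve(F, x0, tol=1e-6, max_iter=1000, sigma=1e-4, rho=0.5):
    """Structural sketch of a derivative-free norm-descent method.

    Placeholder choices (ours, for illustration only):
      - direction d_k = -F(x_k) instead of the scaled quasi-Newton
        direction (29),
      - a monotone backtracking rule instead of the nonmonotone
        line search (31).
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Fx = F(x)
        nFx = np.linalg.norm(Fx)
        if nFx <= tol:                       # Step 1: stopping test
            break
        d = -Fx                              # Step 2 (placeholder direction)
        alpha = 1.0                          # Step 3: backtracking
        while np.linalg.norm(F(x + alpha * d)) > nFx - sigma * alpha**2 * nFx**2:
            alpha *= rho
            if alpha < 1e-12:
                break
        x = x + alpha * d                    # Steps 4-5: update and loop
    return x

# Illustrative symmetric system F(x) = A x with A symmetric positive definite.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
F = lambda x: A @ x
```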
Remark 5. By combining the advantages of the three-term conjugate gradient method in [21] with those of the approximation methods for computing the difference of gradients, it is reasonable to expect that the numerical performance of Algorithm 2 is better than that of the algorithm in [21]. In the two subsequent sections, apart from establishing the convergence theory of Algorithm 2, we will also test its efficiency in solving large-scale problems.

Remark 6. Very recently, Liu et al. [15] developed an algorithm to solve (1), where and in (29) were replaced by , respectively. Since (30) and (32) are two similar choices, it is interesting to compare their numerical performance for solving Problem (1).

3. Convergence of Algorithms
In this section, we establish the global convergence of Algorithms 1 and 2. For this purpose, we first define the level set:
Clearly, it follows from Step 3 of Algorithm 1 that:
Thus, any sequence
generated by Algorithm 1 belongs to
, i.e.,
for all
k. In other words, there exists a constant
, such that:
Moreover, since
satisfies (
23), from Lemma 3.3 in [
24], we know that the sequence
generated by Algorithm 1 converges.
Likewise, from the line search rule of Algorithm 2, we know that the sequence of iterate points
generated by Algorithm 2 also belongs to
and
generated by Algorithm 2 also satisfies (
34).
As done in the existing results [
13,
15,
25], we also suppose that
F in (
1) satisfies the following conditions:
Assumption 1. The solution set of Problem (1) is nonempty.

Assumption 2. The level set Ω is bounded.

Assumption 3. F is continuously differentiable on an open and convex set V containing the level set Ω, and its Jacobian matrix is symmetric and bounded on V, i.e., there exists a positive constant M such that:

Assumption 4. is uniformly nonsingular on V, i.e., there exists a positive constant m such that:

Clearly, Assumptions 2–4 imply that there exist positive constants such that the following statements are true:
(1) For any
,
,
(2) For any
,
where
.
(3) For any sequence
,
where
.
Under Assumptions 2–4, we can prove that Algorithm 1 has the following nice properties.
Lemma 1. Let be generated by the BFGS Formula (19), where is symmetric and positive definite and is defined by (21). If there exists a positive constant such that: then for any and , there exist positive constants such that: hold for at least values of , where is the smallest integer larger than or equal to t.

Proof. From (
8) and (
19), we have:
Take
and
, then (
42) can be rewritten as
On the other hand, from (2.11) in [
6], we know:
where
is a constant. Hence, it follows from (
25), (
26), (
34), (
40) and (
44) that:
and:
From (
43), (
45) and (
46), we have:
It is clear that
since
is symmetric and positive definite. Hence, from (
47), we have:
Let us define
to be a set consisting of the
indices corresponding to the
smallest values of
, for
, and let
denote the largest of the
for
. Then:
Thus, from (
48)–(
50) and the following fact:
we have:
It follows from (
51) that:
On the other hand, since
, we have:
Let
, then by simple analysis, we have:
Therefore, there exist positive constants
and
such that the following inequalities hold:
Together with (
52), we obtain:
Taking , we obtain the desired result. □
Remark 7. From the proof of Lemma 1, we know that if is not defined by (21), Lemma 1 remains true whenever there exist constants and such that holds.

By Lemma 1, since the definition of
and the line search rule are completely the same as those in [
6], we can state the same convergence result for Algorithm 1 without proof.
Theorem 1. Suppose that Assumptions 1–4 hold. Let be a sequence generated by Algorithm 1. Then:

To establish the global convergence of Algorithm 2, we first prove the following results.
Lemma 2. Let be a sequence generated by Algorithm 2. If Assumptions 1–4 hold, then:

Proof. Similar to the proof of Lemma 3.1 in [
15], we can prove (
58). □
Lemma 2 shows that holds.
Lemma 3. Let be a sequence generated by Algorithm 2. If Assumptions 1–4 hold, then: Additionally, if , then there exists a constant such that for all sufficiently large k:

Proof. On the one hand, it follows from (
28) and (
37) that:
where
. On the other hand, by the mean-value theorem, we have:
From Lemma 2, we have
, hence
. By continuity of
J, we get (
60). □
Lemma 4. Suppose that Assumptions 1–4 hold. If there exists a constant such that for all : then there exists a constant such that: hold, where .

Proof. Similar to Proposition 3 in [
21], we have:
From (
39), (
63) and (
65), it follows that:
Therefore, the left-hand side of (
64) holds.
From (
60), the definition of
in (
29) and (
30), we have:
From Assumptions 2 and 3, (
34) and (
39), we know that the sequence
is bounded, i.e., there exists a positive constant
such that for all
:
Thus, from (
29), (
30), (
67), (
68), (
69) and the line search rule, it is easy to obtain that:
The proof is completed. □
Lemma 5. Suppose that Assumptions 1–4 hold. Then: where:

Proof. If
, it is easy to see that (
71) holds. If
, then
does not satisfy (
31), that is to say,
satisfies:
On the other hand, from (
38), it follows that:
Combined with (
73), we obtain:
where the first inequality follows from (
65), the second equality follows from (
13) and the differentiability of
F. The third inequality follows from the Cauchy–Schwarz inequality.
By (
75), we obtain the desired result. □
Lemma 6. Suppose that Assumptions 1–4 hold. Let and be two sequences generated by Algorithm 2. Then, the line search rule (31) in Step 3 of Algorithm 2 is well defined.

Proof. Our aim is to show that the line search rule (
31) terminates finitely with a positive step length
. To the contrary, suppose that for some iterate index such as
, the condition (
31) does not hold. As a result, for all
:
which can be written as
By taking the limit as
in both sides of (
77), we have:
However, from Assumption 3, Lemma 4 and (
34) and the stop rule of Algorithm 2, we obtain:
Clearly, (
79) contradicts (
78). That is to say, the line search rule terminates after a finite number of trials, yielding a positive step length
, i.e., Step 3 of Algorithm 2 is well defined. □
With the above preparation, we now state the convergence result of Algorithm 2.
Theorem 2. Suppose that Assumptions 1–4 hold. Let be a sequence generated by Algorithm 2. Then:

Proof. For the sake of contradiction, suppose that the conclusion is not true. Then, there exists a constant
such that
for all
. Hence, (
66) holds. Then, from (
58), we have:
It follows from (
81) and (
72) that:
From (
39), Lemmas 4 and 5, we know the following inequality:
holds for all sufficiently large
k. Therefore, taking the limit as
in both sides of (
83), it holds that:
which is a contradiction. Thus, the proof of Theorem 2 has been completed. □
4. Numerical Tests
In this section, by numerical tests, we study the effectiveness and robustness of the proposed algorithms when they are applied to solve nonlinear systems of symmetric equations.
We first list the benchmark test problems
, which include all four test problems in [
6].
Problem 1. Strictly convex function 1 ([26], p. 29). Let be the gradient of , meaning that:

Problem 2. In Reference [22], the elements of are given by

Problem 3. The discretized Chandrasekhar H-equation [27]: where and .

Problem 4. Unconstrained optimization problem: with the Engval function [28] defined by The related symmetric nonlinear equation is: where is defined by:

Problem 5. The discretized two-point boundary value problem, like the problem in [1]: and with , .

Problem 6. In Reference [6], the elements of are given by

Problem 7. In Reference [6], the elements of are given by

All the algorithms are coded in MATLAB R2021a and run on a desktop computer (at Peking University) with a 3.6 GHz CPU, 16 GB of memory and the Windows 7 operating system. The relevant parameters are specified by
and
in MBFGS method (Algorithm 2.1 in [
6]) is the same as [
6], i.e.,
. In fact, the above parameters are all the same as those in [
6]. Similarly to [
6], we use the matrix left division command
to directly solve the linear subproblem (
22). The termination condition of all the algorithms is:
, or the number of iterations exceeds
, or MATLAB crashes, or the CPU time exceeds 100 s.
In order to choose optimal values for the parameters
t and
r in Algorithm 1, we first take
, and choose
t from the interval
with a step size of
. We present the total number of iterations (Iter) in
Figure 1a as Algorithm 1 is used to solve all the seven test problems with different sizes
n (10, 50, 100 and 500) and different initial guesses. The initial guesses are
,
,
,
,
,
. From
Figure 1a, we know that Iter changes little when
and it is the least when
.
We then take
, and choose
r from the interval
with a step size of
. We present the total number of iterations (Iter) in
Figure 1b as Algorithm 1 is used to solve all seven test problems with different sizes
n (10, 50, 100 and 500) and different initial guesses (
–
).
Figure 1b shows that Iter changes little when
and Algorithm 1 with
performs the best.
According to the above research, we take
and
, and compare Algorithm 1 (MSBFGS) with two similar algorithms proposed very recently to see which is more efficient as they are used to solve all seven test problems with different sizes
n and different initial guesses. One is the Gauss–Newton-based BFGS method (GNBFGS for short) in [
3] and another is MBFGS in [
6] since they have been reported to be more efficient than the state-of-the-art ones.
In
Table A1, we report the numerical performance of the three algorithms. For the simplification of statement, we use the following notations in
Table A1.
P: the problems;
Dim: the dimension of test problems;
CPU: the CPU time in seconds;
Ni: the number of iterations;
Nf: the number of function evaluations;
Norm (F): the norm of at the stopping point;
F: a notation indicating that an algorithm fails to achieve the given tolerance, or the number of iterations exceeds , or MATLAB crashes, or the CPU time exceeds 100 s.
The underlined data in
Table A1 indicate the superiority of Algorithm 1 in comparison with the others.
To further show the efficiency of the proposed method, we calculated the number of wins for the three algorithms in terms of the elapsed CPU time (CPU wins), the number of iterations (Iter wins) and the number of function evaluations (Nf wins) and we also calculated the failures (Fails) of the three algorithms. The results are recorded in
Table 1.
In addition, we adopted the performance profiles introduced by Dolan and Moré [
29] to evaluate the required number of iterations and the number of function evaluations.
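For reference, a Dolan–Moré performance profile can be computed from a cost table as follows; this is a generic sketch with synthetic inputs, not the paper's data:

```python
import numpy as np

def performance_profile(T, taus):
    """Dolan-More performance profiles.

    T is an (n_problems, n_solvers) array of costs (e.g. CPU time or
    iteration counts), with np.inf marking failures.  Returns, for each
    tau in taus, the fraction of problems each solver solves within a
    factor tau of the best solver on that problem (rho_s(tau)).
    """
    T = np.asarray(T, dtype=float)
    best = T.min(axis=1, keepdims=True)   # best cost per problem
    ratios = T / best                     # performance ratios r_{p,s}
    # rho[s, i] = fraction of problems with ratio <= taus[i]
    return np.array([[np.mean(ratios[:, s] <= t) for t in taus]
                     for s in range(T.shape[1])])
```

Plotting rho_s over a range of tau gives the usual profile curves: the higher and further left a curve lies, the more efficient and robust the solver.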
It follows from the results in
Table 1 and
Figure 2 that our algorithm (MSBFGS) performs the best among the three algorithms, either with respect to the number of iterations, or with respect to the elapsed CPU time.
In order to test the efficiency of the Algorithm 2 (MSBFGS2), we compared its performance for solving large-scale nonlinear symmetric equations with Algorithm 2.1 (NDDF) in [
15] and Algorithm 2.1 (DFMPRP) in [
13]. For the sake of fairness, we chose seven test problems, all from [
15], where the relevant parameters of Algorithm 2 are the same as those of NDDF in [
15]. The values of parameters in DFMPRP are from [
13]. The termination condition of all three algorithms is
, or the number of iterations exceeds
, or MATLAB crashes, or the CPU time exceeds 100 s.
The numerical performance of all the algorithms is reported in
Table A2 and
Table A3.
Table A2 shows the numerical performance of all three algorithms with the fixed initial points
–
.
Table A3 demonstrates the numerical performance of all the three algorithms with initial points
and
randomly generated by MATLAB’s commands “rand(n,1)” and “-rand(n,1)”, respectively. Furthermore, we calculated the “CPU wins”, the “Iter wins”, the “Nf wins” and the “Fails” of the three algorithms. The results are recorded in
Table 2. We also adopted the performance profiles introduced by Dolan and Moré [
29] to evaluate the required number of iterations and the required number of function evaluations of the three algorithms.
All the results of numerical performance in
Figure 3 and
Table 2,
Table A2 and
Table A3 demonstrate that our algorithm (MSBFGS2) performs better than the other two algorithms. MSBFGS2 is more efficient and robust than the others, since it fails least often among the three algorithms across the different initial guesses.
5. Conclusions and Future Research
In this paper, we presented two derivative-free methods for solving nonlinear symmetric equations. For the first method, the direction is an approximate quasi-Newton direction, and the method can solve small-scale problems efficiently. Since the second method does not involve the computation or storage of any matrix, it is applicable to large-scale systems of nonlinear equations.
Global convergence theories of the developed algorithms were established. Numerical tests demonstrated that our algorithms outperformed similar existing algorithms, requiring fewer iterations or less CPU time to find a solution within the same tolerance.
In future research, it would be valuable to study the local convergence of the developed algorithms in depth, in addition to the global convergence analysis conducted in this paper. Additionally, our algorithms were designed only for systems of equations that are symmetric and satisfy some relatively restrictive assumptions. Thus, it is interesting to study how to modify our algorithms to solve more general systems of equations.