Abstract
This research proposes and investigates some improvements of gradient descent iterations that can be applied for solving systems of nonlinear equations (SNE). In the available literature, such methods are termed improved gradient descent methods. We use the verified advantages of various accelerated double direction and double step size gradient methods in solving single scalar equations. Our strategy is to control the speed of convergence of gradient methods through the step size value defined using more parameters. As a result, efficient minimization schemes for solving SNE are introduced. Linear global convergence of the proposed iterative method is confirmed by theoretical analysis under standard assumptions. Numerical experiments confirm the significant computational efficiency of the proposed methods compared to traditional gradient descent methods for solving SNE.
MSC:
90C53; 65K05; 49M37
1. Introduction, Preliminaries, and Motivation
Our intention is to solve a system of nonlinear equations (SNE) of the general form
where ℝ is the set of real numbers, ℝ^n denotes the set of n-dimensional real vectors, F : ℝ^n → ℝ^n, F(x) = (F_1(x), …, F_n(x))^T, and F_i is the ith component of F. It is assumed that F is a continuously differentiable mapping. The nonlinear problem (1) is equivalent to the following minimization of the goal function f:
The equivalence of (1) and (2) is widely used in science and practical applications. In such problems, the solution to SNE (1) comes down to solving a related least-squares problem (2). In addition to that, the application of the adequate nonlinear optimization method in solving (1) is a common and efficient technique. Some well-known schemes for solving (1) are based on successive linearization, where the search direction is obtained by solving the equation
where d_k is the search direction and F'(x_k) is the Jacobian matrix of F at x_k. Therefore, the Newton iterative scheme for solving (1) is defined as
where α_k is a positive parameter that stands for the steplength value.
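For illustration, the following minimal Python sketch performs one damped Newton step of this form; the helper name, the 2×2 test system, and the starting point are illustrative choices, not taken from the paper.

```python
import numpy as np

def damped_newton_step(F, J, x, alpha=1.0):
    """One damped Newton step for F(x) = 0: solve J(x) d = -F(x), then x <- x + alpha * d."""
    d = np.linalg.solve(J(x), -F(x))        # search direction from the linearized system
    return x + alpha * d

# Illustrative 2x2 system with root (1, 1): F(x) = (x0^2 + x1 - 2, x0 + x1^2 - 2)
F = lambda x: np.array([x[0]**2 + x[1] - 2.0, x[0] + x[1]**2 - 2.0])
J = lambda x: np.array([[2.0 * x[0], 1.0], [1.0, 2.0 * x[1]]])

x = np.array([2.0, 0.5])
for _ in range(8):
    x = damped_newton_step(F, J, x)
print(x)                                    # approaches [1. 1.]
```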
1.1. Overview of Methods for Solving SNE
Most popular iterations for solving (1) use appropriate approximations B_k of the Jacobian matrix F'(x_k). These iterations are of the form x_{k+1} = x_k + α_k d_k, where α_k is the steplength and d_k is the search direction obtained as a solution to the system of equations
For simplicity, we will use the notations F_k = F(x_k), s_k = x_{k+1} − x_k, and y_k = F_{k+1} − F_k.
The BFGS approximations B_k are defined on the basis of the secant equation B_{k+1} s_k = y_k. The BFGS updates
with an initial approximation B_0 were considered in [1].
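As an illustration of this approach, the short sketch below implements the standard BFGS update of the approximation B_k so that the secant equation holds; the function name and the typical initialization B_0 = I are illustrative assumptions, not a reproduction of the scheme in [1].

```python
import numpy as np

def bfgs_update(B, s, y):
    """Standard BFGS update of the approximation B so that the secant
    equation B_new @ s = y holds (typically started from B_0 = I)."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)
```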
Further on, we list and briefly describe relevant minimization methods that exploit the equivalence between (1) and (2). The efficiency and applicability of these algorithms highly motivated the research presented in this paper. The number of methods mentioned below confirms the applicability of this approach to solving SNE. In addition, there is an evident need to develop and continually improve the performance of optimization methods for solving (1).
There are numerous methods which can be used to solve the problem (1). Many of them are developed in [2,3,4,5,6,7]. Derivative-free methods for solving SNE were considered in [8,9,10]. These methods are proposed as appropriate adaptations of double direction and double steplength methods in nonlinear optimization, combined with an approximation of the Jacobian by a diagonal matrix whose entries are defined by means of an appropriate parameter. One approach based on various modifications of the Broyden method was proposed in [11,12]. Derivative-free conjugate gradient (CG) iterations for solving SNE were proposed in [13].
A descent Dai–Liao CG method for solving large-scale SNE was proposed in [14]. Novel hybrid and modified CG methods for finding a solution to SNE were originated in [15,16], respectively. An extension of a modified three-term CG method that can be applied for solving equations with convex constraints was presented in [17]. A diagonal quasi-Newton approach for solving large-scale nonlinear systems was considered in [18,19]. A quasi-Newton method, defined on the basis of an improved diagonal Jacobian approximation, for solving nonlinear systems was proposed in [20]. Abdullah et al. in [21] proposed a double direction method for solving nonlinear equations, in which the first direction is the steepest descent direction, while the second direction is the proposed CG direction. Two derivative-free modifications of the CG-based method for solving large-scale systems were presented in [22]. These methods are applicable in the case when the Jacobian of F is not accessible. An efficient approximation to the Jacobian matrix with a computational effort similar to that of matrix-free settings was proposed in [23]. Such efficiency was achieved by generating the Jacobian approximation from a diagonal matrix. This method has low memory requirements because it is defined without computing the exact gradient and Jacobian. Waziri et al. in [24] followed the approach based on the approximation of the Jacobian inverse by a nonsingular diagonal matrix. A fast method that is computationally efficient concerning memory requirements was proposed in [25]; it uses an approximation of the Jacobian by an adequate diagonal matrix. A two-step generalized scheme of the Jacobian approximation was given in [26]. Further on, an iterative scheme based on a modification of the Dai–Liao CG method, classical Newton iterates, and the standard secant equation was suggested in [27]. A three-step method based on a proper diagonal updating was presented in [28]. A hybridization of the FR and PRP conjugate gradient methods was given in [29]. The method in [29] can be considered as a convex combination of the PRP method and the FR method while using the hyperplane projection technique. A diagonal Jacobian method derived from data from two preceding steps and a weak secant equation was investigated in [30]. An iterative modified Newton scheme based on diagonal updating was proposed in [31]. Solving nonlinear monotone operator equations via a modified symmetric rank-one update is given in [32]. In [33], the authors used a new approach for solving nonlinear systems by simply considering them in the form of multi-objective optimization problems.
It is essential to mention that the analogous idea of avoiding the second derivative in the classical Newton’s method for solving nonlinear equations is exploited in deriving several iterative methods of various orders for solving nonlinear equations [34,35,36,37]. Moreover, some derivative-free iterative methods were developed for solving nonlinear equations [38,39]. Furthermore, some alternative approaches were conducted for solving complex symmetric linear systems [40] or a Sylvester matrix equation [41].
Trust region methods have become very popular algorithms for solving nonlinear equations and general nonlinear problems [37,42,43,44].
The systems of nonlinear equations (1) have various applications [15,29,45,46,47,48], for example in solving the ℓ1-norm problem arising from compressive sensing [49,50,51,52], in variational inequality problems [53,54], and in optimal power flow equations [55], among others.
Statistically viewed, the Newton method and different forms of quasi-Newton methods have been most frequently used in solving SNE. Unfortunately, methods of the Newton family are not efficient in solving large-scale SNE problems since they are based on the Jacobian matrix. A similar drawback applies to all methods based on various matrix approximations of the Jacobian in each iteration. Numerous adaptations and improvements of the CG iterative class exist as one solution applicable to large-scale problems. We intend to use the simplest Jacobian approximation, based on an appropriate diagonal matrix. Our goal is to define computationally effective methods for solving large-scale SNE using the simplest of Jacobian approximations. A realistic basis for our expectations lies in the known efficient methods used in the optimization of single nonlinear functions.
The remaining sections have the following general structure. The introduction, preliminaries, and motivation are included in Section 1. An overview of methods for solving SNE is presented in Section 1.1 to complete the presentation and explain the motivation. The motivation for the current study is described in Section 1.2. Section 2 proposes several multiple step-size methods for solving nonlinear equations. Convergence analysis of the proposed methods is given in Section 3. Section 4 contains numerical results obtained on standard test problems of various dimensions.
1.2. Motivation
The following standard designations will be used. We adopt the standard notations g(x) = ∇f(x) and ∇²f(x) for the gradient and the Hessian of the objective function f. Further, g_k = g(x_k) denotes the gradient vector of f at the point x_k. An identity matrix of appropriate size is denoted by I.
Our research is motivated by two trends in solving minimization problems. These streams are described as two subsequent parts of the current subsection. A nonlinear multivariate unconstrained minimization problem is defined as
where f : ℝ^n → ℝ is a uniformly convex or strictly convex continuously differentiable function bounded from below.
1.2.1. Improved Gradient Descent Methods as Motivation
The most general iteration for solving (7) is expressed as
In (8), x_{k+1} denotes a new approximation point based on the previous point x_k. The positive parameter t_k stands for the steplength value, while d_k denotes the search direction vector, which is generated based on the descent condition
The direction vector d_k may be defined in various ways. This vital element is often determined using the features of the function gradient. In one of the earliest optimization schemes, the gradient descent method (GD), this vector is defined as the negative gradient direction, i.e., d_k = −g_k. In the line search variant of the Newton method, the search direction is the solution of the linear system ∇²f(x_k) d_k = −g_k with respect to d_k, where ∇²f(x_k) denotes the Hessian matrix.
Unlike traditional GD algorithms for nonlinear unconstrained minimization, which are defined based on a single step size t_k, the class of improved gradient descent (IGD) algorithms defines the final step size using two or more step size scaling parameters. Such algorithms were classified and investigated in [56]. The obtained numerical results confirm that the usage of appropriate additional scaling parameters decreases the number of iterations. Typically, one of the parameters is defined using an inexact line search, while the second one is defined using the first terms of the Taylor expansion of the goal function.
A frequently investigated class of minimization methods that can be applied for solving the problem (7) uses the following iterative rule
In (9), the parameter t_k represents the step size in the kth iteration. The originality of the iteration (9) is expressed through the acceleration variable γ_k. This type of optimization scheme with an acceleration parameter originated in [57]. Later, in [58], the authors justifiably named such models accelerated gradient descent methods (shortly, AGD methods). Further research on this topic confirmed that the acceleration parameter generally improves the performance of the gradient method.
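To make the structure of such iterations concrete, the following sketch performs one accelerated gradient descent step in which t_k comes from a backtracking line search and γ_k is refitted from a second-order Taylor model of the observed decrease of f. The precise update used in [57,58] is given by the formulas referenced in the text; the code below is an assumed, illustrative variant only.

```python
import numpy as np

def agd_step(f, grad, x, gamma, beta=0.8, sigma=1e-4):
    """One accelerated gradient descent step of the form x_new = x - (t / gamma) * g.

    t is chosen by backtracking (Armijo) line search; gamma is then refitted so
    that a quadratic model with Hessian approximation gamma_new * I reproduces
    the observed decrease of f along the step (assumed, IGD-style update).
    """
    g = grad(x)
    gg = g @ g
    if gg == 0.0:                      # stationary point reached
        return x, gamma
    d = -g / gamma                     # scaled negative gradient direction
    t = 1.0
    while f(x + t * d) > f(x) + sigma * t * (g @ d):   # backtracking line search
        t *= beta
    x_new = x + t * d
    # Fit the scalar curvature gamma_new from the second-order Taylor model:
    # f(x_new) ~ f(x) - (t/gamma)*||g||^2 + 0.5*gamma_new*(t/gamma)^2*||g||^2
    gamma_new = 2.0 * gamma**2 * (f(x_new) - f(x) + t * gg / gamma) / (t**2 * gg)
    if gamma_new <= 0.0:               # keep the scalar Hessian approximation positive
        gamma_new = 1.0
    return x_new, gamma_new
```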
The Newton method with included line search technique is defined by the following iterative rule
wherein ∇²f(x_k)^{−1} stands for the inverse of the Hessian matrix ∇²f(x_k). Let B_k be a symmetric positive definite matrix such that ‖B_k − ∇²f(x_k)‖ ≤ ε, for an arbitrary matrix norm and a given tolerance ε > 0. Further, let H_k = B_k^{−1} be a positive definite approximation of the Hessian's inverse ∇²f(x_k)^{−1}. This approach leads to the relation (11), which is the quasi-Newton method with line search:
Updates of B_k can be defined as solutions to the quasi-Newton equation
where s_k = x_{k+1} − x_k and y_k = g_{k+1} − g_k. There is a class of iterations (11) in which there is no ultimate requirement for B_{k+1} to satisfy the quasi-Newton equation. Such a class of iterates is known as modified Newton methods [59].
The idea in [58] is the usage of a proper diagonal approximation of the Hessian
Applying the approximation (13) of ∇²f(x_k), the matrix involved in (11) can be approximated by the simple scalar matrix
In this way, the quasi-Newton line search scheme (11) is transformed into a kind of gradient descent iteration, called the SM method and presented in [58] as
The positive quantity γ_k is the convergence acceleration parameter which improves the behavior of the generated iterative loop. In [56], methods of the form (15) are termed improved gradient descent (IGD) methods. Commonly, the primary step size t_k is calculated using some inexact line search algorithm. The additional acceleration parameter γ_k is usually determined by the Taylor expansion of the goal function. This way of generating the acceleration parameter is confirmed as a good choice in [56,58,60,61,62].
The choice γ_k = 1 in the iterations (15) reveals the iterations
On the other hand, if the acceleration parameter γ_k is well-defined, then the step size t_k = 1 in the iterations (15) is acceptable in most cases [63], which leads to a kind of iterative principle:
Barzilai and Borwein in [64] proposed two efficient variants, known as the BB method, where the steplength is defined using an approximation of the Hessian by an appropriate scalar matrix. Therefore, the corresponding replacement in (17) leads to the iterative rule
The scaling parameter γ_k in the basic BB variant is defined upon the minimization of the vector norm ‖γ s_{k−1} − y_{k−1}‖, which gives
The steplength in the dual BB method is produced by the minimization of ‖s_{k−1} − γ^{−1} y_{k−1}‖, which yields
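For completeness, the two BB scalings can be computed directly from the differences of iterates and gradients, as in the following sketch; the standard BB formulas are assumed here, and the function and variable names are illustrative.

```python
import numpy as np

def bb_scaling(s, y):
    """Two Barzilai-Borwein choices of the scalar gamma approximating the Hessian,
    computed from s = x_k - x_{k-1} and y = g_k - g_{k-1}.

    gamma_bb1 minimizes ||gamma * s - y||   (basic variant),
    gamma_bb2 minimizes ||s - y / gamma||   (dual variant);
    the corresponding BB step lengths are their reciprocals.
    """
    gamma_bb1 = (s @ y) / (s @ s)
    gamma_bb2 = (y @ y) / (s @ y)
    return gamma_bb1, gamma_bb2
```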
The BB iterations were modified and investigated in a number of publications [65,66,67,68,69,70,71,72,73,74,75,76,77,78,79]. The so-called Scalar Correction (SC) method from [80] proposed the trial steplength in (17) defined by
The SC iterations are defined as
A kind of steepest descent and BB iterations relaxed by a parameter was proposed in [81]. The so-called Relaxed Gradient Descent Quasi-Newton methods, expressed by
are introduced in [82]. Here, the relaxation parameter is chosen randomly within a prescribed interval in one of the schemes and by the relation
in the other algorithm.
1.2.2. Discretization of Gradient Neural Networks (GNN) as Motivation
Our second motivation arises from the discretization of the gradient neural network (GNN) design. A GNN evolution can be defined in three steps. Further details can be found in [83,84].
- Step 1 (GNN). Define the underlying error matrix by replacing the unknown matrix in the actual problem with an unknown time-varying matrix, which will be approximated over time t. The scalar objective of a GNN is just the Frobenius norm of the error matrix:
- Step 2 (GNN). Compute the gradient of the objective.
- Step 3 (GNN). Apply the dynamic GNN evolution, which relates the time derivative of the state and the direction opposite to the gradient of the objective:
Here, the activation state variables matrix is the unknown to be determined, t is the time, γ > 0 is the gain parameter, and the left-hand side of (22) is the time derivative of the state matrix.
The discretization of the time derivative in (22) by the Euler forward-difference rule is given by
where τ > 0 is the sampling time and t_k = kτ, k = 0, 1, 2, … [84]. The approximation (23) transforms the continuous-time GNN evolution (22) into the discrete-time iterations
Derived discretization of the GNN design is just a GD method for nonlinear optimization:
where h = τγ is the step size. So, the step size is defined as a product of two parameters, in which the gain parameter γ should be “as large as possible”, while the sampling time τ should be “as small as possible”. Such considerations may add additional points of view to gradient optimization methods with multiple parameters.
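A minimal sketch of this discretization is given below: forward Euler applied to the gradient flow produces plain gradient descent whose effective step size is the product of the gain and the sampling time. The function names and the quadratic test function are illustrative.

```python
import numpy as np

def discretized_gnn(grad, x0, gain=10.0, tau=0.01, iters=200):
    """Forward-Euler discretization of the gradient flow x'(t) = -gain * grad(f(x)),
    which produces plain gradient descent with effective step size h = tau * gain."""
    x = np.asarray(x0, dtype=float)
    h = tau * gain                     # large gain, small sampling time: their product is the step
    for _ in range(iters):
        x = x - h * grad(x)
    return x

# Example: minimize f(x) = ||x||^2 / 2, whose gradient is x itself
print(discretized_gnn(lambda x: x, x0=[1.0, -2.0]))   # approaches the origin
```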
Our idea is to generalize the IGD iterations considered in [56] to the problem of solving SNE. One observable analogy is that the gain parameter γ from (22) corresponds to the parameter γ_k from (15). In addition, the sampling time τ can be considered as an analogy to the primary step size t_k, which is defined by an inexact line search. Iterations defined as IGD iterations adapted to solve SNE will be called the IGDN class.
2. Multiple Step-Size Methods for Solving SNE
The term “multiple step-size methods” is related to the class of gradient-based iterative methods for solving SNE employing a step size defined using two or more appropriately defined parameters. The final goal is to improve the efficiency of classical gradient methods. Two strategies are used in finding approximate parameters: inexact line search and the Taylor expansion.
2.1. IGDN Methods for Solving SNE
Our aim is to simplify the update of the Jacobian F'(x_k). Following (13), it is appropriate to approximate the Jacobian with a diagonal matrix
The final step size in the iterations (26) is defined using two step size parameters: t_k and γ_k. Iterations that fulfill the pattern (26) are an analogy of the IGD methods for nonlinear unconstrained optimization and will be termed the IGDN class of methods.
Using the experience of nonlinear optimization, the steplength parameter γ_{k+1} can be defined appropriately using the Taylor expansion of F:
On the basis of (25), it is appropriate to use the approximation of the Jacobian by the scalar matrix γ_{k+1} I, which implies
It can be noticed that the iterative rule (26) matches the BB iteration [64]. So, we introduce the IGDN method for solving SNE. Our further contribution is the introduction of appropriate restrictions on the scaling parameter. To that end, Theorem 1 reveals values of γ_{k+1} which decrease the objective functions included in F. Inequalities between vectors of absolute values are understood componentwise, i.e., coordinate by coordinate.
Theorem 1.
If the condition is satisfied, then the iterations (26) satisfy
Proof.
In view of , it follows that . On the other hand, the inequality is satisfied in the case . Now, (28) implies , , which needs to be proven. □
So, an appropriate update of the acceleration parameter γ_{k+1} can be defined as follows:
Now, we are able to generate the value of the next approximation in the form
The step size t_k in (30) can be determined using the nonmonotone line search. More precisely, t_k is defined by t_k = βρ^{m}, where m is the smallest nonnegative integer satisfying the line search condition
wherein σ > 0, β > 0, and ρ ∈ (0, 1) are constants, and {η_k} is a positive sequence such that
Further, an application of Theorem 1 gives the following additional update for the acceleration parameter γ_{k+1}:
Proof.
Clearly, (34) initiates , and the proof follows from Theorem 1. □
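The overall IGDN pattern of this subsection can be summarized by the following sketch, in which the Jacobian is replaced by a scalar matrix, the primary step size is obtained by a derivative-free nonmonotone backtracking rule, and the acceleration parameter is refreshed by a BB-like quotient. The concrete formulas (29)–(34) of the paper are not reproduced here, so every numerical choice in the code (the line search condition, the γ refresh, the default constants, and the commented example) is an assumption made for illustration only.

```python
import numpy as np

def igdn_solve(F, x0, gamma0=1.0, beta=1.0, rho=0.5, sigma=1e-4, tol=1e-6, max_iter=500):
    """Sketch of an IGDN-type iteration for F(x) = 0 (assumed, illustrative form).

    The Jacobian is approximated by gamma_k * I, giving the direction
    d_k = -F(x_k) / gamma_k; the primary step size t_k = beta * rho^m is chosen
    by a derivative-free nonmonotone backtracking rule, and gamma_k is refreshed
    by a BB-like quotient built from the residual differences.
    """
    x = np.asarray(x0, dtype=float)
    gamma = gamma0
    for k in range(max_iter):
        Fx = F(x)
        nFx = np.linalg.norm(Fx)
        if nFx <= tol:
            break
        d = -Fx / gamma                                  # scalar Jacobian approximation
        eta_k = 1.0 / (k + 1) ** 2                       # summable nonmonotonicity term
        t = beta
        # nonmonotone, derivative-free backtracking (Li-Fukushima type condition)
        while np.linalg.norm(F(x + t * d)) > (1.0 + eta_k) * nFx - sigma * t**2 * (d @ d):
            t *= rho
        x_new = x + t * d
        s, y = x_new - x, F(x_new) - Fx
        sy = s @ y
        gamma = abs((y @ y) / sy) if abs(sy) > 1e-12 else 1.0   # BB-like refresh, kept positive
        x = x_new
    return x

# Example: the strictly convex system F_i(x) = exp(x_i) - 1, with solution x = 0
# print(igdn_solve(lambda x: np.exp(x) - 1.0, x0=np.full(1000, 0.5)))
```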
2.2. A Class of Accelerated Double Direction (ADDN) Methods
In [61], an optimization method was defined by the iterative rule
where α_k denotes the value of the steplength parameter, and the two vectors are the search direction vectors. The first direction is defined as in the SM-method from [58], which gives the accelerated negative gradient direction, and further
We want to apply this strategy in solving (1). First of all, the second direction can be defined according to [85]. An appropriate definition of the acceleration parameter is still open.
We propose the steplength arising from the Taylor expansion (27) and defined as in (29). In addition, it is possible to use an alternative approach. More precisely, in this case, (27) yields
As a consequence, the acceleration parameter can be defined utilizing
The problem in (38) is solved using .
We can easily conclude that the next iteration is then generated by
The ADDN iterations are defined in Algorithm 2.
| Algorithm 2 The ADDN iterations based on (37) and (38). |
2.3. A Class of Accelerated Double Step Size (ADSSN) Methods
If one of the steplength values in (35) is replaced by another, independent steplength, the following iterative scheme is obtained
Here, the parameters α_k and β_k are two independent step size values, and the corresponding vectors define the search directions of the proposed iterative scheme (39).
Motivation for this type of iteration arises from [60]. The author of this paper suggested a model of the form (39) with two step size parameters. This method is actually defined by substituting one of the step size parameters from (35) with another, independent step size value. Both step size values are computed by independent inexact line search algorithms.
Since we aim to unify search directions, it is possible to use
The final step size in the iterations (41) is defined by combining three step size parameters: α_k, β_k, and γ_k. Again, the acceleration parameter γ_{k+1} is defined using the Taylor series of the form
As a consequence, γ_{k+1} can be computed by
Theorem 2.
If the condition holds, then the iterations (41) satisfy
Proof.
Clearly, implies . The proof follows from , which ensures . □
In view of Theorem 2, it is reasonable to define the following update for γ_{k+1} in the ADSSN method:
Once the acceleration parameter γ_{k+1} is determined, the values of the step size parameters α_k and β_k are defined. Then, it is possible to generate the next point:
In order to derive appropriate values of the parameters α_k and β_k, we investigate the function
The gradient of this function is equal to
Therefore,
In addition,
Therefore, the function is well-defined.
The step scaling parameters α_k and β_k can be determined using two successive line search procedures of the form (31).
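To illustrate how the two step sizes can be combined with the acceleration parameter, the following sketch performs one ADSSN-type step under an assumed unified form of the iteration (41), with the two step sizes obtained by successive derivative-free backtracking searches. The exact scheme of the paper is not reproduced; the names, constants, and the acceptance test are illustrative assumptions.

```python
import numpy as np

def adssn_step(F, x, gamma, trial=1.0, rho=0.5, sigma=1e-4, eta=0.1):
    """One ADSSN-type step under the assumed unified form
        x_new = x - alpha * F(x) / gamma - beta * F(x),
    with alpha and beta obtained by two successive derivative-free backtracking
    searches (an illustrative sketch only)."""
    Fx = F(x)
    nFx = np.linalg.norm(Fx)

    def backtrack(direction, base):
        # accept t once ||F(base + t * direction)|| satisfies a relaxed decrease test
        t = trial
        while np.linalg.norm(F(base + t * direction)) > (1.0 + eta) * nFx - sigma * t**2 * (direction @ direction):
            t *= rho
        return t

    d1 = -Fx / gamma                    # accelerated (scaled) residual direction
    alpha = backtrack(d1, x)
    x_mid = x + alpha * d1              # intermediate point after the first step size
    d2 = -Fx                            # plain residual direction
    beta = backtrack(d2, x_mid)
    return x_mid + beta * d2
```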
Corollary 2.
The iterations determined by (41) satisfy
Proof.
Clearly, the definition of γ_{k+1} in (42) implies the required condition, and the proof follows from Theorem 2. □
Remark 2.
Step 6 of Algorithm 3 is defined according to Theorem 2.
2.4. Simplified ADSSN
Applying the relation
between the step size parameters α_k and β_k in the iterative rule (41), the iteration is transformed into
The convex combination (45) of the step size parameters α_k and β_k that appear in the scheme (41) was originally proposed in [62] and applied in an iterative method for solving the unconstrained optimization problem (7). The assumption (45) represents a trade-off between the steplength parameters α_k and β_k. In [62], it was shown that the induced single step size method shows better performance characteristics in general. The constraint (45) initiates the reduction of the two-parameter rule into the single step size transformed iterative method (46).
We can see that this method is a modified version of the IGDN iterations, based on the replacement of the product of the two step size parameters from the classical IGDN iteration by a single multiplying factor.
An appropriate substitution will be used to simplify the presentation. Here, the acceleration parameter value is calculated by (29).
Corollary 3.
Iterations (46) satisfy
In view of (47), it is possible to conclude
Corollary 4 gives some useful restrictions on this rule.
Corollary 4.
If the condition holds, then the iterations (41) satisfy
Proof.
It follows from Theorem 1 and the relation (45). □
In view of Corollary 4, it is reasonable to define the following update for the acceleration parameter in this method:
Then, the next approximation is equal to
| Algorithm 4 The ADSSN iteration based on (46) and (48). |
3. Convergence Analysis
The level set is defined as
where x_0 is an initial approximation.
Therewith, the following assumptions are needed:
- The level set defined in (49) is bounded below.
- Lipschitz continuity holds for the vector function F, i.e., there exists a constant r > 0 such that ‖F(x) − F(y)‖ ≤ r‖x − y‖ for all x and y.
- The Jacobian of F is bounded.
Lemma 1.
Suppose the Lipschitz assumption holds. If the sequence {x_k} is obtained by the (29) iterations, then
Proof.
Obviously,
Therefore, assuming the Lipschitz continuity of F, it is possible to derive
The previous estimation confirms that (50) is satisfied with r defined by the Lipschitz condition. □
For the convergence results of the remaining algorithms, we need to prove the finiteness of the corresponding acceleration parameters, and then the remaining results follow trivially.
Lemma 2.
The acceleration parameter γ_{k+1} generated by (29) is bounded by the Lipschitz constant r.
Proof.
Clearly, the complemental step size γ_{k+1} defined by (29) satisfies
which leads to the desired conclusion. □
Lemma 3.
The additional step size γ_{k+1} generated by (34) is bounded as follows:
Proof.
The updating rule (34) satisfies the stated bound in each case of its definition. Continuing in the same way, one concludes
The proof can be finished using Lemma 2. □
Lemma 4.
The additional scaling parameter γ_{k+1} generated by (42) is bounded as follows:
Lemma 5.
The directions used in the (29) and (34) algorithms are descent directions.
Proof.
Since
taking the scalar product of both sides in (56) with F_k, in conjunction with Lemma 2, leads to the following conclusion for the (29) iterations:
With Lemma 3, it can be concluded that the (34) iterations imply the following:
The proof is complete. □
Lemma 6.
The direction used in the ADSSN algorithms is a descent direction.
Proof.
Since
after taking the scalar product of both sides in (59) with F_k and taking into account Lemma 4, we obtain
The proof is complete. □
Theorem 3.
The vector generated by (34) is a descent direction.
Proof.
According to (34), it follows
As a consequence, the obtained inequality implies that the generated vector is a descent direction. □
Theorem 4.
The vector generated by iterations (41) is a descent direction.
Lemma 7.
If the imposed assumptions are valid, then the norm of the direction vector generated by (29) is bounded.
Proof.
The norm can be estimated as
As an implication of the imposed assumptions, one can conclude the boundedness of the quantities involved, which in conjunction with Lemma 2 further bounds the right-hand side in (61). □
Lemma 8.
If the imposed assumptions hold, then the norm of the direction vector generated by (34) is bounded.
Proof.
Lemma 9.
If the imposed assumptions are active, then the norm of the direction vector generated by the ADSSN iterations is bounded.
Proof.
Following the proof used in Lemma 8, it can be verified that
□
Now, we are going to establish the global convergence of the (29), (34), and ADSSN iterations.
Theorem 5.
If the imposed assumptions are satisfied and {x_k} are iterations generated by (29), then
Proof.
The search direction d_k is defined as in (26). Starting from the apparent relation
we can conclude
Based on Lemma 7, it can be concluded
By Lemma 5, we can deduce that the norm of F is decreasing along the direction d_k, which means that ‖F_{k+1}‖ ≤ ‖F_k‖ is true for every k. Based on this fact, it follows
which directly implies
and completes the proof. □
Theorem 6.
If the imposed assumptions are satisfied and {x_k} are iterations generated by (34), then (62) is valid.
Proof.
4. Numerical Experience
In order to confirm the efficiency of the presented IGDN and ADSSN processes, we compare them with the EMFD iterations from [8]. We explore the performance of both IGDN variants defined by Algorithm 1, depending on the chosen acceleration parameter. These variants are denoted as IGDN(29) and IGDN(34).
The following values of needed parameters are used:
- algorithms are defined using , , , , and
- method is defined using , , , , and .
We use the following initial points (IP shortly) for the iterations:
, , , ,
, , , .
The considered nine test problems are listed below.
Problem 1 (P1) [86] Nonsmooth Function
, for .
Problem 2 (P2) [87]
, .
Problem 3 (P3) [87] Strictly Convex Function I
, for .
Problem 4 (P4) [87]
,
.
Problem 5 (P5) [87]
for ,
Problem 6 (P6) [87]
for ,
.
Problem 7 (P7) [87]
for ,
.
Problem 8 (P8) [86]
, for .
Problem 9 (P9) [86]
, for .
All tested methods are analyzed with respect to three main computational aspects: the number of iterations (iter), the number of function evaluations (fval), and the CPU time (CPU). The performance of the analyzed models is investigated on the nine listed problems, applied to the eight marked initial points, for five values of the number of variables: 1000, 5000, 10,000, 50,000, and 100,000.
According to the obtained results, IGDN(29) and IGDN(34) have better performance in comparison to the EMFD method from [8]. Both variants of the IGDN algorithm outperform the EMFD method in all considered performance aspects. In the next Table 1 (IGDN-EMFD comparisons), we display the best comparative achievements of all methods regarding the three tested profiles: iter, fval, and CPU.
Table 1.
IGDN-EMFD comparisons.
The IGDN(29) variant gives the best results in 52 out of 360 cases, considering the minimal number of iterations. Further, IGDN(34) has the lowest outcomes in 33 out of 360 cases. These variants have the same minimal number of iterations in 181 out of 360 cases in total. All three models require an equal minimal number of iterations in 23 out of 360 cases, while the EMFD method gives the minimal number of iterations in 71 out of 360 cases. Considering the needed number of iterations, the IGDN variants reach the minimal values in 265 out of 360 cases, as stated in the total column.
Regarding the fval metric, the results are as follows: 52 out of 360 cases are in favor of IGDN(29), 33 out of 360 in favor of IGDN(34), 180 out of 360 when both variants have the same minimal fval, 24 out of 360 cases in which all three methods give equal minimal fval values, and 71 out of 360 in favor of the EMFD method. The total number of minimal fval values achieved by some of the IGDN variants is the same as the total number of minimal iter values, i.e., 265 out of 360.
Concerning the CPU time, the numerical outcomes are absolutely in favor of the IGDN variants, i.e., in 355 out of 360 cases, while EMFD is faster in only 5 out of 360 outcomes.
The obtained numerical results justify the better performance characteristics of the ADSSN method, which is defined by Algorithm 3, compared to the EMFD method. Actually, the ADSSN scheme outperforms the EMFD iteration regarding all analyzed metrics: iter, fval, and CPU time, and additionally with respect to the norm of the objective function. The summary review of the obtained numerical values is presented in Table 2 (ADSSN-EMFD comparisons).
Table 2.
ADSSN-EMFD comparisons.
The results arranged in Table 2 confirm a huge dominance of the ADSSN scheme in comparison with the EMFD method. Considering the number of iterations, the ADSSN method obtains 282 minimal values, while EMFD wins in only 55 instances. Similar outcomes are recorded for the fval profile. The most convincing results are achieved for the CPU time metric, by which the ADSSN model outperforms EMFD in 359 out of 360 cases.
This section finishes with a graphical analysis of the performance features of the considered methods. In the subsequent Figure 1, Figure 2, Figure 3, Figure 4, Figure 5 and Figure 6, we display the Dolan and Moré [88] performance profiles of the compared models with respect to the tested metrics: iter, fval, and CPU.
Figure 1.
Performance profile of the IGDN variants versus EMFD [8] with respect to iter.
Figure 2.
Performance profile of the IGDN variants versus EMFD [8] with respect to fval.
Figure 3.
Performance profile of the IGDN variants versus EMFD [8] with respect to CPU.
Figure 4.
Performance profile of ADSSN versus EMFD [8] with respect to iter.
Figure 5.
Performance profile of ADSSN versus EMFD [8] with respect to fval.
Figure 6.
Performance profile of ADSSN versus EMFD [8] with respect to CPU.
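For completeness, the following sketch shows how Dolan and Moré performance profiles of the kind displayed in Figures 1–6 can be generated from recorded costs; the solver labels, the data layout, and the plotting details are illustrative assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

def performance_profile(costs, labels, tau_max=10.0):
    """Dolan-More performance profiles [88] from a matrix of recorded costs.

    costs[p, s] is the measured cost (iter, fval, or CPU time) of solver s on
    problem p; failures can be encoded as np.inf.  The profile of a solver is
    the fraction of problems whose performance ratio does not exceed tau.
    """
    costs = np.asarray(costs, dtype=float)
    ratios = costs / costs.min(axis=1, keepdims=True)   # ratio to the best solver per problem
    taus = np.linspace(1.0, tau_max, 200)
    for s, label in enumerate(labels):
        profile = [(ratios[:, s] <= tau).mean() for tau in taus]
        plt.step(taus, profile, where="post", label=label)
    plt.xlabel("tau")
    plt.ylabel("fraction of problems")
    plt.legend()
    plt.show()

# Hypothetical usage with recorded iteration counts for the three compared solvers:
# performance_profile(iter_counts, ["IGDN(29)", "IGDN(34)", "EMFD"])
```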
Figure 1, Figure 2 and Figure 3 exhibit the clear superiority of the IGDN(29) and IGDN(34) iterations compared to the corresponding EMFD iterations regarding the analyzed characteristics iter (resp. fval, CPU time). Further, the theoretical equivalence between (29) and (34) implies their identical responses to the testing criteria iter and fval, represented in Figure 1 and Figure 2. However, Figure 3 demonstrates slightly better performance of IGDN(34) with respect to IGDN(29), which implies that the updating rule (34) is slightly better than (29) concerning execution time. So, IGDN(34) is computationally the most effective algorithm.
In the rest of this section, we compare the ADSSN and EMFD methods.
5. Conclusions
The traditional gradient descent optimization schemes for solving SNE form a class of methods characterized by a single step size parameter. We aim to upgrade the traditional iterates by introducing the improved gradient descent iterations (IGDN), which include composite steplength values defined by several parameters. In this way, we justified the assumption that applying two or more quantities in defining the composed step size parameter generally improves the performance of an underlying iterative process.
The numerical results confirm the evident superiority of the proposed methods in comparison with the EMFD iterations from [8], which indicates the superiority of multiple step-size methods over traditional single step-size methods considering all three analyzed features: iter, fval, and CPU. Confirmation of the excellent performance of the presented models is also given through the graphically displayed Dolan and Moré performance profiles.
The problem of solving SNE by applying some efficient accelerated gradient optimization models is of great interest to the optimization community. In that regard, the question of further upgrading the IGDN, ADDN, and ADSSN types of methods is still open.
One possibility for further research is the proper exploitation of the results presented in Theorems 1 and 2 in defining proper updates of the scaling parameter γ_k. In addition, it will be interesting to examine and exploit similar results in solving classical nonlinear optimization problems.
Author Contributions
Conceptualization, P.S.S. and M.J.P.; methodology, P.S.S., M.J.P. and B.I.; software, B.I. and J.S.; validation, B.I. and J.S.; formal analysis, P.S.S., B.I., A.S. (Abdullah Shah) and J.S.; investigation, X.C., S.L. and J.S.; data curation, B.I., J.S. and A.S. (Abdullah Shah); writing—original draft preparation, P.S.S., J.S. and B.I.S.; writing—review and editing, M.J.P., B.I.S., X.C., A.S. (Alena Stupina) and S.L.; visualization, B.I., J.S. and B.I.S.; project administration, A.S. (Alena Stupina); funding acquisition, A.S. (Alena Stupina). All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by the Ministry of Science and Higher Education of the Russian Federation (Grant No. 075-15-2022-1121).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Acknowledgments
Predrag Stanimirović is supported by the Science Fund of the Republic of Serbia, (No. 7750185, Quantitative Automata Models: Fundamental Problems and Applications-QUAM). Predrag Stanimirović acknowledges support Grant No. 451-03-68/2022-14/200124 given by Ministry of Education, Science and Technological Development, Republic of Serbia. Milena J. Petrović acknowledges support Grant No.174025 given by Ministry of Education, Science and Technological Development, Republic of Serbia. Milena J. Petrović acknowledges support from the internal-junior project IJ-0202 given by the Faculty of Sciences and Mathematics, University of Priština in Kosovska Mitrovica, Serbia.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Yuan, G.; Lu, X. A new backtracking inexact BFGS method for symmetric nonlinear equations. Comput. Math. Appl. 2008, 55, 116–129. [Google Scholar] [CrossRef]
- Abubakar, A.B.; Kumam, P. An improved three–term derivative–free method for solving nonlinear equations. Comput. Appl. Math. 2018, 37, 6760–6773. [Google Scholar] [CrossRef]
- Cheng, W. A PRP type method for systems of monotone equations. Math. Comput. Model. 2009, 50, 15–20. [Google Scholar] [CrossRef]
- Hu, Y.; Wei, Z. Wei–Yao–Liu conjugate gradient projection algorithm for nonlinear monotone equations with convex constraints. Int. J. Comput. Math. 2015, 92, 2261–2272. [Google Scholar] [CrossRef]
- La Cruz, W. A projected derivative–free algorithm for nonlinear equations with convex constraints. Optim. Methods Softw. 2014, 29, 24–41. [Google Scholar] [CrossRef]
- La Cruz, W. A spectral algorithm for large–scale systems of nonlinear monotone equations. Numer. Algorithms 2017, 76, 1109–1130. [Google Scholar] [CrossRef]
- Papp, Z.; Rapajić, S. FR type methods for systems of large–scale nonlinear monotone equations. Appl. Math. Comput. 2015, 269, 816–823. [Google Scholar] [CrossRef]
- Halilu, A.S.; Waziri, M.Y. An enhanced matrix-free method via double steplength approach for solving systems of nonlinear equations. Int. J. Appl. Math. Res. 2017, 6, 147–156. [Google Scholar] [CrossRef]
- Halilu, A.S.; Waziri, M.Y. A transformed double steplength method for solving large-scale systems of nonlinear equations. J. Numer. Math. Stochastics 2017, 9, 20–32. [Google Scholar]
- Waziri, M.Y.; Muhammad, H.U.; Halilu, A.S.; Ahmed, K. Modified matrix-free methods for solving system of nonlinear equations. Optimization 2021, 70, 2321–2340. [Google Scholar] [CrossRef]
- Osinuga, I.A.; Dauda, M.K. Quadrature based Broyden-like method for systems of nonlinear equations. Stat. Optim. Inf. Comput. 2018, 6, 130–138. [Google Scholar] [CrossRef]
- Muhammad, K.; Mamat, M.; Waziri, M.Y. A Broyden’s-like method for solving systems of nonlinear equations. World Appl. Sci. J. 2013, 21, 168–173. [Google Scholar]
- Ullah, N.; Sabi’u, J.; Shah, A. A derivative–free scaling memoryless Broyden–Fletcher–Goldfarb–Shanno method for solving a system of monotone nonlinear equations. Numer. Linear Algebra Appl. 2021, 28, e2374. [Google Scholar] [CrossRef]
- Abubakar, A.B.; Kumam, P. A descent Dai–Liao conjugate gradient method for nonlinear equations. Numer. Algorithms 2019, 81, 197–210. [Google Scholar] [CrossRef]
- Aji, S.; Kumam, P.; Awwal, A.M.; Yahaya, M.M.; Kumam, W. Two Hybrid Spectral Methods With Inertial Effect for Solving System of Nonlinear Monotone Equations With Application in Robotics. IEEE Access 2021, 9, 30918–30928. [Google Scholar] [CrossRef]
- Dauda, M.K.; Usman, S.; Ubale, H.; Mamat, M. An alternative modified conjugate gradient coefficient for solving nonlinear system of equations. Open J. Sci. Technol. 2019, 2, 5–8. [Google Scholar] [CrossRef]
- Zheng, L.; Yang, L.; Liang, Y. A conjugate gradient projection method for solving equations with convex constraints. J. Comput. Appl. Math. 2020, 375, 112781. [Google Scholar] [CrossRef]
- Waziri, M.Y.; Aisha, H.A. A diagonal quasi-Newton method for system of nonlinear equations. Appl. Math. Comput. Sci. 2014, 6, 21–30. [Google Scholar]
- Waziri, M.Y.; Leong, W.J.; Hassan, M.A.; Monsi, M. Jacobian computation-free Newton’s method for systems of nonlinear equations. J. Numer. Math. Stochastics 2010, 2, 54–63. [Google Scholar]
- Waziri, M.Y.; Majid, Z.A. An improved diagonal Jacobian approximation via a new quasi-Cauchy condition for solving large-scale systems of nonlinear equations. J. Appl. Math. 2013, 2013, 875935. [Google Scholar] [CrossRef]
- Abdullah, H.; Waziri, M.Y.; Yusuf, S.O. A double direction conjugate gradient method for solving large-scale system of nonlinear equations. J. Math. Comput. Sci. 2017, 7, 606–624. [Google Scholar]
- Yan, Q.-R.; Peng, X.-Z.; Li, D.-H. A globally convergent derivative-free method for solving large-scale nonlinear monotone equations. J. Comput. Appl. Math. 2010, 234, 649–657. [Google Scholar] [CrossRef]
- Leong, W.J.; Hassan, M.A.; Yusuf, M.W. A matrix-free quasi-Newton method for solving large-scale nonlinear systems. Comput. Math. Appl. 2011, 62, 2354–2363. [Google Scholar] [CrossRef]
- Waziri, M.Y.; Leong, W.J.; Mamat, M. A two-step matrix-free secant method for solving large-scale systems of nonlinear equations. J. Appl. Math. 2012, 2012, 348654. [Google Scholar] [CrossRef]
- Waziri, M.Y.; Leong, W.J.; Hassan, M.A.; Monsi, M. A new Newton’s Method with diagonal Jacobian approximation for systems of nonlinear equations. J. Math. Stat. 2010, 6, 246–252. [Google Scholar] [CrossRef]
- Waziri, M.Y.; Leong, W.J.; Mamat, M.; Moyi, A.U. Two-step derivative-free diagonally Newton’s method for large-scale nonlinear equations. World Appl. Sci. J. 2013, 21, 86–94. [Google Scholar]
- Yakubu, U.A.; Mamat, M.; Mohamad, M.A.; Rivaie, M.; Sabi’u, J. A recent modification on Dai–Liao conjugate gradient method for solving symmetric nonlinear equations. Far East J. Math. Sci. 2018, 103, 1961–1974. [Google Scholar] [CrossRef]
- Uba, L.Y.; Waziri, M.Y. Three-step derivative-free diagonal updating method for solving large-scale systems of nonlinear equations. J. Numer. Math. Stochastics 2014, 6, 73–83. [Google Scholar]
- Zhou, Y.; Wu, Y.; Li, X. A New Hybrid PRPFR Conjugate Gradient Method for Solving Nonlinear Monotone Equations and Image Restoration Problems. Math. Probl. Eng. 2020, 2020, 6391321. [Google Scholar] [CrossRef]
- Waziri, M.Y.; Leong, W.J.; Mamat, M. An efficient solver for systems of nonlinear equations with singular Jacobian via diagonal updating. Appl. Math. Sci. 2010, 4, 3403–3412. [Google Scholar]
- Waziri, M.Y.; Leong, W.J.; Hassan, M.A. Diagonal Broyden-like method for large-scale systems of nonlinear equations. Malays. J. Math. Sci. 2012, 6, 59–73. [Google Scholar]
- Abubakar, A.B.; Sabi’u, J.; Kumam, P.; Shah, A. Solving nonlinear monotone operator equations via modified SR1 update. J. Appl. Math. Comput. 2021, 67, 343–373. [Google Scholar] [CrossRef]
- Grosan, C.; Abraham, A. A new approach for solving nonlinear equations systems. IEEE Trans. Syst. Man Cybern. 2008, 38, 698–714. [Google Scholar] [CrossRef]
- Dehghan, M.; Hajarian, M. New iterative method for solving nonlinear equations with fourth-order convergence. Int. J. Comput. Math. 2010, 87, 834–839. [Google Scholar] [CrossRef]
- Dehghan, M.; Hajarian, M. Fourth-order variants of Newton’s method without second derivatives for solving nonlinear equations. Eng. Comput. 2012, 29, 356–365. [Google Scholar] [CrossRef]
- Kaltenbacher, B.; Neubauer, A.; Scherzer, O. Iterative Regularization Methods for Nonlinear Ill-Posed Problems; De Gruyter: Berlin, Germany; New York, NY, USA, 2008. [Google Scholar]
- Wang, Y.; Yuan, Y. Convergence and regularity of trust region methods for nonlinear ill-posed problems. Inverse Probl. 2005, 21, 821–838. [Google Scholar] [CrossRef]
- Dehghan, M.; Hajarian, M. Some derivative free quadratic and cubic convergence iterative formulas for solving nonlinear equations. Comput. Appl. Math. 2010, 29, 19–30. [Google Scholar] [CrossRef]
- Dehghan, M.; Hajarian, M. On some cubic convergence iterative formulae without derivatives for solving nonlinear equations. Int. J. Numer. Methods Biomed. Eng. 2011, 27, 722–731. [Google Scholar] [CrossRef]
- Dehghan, M.; Shirilord, A. Accelerated double-step scale splitting iteration method for solving a class of complex symmetric linear systems. Numer. Algorithms 2020, 83, 281–304. [Google Scholar] [CrossRef]
- Dehghan, M.; Shirilord, A. A generalized modified Hermitian and skew-Hermitian splitting (GMHSS) method for solving complex Sylvester matrix equation. Appl. Math. Comput. 2019, 348, 632–651. [Google Scholar] [CrossRef]
- Bellavia, S.; Gurioli, G.; Morini, B.; Toint, P.L. Trust-region algorithms: Probabilistic complexity and intrinsic noise with applications to subsampling techniques. EURO J. Comput. Optim. 2022, 10, 100043. [Google Scholar] [CrossRef]
- Bellavia, S.; Krejić, N.; Morini, B.; Rebegoldi, S. A stochastic first-order trust-region method with inexact restoration for finite-sum minimization. Comput. Optim. Appl. 2023, 84, 53–84. [Google Scholar] [CrossRef]
- Bellavia, S.; Krejić, N.; Morini, B. Inexact restoration with subsampled trust-region methods for finite-sum minimization. Comput. Optim. Appl. 2020, 76, 701–736. [Google Scholar] [CrossRef]
- Eshaghnezhad, M.; Effati, S.; Mansoori, A. A Neurodynamic Model to Solve Nonlinear Pseudo-Monotone Projection Equation and Its Applications. IEEE Trans. Cybern. 2017, 47, 3050–3062. [Google Scholar] [CrossRef]
- Meintjes, K.; Morgan, A.P. A methodology for solving chemical equilibrium systems. Appl. Math. Comput. 1987, 22, 333–361. [Google Scholar] [CrossRef]
- Crisci, S.; Piana, M.; Ruggiero, V.; Scussolini, M. A regularized affine-scaling trust-region method for parametric imaging of dynamic PET data. SIAM J. Imaging Sci. 2021, 14, 418–439. [Google Scholar] [CrossRef]
- Bonettini, S.; Zanella, R.; Zanni, L. A scaled gradient projection method for constrained image deblurring. Inverse Probl. 2009, 25, 015002. [Google Scholar] [CrossRef]
- Liu, J.K.; Du, X.L. A gradient projection method for the sparse signal reconstruction in compressive sensing. Appl. Anal. 2018, 97, 2122–2131. [Google Scholar] [CrossRef]
- Liu, J.K.; Li, S.J. A projection method for convex constrained monotone nonlinear equations with applications. Comput. Math. Appl. 2015, 70, 2442–2453. [Google Scholar] [CrossRef]
- Xiao, Y.; Zhu, H. A conjugate gradient method to solve convex constrained monotone equations with applications in compressive sensing. J. Math. Anal. Appl. 2013, 405, 310–319. [Google Scholar] [CrossRef]
- Awwal, A.M.; Wang, L.; Kumam, P.; Mohammad, H.; Watthayu, W. A Projection Hestenes–Stiefel Method with Spectral Parameter for Nonlinear Monotone Equations and Signal Processing. Math. Comput. Appl. 2020, 25, 27. [Google Scholar] [CrossRef]
- Fukushima, M. Equivalent differentiable optimization problems and descent methods for asymmetric variational inequality problems. Math. Program. 1992, 53, 99–110. [Google Scholar] [CrossRef]
- Qian, G.; Han, D.; Xu, L.; Yang, H. Solving nonadditive traffic assignment problems: A self-adaptive projection–auxiliary problem method for variational inequalities. J. Ind. Manag. Optim. 2013, 9, 255–274. [Google Scholar] [CrossRef]
- Ghaddar, B.; Marecek, J.; Mevissen, M. Optimal power flow as a polynomial optimization problem. IEEE Trans. Power Syst. 2016, 31, 539–546. [Google Scholar] [CrossRef]
- Ivanov, B.; Stanimirović, P.S.; Milovanović, G.V.; Djordjević, S.; Brajević, I. Accelerated multiple step-size methods for solving unconstrained optimization problems. Optim. Methods Softw. 2021, 36, 998–1029. [Google Scholar] [CrossRef]
- Andrei, N. An acceleration of gradient descent algorithm with backtracking for unconstrained optimization. Numer. Algorithms 2006, 42, 63–73. [Google Scholar] [CrossRef]
- Stanimirović, P.S.; Miladinović, M.B. Accelerated gradient descent methods with line search. Numer. Algorithms 2010, 54, 503–520. [Google Scholar] [CrossRef]
- Sun, W.; Yuan, Y.-X. Optimization Theory and Methods: Nonlinear Programming; Springer: New York, NY, USA, 2006. [Google Scholar]
- Petrović, M.J. An Accelerated Double Step Size model in unconstrained optimization. Appl. Math. Comput. 2015, 250, 309–319. [Google Scholar] [CrossRef]
- Petrović, M.J.; Stanimirović, P.S. Accelerated Double Direction method for solving unconstrained optimization problems. Math. Probl. Eng. 2014, 2014, 965104. [Google Scholar] [CrossRef]
- Stanimirović, P.S.; Milovanović, G.V.; Petrović, M.J.; Kontrec, N. A Transformation of accelerated double step size method for unconstrained optimization. Math. Probl. Eng. 2015, 2015, 283679. [Google Scholar] [CrossRef]
- Nocedal, J.; Wright, S.J. Numerical Optimization; Springer: New York, NY, USA, 1999. [Google Scholar]
- Barzilai, J.; Borwein, J.M. Two-point step size gradient method. IMA J. Numer. Anal. 1988, 8, 141–148. [Google Scholar] [CrossRef]
- Dai, Y.H. Alternate step gradient method. Optimization 2003, 52, 395–415. [Google Scholar] [CrossRef]
- Dai, Y.H.; Fletcher, R. On the asymptotic behaviour of some new gradient methods. Math. Program. 2005, 103, 541–559. [Google Scholar] [CrossRef]
- Dai, Y.H.; Liao, L.Z. R-linear convergence of the Barzilai and Borwein gradient method. IMA J. Numer. Anal. 2002, 22, 1–10. [Google Scholar] [CrossRef]
- Dai, Y.H.; Yuan, J.Y.; Yuan, Y. Modified two-point step-size gradient methods for unconstrained optimization. Comput. Optim. Appl. 2002, 22, 103–109. [Google Scholar] [CrossRef]
- Dai, Y.H.; Yuan, Y. Alternate minimization gradient method. IMA J. Numer. Anal. 2003, 23, 377–393. [Google Scholar] [CrossRef]
- Dai, Y.H.; Yuan, Y. Analysis of monotone gradient methods. J. Ind. Manag. Optim. 2005, 1, 181–192. [Google Scholar] [CrossRef]
- Dai, Y.H.; Zhang, H. Adaptive two-point step size gradient algorithm. Numer. Algorithms 2001, 27, 377–385. [Google Scholar] [CrossRef]
- Raydan, M. On the Barzilai and Borwein choice of steplength for the gradient method. IMA J. Numer. Anal. 1993, 13, 321–326. [Google Scholar] [CrossRef]
- Raydan, M. The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem. SIAM J. Optim. 1997, 7, 26–33. [Google Scholar] [CrossRef]
- Vrahatis, M.N.; Androulakis, G.S.; Lambrinos, J.N.; Magoulas, G.D. A class of gradient unconstrained minimization algorithms with adaptive step-size. J. Comput. Appl. Math. 2000, 114, 367–386. [Google Scholar] [CrossRef]
- Yuan, Y. A new step size for the steepest descent method. J. Comput. Math. 2006, 24, 149–156. [Google Scholar]
- Frassoldati, G.; Zanni, L.; Zanghirati, G. New adaptive step size selections in gradient methods. J. Ind. Manag. Optim. 2008, 4, 299–312. [Google Scholar] [CrossRef]
- Serafino, D.; Ruggiero, V.; Toraldo, G.; Zanni, L. On the steplength selection in gradient methods for unconstrained optimization. Appl. Math. Comput. 2018, 318, 176–195. [Google Scholar] [CrossRef]
- Crisci, S.; Porta, F.; Ruggiero, V.; Zanni, L. Spectral properties of Barzilai–Borwein rules in solving singly linearly constrained optimization problems subject to lower and upper bounds. SIAM J. Optim. 2020, 30, 1300–1326. [Google Scholar] [CrossRef]
- Crisci, S.; Porta, F.; Ruggiero, V.; Zanni, L. Hybrid limited memory gradient projection methods for box–constrained optimization problems. Comput. Optim. Appl. 2023, 84, 151–189. [Google Scholar] [CrossRef]
- Miladinović, M.; Stanimirović, P.S.; Miljković, S. Scalar Correction method for solving large scale unconstrained minimization problems. J. Optim. Theory Appl. 2011, 151, 304–320. [Google Scholar] [CrossRef]
- Raydan, M.; Svaiter, B.F. Relaxed steepest descent and Cauchy-Barzilai-Borwein method. Comput. Optim. Appl. 2002, 21, 155–167. [Google Scholar] [CrossRef]
- Djordjević, S.S. Two modifications of the method of the multiplicative parameters in descent gradient methods. Appl. Math. Comput. 2012, 218, 8672–8683. [Google Scholar]
- Zhang, Y.; Yi, C. Zhang Neural Networks and Neural-Dynamic Method; Nova Science Publishers, Inc.: New York, NY, USA, 2011. [Google Scholar]
- Zhang, Y.; Ma, W.; Cai, B. From Zhang neural network to Newton iteration for matrix inversion. IEEE Trans. Circuits Syst. I Regul. Pap. 2009, 56, 1405–1415. [Google Scholar] [CrossRef]
- Djuranovic-Miličić, N.I.; Gardašević-Filipović, M. A multi-step curve search algorithm in nonlinear optimization - nondifferentiable case. Facta Univ. Ser. Math. Inform. 2010, 25, 11–24. [Google Scholar]
- Zhou, W.J.; Li, D.H. A globally convergent BFGS method for nonlinear monotone equations without any merit functions. Math. Comput. 2008, 77, 2231–2240. [Google Scholar] [CrossRef]
- La Cruz, W.; Martínez, J.; Raydan, M. Spectral residual method without gradient information for solving large-scale nonlinear systems of equations. Math. Comput. 2006, 75, 1429–1448. [Google Scholar] [CrossRef]
- Dolan, E.; Moré, J. Benchmarking optimization software with performance profiles. Math. Program. 2002, 91, 201–213. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).