
Convergence and Stability Improvement of Quasi-Newton Methods by Full-Rank Update of the Jacobian Approximates

Applied Informatics and Applied Mathematics Doctoral School, Óbuda University, Bécsi út 96/B, 1034 Budapest, Hungary
AppliedMath 2024, 4(1), 143-181; https://doi.org/10.3390/appliedmath4010008
Submission received: 15 November 2023 / Revised: 18 December 2023 / Accepted: 12 January 2024 / Published: 26 January 2024
(This article belongs to the Special Issue Contemporary Iterative Methods with Applications in Applied Sciences)

Abstract

A system of simultaneous multi-variable nonlinear equations can be solved by Newton's method with local q-quadratic convergence if the Jacobian is analytically available. If this is not the case, then quasi-Newton methods with local q-superlinear convergence give solutions by approximating the Jacobian in some way. Unfortunately, the quasi-Newton condition (Secant equation) does not completely specify the Jacobian approximate in multi-dimensional cases, so its full-rank update is not possible with classic variants of the method. The suggested new iteration strategy ("T-Secant") allows for a full-rank update of the Jacobian approximate in each iteration by determining two independent approximates for the solution. They are used to generate a set of new independent trial approximates; then, the Jacobian approximate can be fully updated. It is shown that the T-Secant approximate is in the vicinity of the classic quasi-Newton approximate, providing that the solution is evenly surrounded by the new trial approximates. The suggested procedure increases the superlinear convergence of the Secant method ($\varphi_S \approx 1.618$) to super-quadratic ($\varphi_T = \varphi_S + 1 \approx 2.618$) and the quadratic convergence of the Newton method ($\varphi_N = 2$) to cubic ($\varphi_T = \varphi_N + 1 = 3$) in one-dimensional cases. In multi-dimensional cases, the Broyden-type efficiency (mean convergence rate) of the suggested method is an order higher than the efficiency of other classic low-rank-update quasi-Newton methods, as shown by numerical examples on a Rosenbrock-type test function with up to 1000 variables. The geometrical representation (hyperbolic approximation) in single-variable cases helps explain the basic operations, and a vector-space description is also given in multi-variable cases.

1. Introduction

It is a common task in numerous disciplines (e.g., physics, chemistry, biology, economics, robotics, and the engineering, social, and medical sciences) to construct a mathematical model with some parameters for an observed system that gives an observable response to an observable external effect. The unknown parameters of the mathematical model are determined so that the difference between the observed and the simulated system responses of the mathematical model for the same external effect is minimized (see, e.g., [1,2,3,4,5,6,7,8,9,10,11]). This problem leads to finding the zero of a residual function (the difference between observed and simulated responses). Rapidly accelerating computational tools and the increasing complexity of mathematical models, with more and more efficient numerical algorithms, provide a chance for a better understanding and control of the surrounding nature.
As referenced above, root-finding methods are essential for solving a great class of numerical problems, such as data fitting problems with $m$ sampled data $D = \{D_j\}$ $(j = 1, \ldots, m)$ and $n$ adjustable parameters $x = \{x_i\}$ $(i = 1, \ldots, n)$ with $m \geq n$. This leads to the problem of least-squares solving of an over-determined system of nonlinear equations,

$f(x) = 0$,   (1)

($x \in \mathbb{R}^n$ and $f: \mathbb{R}^n \to \mathbb{R}^m$ ($m \geq n$)), where the solution $x^*$ minimizes the difference

$\| f(x) \|^2 = \| \phi(x) - D \|^2$   (2)

between the data $D$ and a computational model function $\phi(x)$. The system of simultaneous multi-variable nonlinear Equation (1) can be solved by Newton's method when the derivatives of $f(x)$ are available analytically and a new iterate,

$x_{p+1} = x_p - J_p^{-1} f_p$,   (3)

that follows $x_p$ can be determined, where $f_p = f(x_p)$ is the function value and $J_p = J(x_p)$ is the Jacobian matrix of $f$ at $x_p$ in the $p$th iteration step. Newton's method is one of the most widely used algorithms, with very attractive theoretical and practical properties and with some limitations. The computational cost of Newton's method is high, since the Jacobian $J_p$ and the solution of the linear system (3) must be computed at each iteration. In many cases, explicit formulae for the function $f(x)$ are not available ($f(x)$ can be a residual function between a system model response and an observation of that system response), and the Jacobian $J_p$ can only be approximated. The classic Newton's method can be modified in many different ways. The partial derivatives of the Jacobian may be replaced by suitable difference quotients (discretized Newton iteration, see [12,13]),

$\dfrac{\partial f_j(x)}{\partial x_k} \approx \dfrac{f_j(x + \Delta x_k d_k) - f_j(x)}{\Delta x_k} = \dfrac{\Delta f_j(x)}{\Delta x_k}$,   (4)

$k = 1, \ldots, n$, $j = 1, \ldots, m$, with $n$ additional function value evaluations, where $d_k$ is the $k$th Cartesian unit vector. However, it is difficult to choose the stepsizes $\Delta x$. If any $\Delta x_k$ is too large, then Expression (4) can be a bad approximation to the Jacobian, so the iteration converges much more slowly, if it converges at all. On the other hand, if any $\Delta x_k$ is too small, then $\Delta f_j(x) \approx 0$, and cancellations can occur which reduce the accuracy of the difference quotients (4) (see [14]). The suggested procedure ("T-Secant") may resemble the discretized Newton iteration, but it uses a systematic procedure to determine suitable stepsizes for the Jacobian approximates. Another modification is the inexact Newton approach, where the nonlinear equation is solved by an iterative linear solver (see [15,16,17]).
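As an illustration, the following minimal sketch (our own, not from the original paper; the function name and the choice of stepsizes are illustrative) shows how the difference quotients (4) yield a full Jacobian approximation at the cost of $n$ extra function evaluations:

```python
import numpy as np

def fd_jacobian(f, x, dx):
    """Forward-difference approximation (4) of the Jacobian of f at x.

    f  : callable mapping R^n -> R^m
    x  : current point, shape (n,)
    dx : coordinate stepsizes, shape (n,); choosing them well is the
         difficult part discussed above
    """
    fx = f(x)                        # m function values at x
    J = np.empty((fx.size, x.size))
    for k in range(x.size):          # n additional function evaluations
        xk = x.copy()
        xk[k] += dx[k]
        J[:, k] = (f(xk) - fx) / dx[k]
    return J
```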
It is well-known that the local convergence of Newton's method is q-quadratic if the initial trial approximate $x_0$ is close enough to the solution $x^*$, $J(x^*)$ is non-singular, and $J(x)$ satisfies the Lipschitz condition

$\| J(x) - J(x^*) \| \leq L \| x - x^* \|$   (5)

for all $x$ close enough to $x^*$. However, in many cases, the function $f(x)$ is not an analytical function, the partial derivatives are not known, and Newton's method cannot be applied. Quasi-Newton methods are defined as the generalization of Equation (3) as

$x_{p+1} = x_p - B_p^{-1} f_p$   (6)

and

$B_p \Delta x_p = -f_p$,   (7)

where

$\Delta x_p = x_{p+1} - x_p$   (8)

is the iteration step length and $B_p$ is expected to be the approximate of the Jacobian matrix $J_p$, without computing derivatives in most cases. The new iterate is then given as

$x_{p+1} = x_p + \Delta x_p$,   (9)

and $B_p$ is updated to $B_{p+1}$ according to the specific quasi-Newton method. Martinez [18] has made a thorough survey of practical quasi-Newton methods. The iterative methods of the form (6) that satisfy the equation

$B_{p+1} \Delta x_p = f_{p+1} - f_p$   (10)

for all $p = 0, 1, 2, \ldots$ are called "quasi-Newton" methods, and Equation (10) is called the fundamental equation of quasi-Newton methods (the "quasi-Newton condition" or "Secant equation"). However, the quasi-Newton condition does not uniquely specify the updated Jacobian approximate $B_{p+1}$, and further constraints are needed. Different methods offer their own specific solution. One new quasi-Newton approximate $x_{p+1}$ will never allow for a full-rank update of $B_{p+1}$, because it is an $n \times n$ matrix and only $n$ components can be determined from the Secant equation, making it an under-determined system of equations for the elements $B_{i,j,p+1}$ $(i, j = 1, \ldots, n)$ if $n > 1$.
The suggested new strategy is based on Wolfe's [19] formulation of a generalized Secant method. The function

$x \mapsto f(x)$, where $x \in \mathbb{R}^n$ and $f: \mathbb{R}^n \to \mathbb{R}^n$, $n > 1$,   (11)

is locally replaced by linear interpolation through $n + 1$ interpolation base points $A_p$, $B_{p,k}$ $(k = 1, \ldots, n)$. The variables $x$ and the function values $f$ are separated into two equations, and an auxiliary variable $q^A$ is introduced. Then the Jacobian approximate matrix $B_p$ is split into a variable difference matrix $\Delta X_p$ and a function value difference matrix $\Delta F_p$, and the zero $x_{p+1}^A$ of the $p$th interpolation plane is determined from the quasi-Newton condition (7) as

$\begin{bmatrix} \Delta x_{p+1}^A \\ -f_p^A \end{bmatrix} = \begin{bmatrix} \Delta X_p \\ \Delta F_p \end{bmatrix} q_p^A$,   (12)

where

$\Delta x_{p+1}^A = x_{p+1}^A - x_p^A$.   (13)

The auxiliary variable $q_p^A$ is determined from the second row of Equation (12), and the new quasi-Newton approximate $x_{p+1}^A$ comes from the first row of this equation. Popper [20] made a further generalization for functions

$x \mapsto f(x)$, where $x \in \mathbb{R}^n$ and $f: \mathbb{R}^n \to \mathbb{R}^m$, $m \geq n > 1$,   (14)

and suggested the use of a pseudo-inverse solution for the over-determined system of linear equations (where $n$ is the number of unknowns and $m$ is the number of function values). The auxiliary variable $q_p^A$ is determined from the second row of Equation (12) as

$q_p^A = -\Delta F_p^+ f_p^A$,   (15)

where $(\cdot)^+$ stands for the pseudo-inverse, and the new quasi-Newton approximate $x_{p+1}^A$ comes from the first row of this equation as

$x_{p+1}^A = x_p^A - \Delta X_p \Delta F_p^+ f_p^A$.   (16)

The new iteration continues with $n + 1$ new base points $A_{p+1}$, $B_{p+1,k}$ $(k = 1, \ldots, n)$. Details are given in Section 3.
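In code, one Wolfe–Popper Secant step is a pseudo-inverse solve; the following minimal sketch (our own illustration, with NumPy's `pinv` standing in for the pseudo-inverse) computes Equations (15) and (16):

```python
import numpy as np

def secant_step(xA, fA, dX, dF):
    """One generalized Secant step, Equations (15)-(16).

    xA : current approximate x_p^A, shape (n,)
    fA : function values f_p^A = f(xA), shape (m,)
    dX : n x n matrix of variable differences (diagonal by construction)
    dF : m x n matrix of function value differences
    """
    qA = -np.linalg.pinv(dF) @ fA    # (15): q_p^A = -dF^+ f_p^A
    return xA + dX @ qA              # (16): x_{p+1}^A = x_p^A + dX q_p^A
```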
Ortega and Rheinboldt [12] stated that a necessary condition of convergence is that the interpolation base points should be linearly independent and have to be "in general position" through the whole iteration process. Experience shows that low-rank update procedures often lead to a dead end because this condition is not satisfied. The purpose of the suggested new iteration strategy is to determine linearly independent base points, providing that the Ortega and Rheinboldt condition is satisfied. The basic idea of the procedure is that another new approximate $x_{p+1}^B$ is determined from the previous approximate $x_{p+1}^A$, and a new system of $n$ linearly independent base points is generated. The basic equations of the Wolfe–Popper formulation (Equation (12)) were modified as

$\begin{bmatrix} \Delta x_{p+1} \\ -f_p^A \end{bmatrix} = \begin{bmatrix} T_p^X & 0 \\ 0 & T_p^F \end{bmatrix} \begin{bmatrix} \Delta X_p \\ \Delta F_p \end{bmatrix} q_p^B$,   (17)

where

$\Delta x_{p+1} = x_{p+1}^B - x_{p+1}^A$,   (18)

$T_p^X = \mathrm{diag}(t_{p,i}^X) = \mathrm{diag}\left( \dfrac{x_{p+1,i}^B - x_{p+1,i}^A}{x_{p+1,i}^A - x_{p,i}^A} \right)$,   (19)

and

$T_p^F = \mathrm{diag}(t_{p,j}^F) = \mathrm{diag}\left( \dfrac{f_{p+1,j}^B - f_{p+1,j}^A}{f_{p+1,j}^A - f_{p,j}^A} \right)$.   (20)

The auxiliary variable $q_p^B$ is determined from the second row of Equation (17) as

$q_p^B = -\Delta F_p^+ (T_p^F)^{-1} f_p^A = \left\{ -\sum_{j=1}^m \Delta F_{p,i,j}^+ \dfrac{f_{p,j}^A}{t_{p,j}^F} \right\}$,   (21)

and the new quasi-Newton approximate $x_{p+1}^B$ comes from the first row of Equation (17) as

$x_{p+1,i}^B = x_{p+1,i}^A + \dfrac{(\Delta x_{p,i}^A)^2}{\Delta x_{p,i}\, q_{p,i}^B} = x_{p+1,i}^A - \dfrac{(\Delta x_{p,i}^A)^2}{\sum_{j=1}^m \Delta x_{p,i}\, \Delta F_{p,i,j}^+\, f_{p,j}^A / t_{p,j}^F}$,   (22)

$i = 1, \ldots, n$. The details of the proposed new strategy ("T-Secant method") are given in Section 4. It is different from the traditional Secant method in that all interpolation base points $A_p$ and $B_{p,k}$ $(k = 1, \ldots, n)$ are updated in each iteration (full-rank update), providing $n + 1$ new base points $A_{p+1}$ and $B_{p+1,k}$ for the next iteration. The key idea of the method is very simple. The function value $f_{p+1}^A$ (which can be determined from the new Secant approximate $x_{p+1}^A$) measures the "distance" of the approximate $x_{p+1}^A$ from the root $x^*$ (if $f_{p+1}^A = 0$, then the distance is zero and $x_{p+1}^A = x^*$). The T-Secant method uses this information so that the basic equations of the Secant method are modified by a scaling transformation $T$, and an additional new estimate $x_{p+1}^B$ is determined. Then, the new approximates $x_{p+1}^A$ and $x_{p+1}^B$ are used to construct the $n + 1$ new interpolation base points $A_{p+1}$ and $B_{p+1,k}$.
The T-Secant procedure has been worked out for solving multi-variable problems. However, it can also be applied to solving single-variable ones. The geometrical representation of the latter provides a good view with which to explain the basic operations, as shown in Section 5. It is a surprising result that the T-Secant modification corresponds to a hyperbolic function

$z_p(x) = \dfrac{a_p}{x - x_{p+1}^A} + f_p^A$,   (23)

the zero of which gives the second approximate $x_{p+1}^B$ in the single-variable case. A vector-space interpretation is also given for the multi-variable case in this section.
The general formulations of the proposed method are given in Section 6 and compared with the basic formula of classic quasi-Newton methods. It follows from Equation (16) that

$S_p \Delta x_p^A = -f_p^A$,   (24)

where

$S_p = \Delta F_p \Delta X_p^{-1} = \begin{bmatrix} \dfrac{\Delta f_{1,1,p}}{\Delta x_1} & \cdots & \dfrac{\Delta f_{n,1,p}}{\Delta x_n} \\ \vdots & & \vdots \\ \dfrac{\Delta f_{1,m,p}}{\Delta x_1} & \cdots & \dfrac{\Delta f_{n,m,p}}{\Delta x_n} \end{bmatrix} = \left\{ \dfrac{\Delta f_{k,j,p}}{\Delta x_{i,p}} \right\}$   (25)

is the Jacobian approximate of the traditional Secant method. It follows from the first and second rows of Equation (17) of the T-Secant method and from the Definition (25) of $S_p$ that

$S_{T,p} \Delta x_p^A = -f_p^A$   (26)

is the modified Secant equation, where

$S_{T,p} = T_p^F S_p (T_p^X)^{-1} = T_p^F \Delta F_p \Delta X_p^{-1} (T_p^X)^{-1} = \left\{ \dfrac{t_{j,p}^F}{t_{i,p}^X} \dfrac{\Delta f_{k,j,p}}{\Delta x_{i,p}} \right\}$.   (27)

It is well known that the single-variable Secant method has asymptotic convergence for sufficiently good initial approximates $x^A$ and $x^B$ if $f'(x)$ does not vanish on $[x^A, x^B]$ and $f''(x)$ is continuous at least in a neighborhood of the zero $x^*$. The super-linear convergence property has been proved in different ways, and it is known that the order of convergence is $\alpha = (1 + \sqrt{5})/2 = \varphi$ (where $\varphi \approx 1.618$ is the golden ratio). The convergence order of the proposed method is determined in Section 7, and it is shown that it has super-quadratic convergence with rate $\alpha_{TS} = \varphi + 1 = \varphi^2 \approx 2.618$ in the single-variable case. It is also shown for the multi-variable case in this section that the second approximate $x_{p+1}^B$ will always be in the vicinity of the classic Secant approximate $x_{p+1}^A$, providing that the solution $x^*$ will evenly be surrounded by the $n + 1$ new trial approximates and that matrix $S_{p+1}$ will be well-conditioned.

A step-by-step algorithm is given in Section 8, and the results of numerical tests with a Rosenbrock-type test function demonstrate the stability of the proposed strategy in Section 9 for up to 1000 unknown variables. The Broyden-type efficiency (mean convergence rate) of the proposed method is studied in a multi-variable case in Section 10, and it is compared with other classic rank-one update and line-search methods on the basis of available test data. It is shown in Section 11 how the new procedure can be used to improve the convergence of other classic multi-variable root-finding methods (the Newton–Raphson and Broyden methods). Concluding remarks are summarized in Section 12. Among others, the method has been used for the identification of vibrating mechanical systems (foundation pile driving [21,22], percussive drilling [23]) and was found to be very stable and efficient, even in cases with a large number of unknowns.

The proposed method needs $n + 1$ function value evaluations in each iteration, and unlike the Newton–Raphson method, it does not use the derivative information of the function. On the other hand, it needs $n$ more function evaluations in each iteration than the traditional Secant method. However, this is only an apparent disadvantage, as the convergence rate increases considerably ($\alpha_{TS} \approx 2.618$). Furthermore, the stability and the efficiency of the procedure are greatly improved.

2. Notations

Vectors and matrices are denoted by bold-face letters. Subscripts refer to components of vectors and matrices; superscripts $A$ and $B$ refer to interpolation base points. Notations $A$ and $B$ are introduced so as to be able to clearly distinguish between the two new approximates $x^A$ and $x^B$. Vectors and matrices may also be given by their general elements. $\Delta$ refers to a difference of two elements. $x$ and $X$ denote unknown quantities. $f$ and $F$ denote function values and function value matrices. $q$, $\boldsymbol{q}$, $t$, and $T$ denote multiplier scalars, vectors, and matrices. $e$, $\varepsilon$, and $E$ denote approximate errors. $p$ is the iteration counter, $\alpha$ is the convergence rate, and $\varepsilon^*$ is the termination criterion. $n$ is the number of unknowns, $m$ is the number of function values, and $i$, $j$, $k$, and $l$ are running indexes of matrix columns and rows. Superscripts $S$ and $TS$ refer to the traditional Secant method and to the proposed T-Secant method, respectively.

3. Secant Method

The history of the Secant method in single-variable cases goes back several thousand years; its origins can be found in ancient times. The idea of finding the scalar root $x^*$ of a scalar nonlinear function

$x \mapsto f(x)$ (where $x \in \mathbb{R}^1$ and $f: \mathbb{R}^1 \to \mathbb{R}^1$)   (28)

by successive local replacement of the function by linear interpolation (secant line) gives a simple and efficient numerical procedure. It has the advantage that it does not need the calculation of function derivatives, it only uses function values, and the order of asymptotic convergence is super-linear with a convergence rate of $\alpha_S \approx 1.618$.

The function $f(x)$ is locally replaced by linear interpolation (secant line) through interpolation base points $A$ and $B$, and the zero $x^A$ of the Secant line is determined as an approximate to the zero $x^*$ of the function. The next iteration continues with new base points, selected from the available old ones. Wolfe [19] extended the scalar procedure to the multi-dimensional case

$x \mapsto f(x)$, where $x \in \mathbb{R}^n$ and $f: \mathbb{R}^n \to \mathbb{R}^n$, $n > 1$,   (29)

and Popper [20] made a further generalization,

$x \mapsto f(x)$, where $x \in \mathbb{R}^n$ and $f: \mathbb{R}^n \to \mathbb{R}^m$, $m \geq n > 1$,   (30)

and suggested the use of a pseudo-inverse solution for the over-determined system of linear equations (where $n$ is the number of unknowns and $m$ is the number of function values).

The zero $x^*$ of the nonlinear function $x \mapsto f(x)$ has to be found, where $x \in \mathbb{R}^n$ and $f: \mathbb{R}^n \to \mathbb{R}^m$. Let $x^A$ be the initial trial for the zero $x^*$, and let the function $f(x)$ be linearly interpolated through $n + 1$ interpolation base points $A\{x^A, f^A\}$ and $B_k\{x_k^B, f_k^B\}$ $(k = 1, \ldots, n)$ and be approximated/replaced by the interpolation "plane" $y(x)$ near $x^*$. One of the key ideas of the suggested numerical procedure is that the interpolation base points $B_k\{x_k^B, f_k^B\}$ are constructed by individually incrementing the coordinates $x_i^A$ of the initial trial $x^A$ by an "initial trial increment" value $\Delta x_i$ $(i = 1, \ldots, n)$ as

$x_{k,i}^B = x_i^A + \Delta x_i$,   (31)

or in vector form as

$x_k^B = x^A + \Delta x_k d_k$,   (32)

where $d_k$ is the $k$th Cartesian unit vector, as shown in Figure 1.

It follows from this special construction of the initial trials $x_k^B$ that $x_{k,i}^B - x_i^A = 0$ for $i \neq k$ and $x_{k,i}^B - x_i^A = \Delta x_i$ for $i = k$, providing that

$\Delta x = \{ x_{i,i}^B - x_i^A \} = \{ \Delta x_i \}$   (33)

is the "initial trial increment vector". Let

$\Delta f_k = \{ \Delta f_{k,j} \} = \{ f_{k,j}^B - f_j^A \}$,   (34)

$j = 1, \ldots, m$. Any point on the $n$-dimensional interpolation plane $y(x)$ can be expressed as

$\begin{bmatrix} x \\ y(x) \end{bmatrix} = \begin{bmatrix} x^A \\ f^A \end{bmatrix} + \begin{bmatrix} \Delta X \\ \Delta F \end{bmatrix} q^A$,   (35)

where

$\Delta X = [x_k^B - x^A] = \begin{bmatrix} x_{1,1}^B - x_1^A & \cdots & x_{n,1}^B - x_1^A \\ \vdots & & \vdots \\ x_{1,n}^B - x_n^A & \cdots & x_{n,n}^B - x_n^A \end{bmatrix}$,   (36)

$\Delta F = [f_k^B - f^A] = \begin{bmatrix} f_{1,1}^B - f_1^A & \cdots & f_{n,1}^B - f_1^A \\ \vdots & & \vdots \\ f_{1,m}^B - f_m^A & \cdots & f_{n,m}^B - f_m^A \end{bmatrix}$,   (37)

$k = 1, \ldots, n$, and $q^A$ is a vector with $n$ scalar multipliers $q_i^A$ $(i = 1, \ldots, n)$. As a consequence of Equation (32),

$\Delta X = [\Delta x_k] = \begin{bmatrix} \Delta x_1 & & 0 \\ & \ddots & \\ 0 & & \Delta x_n \end{bmatrix} = \mathrm{diag}(\Delta x_k)$   (38)

is a diagonal matrix, which has a great computational advantage. It also follows from Definition (34) that

$\Delta F = [\Delta f_k] = \begin{bmatrix} \Delta f_{1,1} & \cdots & \Delta f_{n,1} \\ \vdots & & \vdots \\ \Delta f_{1,m} & \cdots & \Delta f_{n,m} \end{bmatrix}$.   (39)
Let $x_{p+1}^A$ be the zero of the $n$-dimensional interpolation plane $y_p(x)$ with interpolation base points $A_p\{x_p^A, f_p^A\}$ and $B_{k,p}\{x_{k,p}^B, f_{k,p}^B\}$ in the $p$th iteration. Then, it follows from the zero condition

$y_p(x_{p+1}^A) = 0$   (40)

and from the second row of Equation (35) that

$\Delta F_p q_p^A = -f_p^A$,   (41)

and the vector $q_p^A$ of multipliers $q_{p,i}^A$ can be expressed as

$q_p^A = -\Delta F_p^+ f_p^A = \left\{ -\sum_{j=1}^m \Delta F_{p,i,j}^+ f_{p,j}^A \right\}$,   (42)

where $(\cdot)^+$ stands for the pseudo-inverse. Let

$\begin{bmatrix} \Delta x_p^A \\ \Delta f_p^A \end{bmatrix} = \begin{bmatrix} x_{p+1}^A - x_p^A \\ f_{p+1}^A - f_p^A \end{bmatrix}$   (43)

be the iteration stepsize of the Secant method; then, it follows from the first row of Equation (35) and from Equation (42) that

$\Delta x_p^A = \Delta X_p q_p^A = -\Delta X_p \Delta F_p^+ f_p^A$,   (44)

and from Definition (43), it follows that

$\begin{bmatrix} x_{p+1}^A \\ f_{p+1}^A \end{bmatrix} = \begin{bmatrix} x_p^A + \Delta x_p^A \\ f_p^A + \Delta f_p^A \end{bmatrix}$,   (45)

and the new Secant approximate $x_{p+1}^A$ can be expressed from Equation (44) as

$x_{p+1}^A = x_p^A + \Delta x_p^A$.   (46)

A new base point $A_{p+1}\{x_{p+1}^A, f_{p+1}^A\}$ can then be determined for the next iteration. In a single-variable case ($m = n = k = 1$) with interpolation base points $A_p\{x_p^A, f_p^A\}$ and $B_p\{x_p^B, f_p^B\}$, Equation (42) will have the form

$q_p^A = -\dfrac{f_p^A}{f_p^B - f_p^A} = -\dfrac{f_p^A}{\Delta f_p}$,   (47)

and the new Secant approximate

$x_{p+1}^A = x_p^A + \Delta x_p q_p^A = x_p^A - \Delta x_p \dfrac{f_p^A}{\Delta f_p} = \dfrac{x_p^A f_p^B - x_p^B f_p^A}{f_p^B - f_p^A}$   (48)

can be determined according to Equation (46). The procedure then continues with new interpolation base points $A_{p+1}\{x_{p+1}^A, f_{p+1}^A\}$ and $B_{p+1}\{x_{p+1}^B, f_{p+1}^B\}$.
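For reference in what follows, the scalar update (48) can be written as a one-line helper; this is our own illustrative snippet, not code from the paper. The T-Secant method of Section 4 keeps this step unchanged and only adds a second, scaled update.

```python
def secant_zero(xA, fA, xB, fB):
    """Zero of the secant line through (xA, fA) and (xB, fB), Equation (48)."""
    return (xA * fB - xB * fA) / (fB - fA)
```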

4. T-Secant Method

4.1. Single-Variable Case

The T-Secant method is different from the traditional Secant method in that all interpolation base points $A_p$ and $B_{p,k}$ $(k = 1, \ldots, n)$ are updated in each iteration, providing $n + 1$ new base points $A_{p+1}$ and $B_{p+1,k}$ for the next iteration. The key idea of the method is very simple. The function value $f_{p+1}^A$ (which can be determined from the new Secant approximate $x_{p+1}^A$) measures the "distance" of the approximate $x_{p+1}^A$ from the root $x^*$ (if $f_{p+1}^A = 0$, then the distance is zero and $x_{p+1}^A = x^*$). The T-Secant method uses this information to determine another approximate $x_{p+1}^B$. In a single-variable case ($m = n = k = 1$) with interpolation base points $A_p$ and $B_p$, the basic equation

$\Delta f_p q_p^A = -f_p^A$   (49)

of the Secant method (Equation (41) in the multi-variable case) is modified by a factor

$t_p^f = \dfrac{f_{p+1}^A}{f_p^A}$   (50)

that expresses the "improvement rate" of the new approximate $x_{p+1}^A$ over the original approximate $x_p^A$, providing the T-Secant-modified basic equation

$t_p^f \Delta f_p q_p^B = -f_p^A$.   (51)

Then, the T-Secant multiplier

$q_p^B = \dfrac{q_p^A}{t_p^f} = -\dfrac{(f_p^A)^2}{f_{p+1}^A \Delta f_p}$   (52)

can be determined. The other basic equation

$\Delta x_p^A = \Delta x_p q_p^A$   (53)

of the Secant method (Equation (44) in the multi-variable case) with iteration stepsize

$\Delta x_p^A = x_{p+1}^A - x_p^A$   (54)

is also modified in a similar way to that in the case of Equations (49) and (51), by a factor

$t_p^x = \dfrac{x_{p+1}^B - x_{p+1}^A}{x_{p+1}^A - x_p^A} = \dfrac{\Delta x_{p+1}}{\Delta x_p^A}$   (55)

that expresses the "improvement rate" of the new "T-Secant stepsize" $\Delta x_{p+1}$ over the previous "Secant stepsize" $\Delta x_p^A$, providing a new basic equation

$\Delta x_p^A = t_p^x \Delta x_p q_p^B$,   (56)

from which

$\Delta x_p^A = \dfrac{\Delta x_{p+1}}{\Delta x_p^A} \Delta x_p q_p^B$   (57)

and

$x_{p+1}^A - x_p^A = -\dfrac{x_{p+1}^B - x_{p+1}^A}{x_{p+1}^A - x_p^A} \left( x_p^B - x_p^A \right) \dfrac{(f_p^A)^2}{f_{p+1}^A (f_p^B - f_p^A)}$.   (58)

By re-ordering Equation (58), the T-Secant approximate

$x_{p+1}^B = x_{p+1}^A + \dfrac{(\Delta x_p^A)^2}{\Delta x_p q_p^B} = x_{p+1}^A - \dfrac{(x_{p+1}^A - x_p^A)^2 (f_p^B - f_p^A)\, f_{p+1}^A}{(x_p^B - x_p^A)(f_p^A)^2}$   (59)

can be determined, and it is used to update the original interpolation base point $B_p$ to $B_{p+1}$. The new iteration will then continue with new base points $A_{p+1}$ and $B_{p+1}$. Note that it follows from Equations (52), (53), and (59) that

$\Delta x_{p+1} = x_{p+1}^B - x_{p+1}^A = \dfrac{(\Delta x_p^A)^2}{\Delta x_p q_p^B} = t_p^f \dfrac{(\Delta x_p^A)^2}{\Delta x_p q_p^A} = t_p^f \Delta x_p^A$.   (60)
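The whole single-variable iteration then takes only a few lines. The sketch below is our own hedged implementation of Equations (48), (50), (52), and (59); variable names are illustrative:

```python
def t_secant_1d(f, xA, xB, tol=1e-14, p_max=50):
    """Single-variable T-Secant iteration: both base points A_p and B_p
    are replaced in every step."""
    fA, fB = f(xA), f(xB)
    for p in range(p_max):
        xA_new = (xA * fB - xB * fA) / (fB - fA)      # Secant zero (48)
        fA_new = f(xA_new)
        if abs(fA_new) < tol:
            return xA_new
        # T-Secant point (59): zero of the hyperbola z_p(x)
        xB_new = xA_new - (xA_new - xA) ** 2 * (fB - fA) * fA_new \
                 / ((xB - xA) * fA ** 2)
        xA, fA = xA_new, fA_new
        xB, fB = xB_new, f(xB_new)
    return xA
```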

4.2. Multi-Variable Case

In the multi-variable case ($m \geq n > 1$) with $n + 1$ interpolation base points $A_p\{x_p^A, f_p^A\}$ and $B_{p,k}\{x_{p,k}^B, f_{p,k}^B\}$ $(k = 1, \ldots, n)$, the basic equations of the Secant method (Equations (41) and (44)) are modified as

$T_p^F \Delta F_p q_p^B = -f_p^A$   (61)

and

$\Delta x_p^A = T_p^X \Delta X_p q_p^B$.   (62)

Then, a vector-based equation can be formulated, as in the case of the traditional Secant method (see Equation (35)), in the following form:

$\begin{bmatrix} x \\ z(x) \end{bmatrix} = \begin{bmatrix} x^A \\ f^A \end{bmatrix} + \begin{bmatrix} T^X & 0 \\ 0 & T^F \end{bmatrix} \begin{bmatrix} \Delta X \\ \Delta F \end{bmatrix} q^B$,   (63)

where $\Delta X$ and $\Delta F$ are defined in (36) and (37), $z(x)$ is a function with zero at $x_{p+1}^B$, and the diagonal transformation matrix in the $p$th iteration is

$T_p = \begin{bmatrix} T_p^X & 0 \\ 0 & T_p^F \end{bmatrix}$,   (64)

with sub-diagonals $T_p^X$ and $T_p^F$, where

$T_p^X = \mathrm{diag}(t_{p,i}^X) = \mathrm{diag}\left( \dfrac{x_{p+1,i}^B - x_{p+1,i}^A}{x_{p+1,i}^A - x_{p,i}^A} \right) = \mathrm{diag}\left( \dfrac{\Delta x_{p+1,i}}{\Delta x_{p,i}^A} \right)$,   (65)

$T_p^F = \mathrm{diag}(t_{p,j}^F) = \mathrm{diag}\left( \dfrac{f_{p+1,j}^B - f_{p+1,j}^A}{f_{p+1,j}^A - f_{p,j}^A} \right)$,   (66)

and $T_p^F$ is approximated, with the assumption $f(x) \approx y_p(x) \approx z_p(x)$ and according to the conditions $y_p(x_{p+1}^A) = 0$ and $z_p(x_{p+1}^B) = 0$, as

$T_p^F \approx \mathrm{diag}\left( \dfrac{z_{p,j}(x_{p+1}^B) - f_{p+1,j}^A}{y_{p,j}(x_{p+1}^A) - f_{p,j}^A} \right) = \mathrm{diag}\left( \dfrac{f_{p+1,j}^A}{f_{p,j}^A} \right)$,   (67)

$i = 1, \ldots, n$, $j = 1, \ldots, m$, where $f_{p,j}^A \neq 0$. The vector of T-Secant multipliers

$q_p^B = -\Delta F_p^+ (T_p^F)^{-1} f_p^A = \left\{ -\sum_{j=1}^m \Delta F_{p,i,j}^+ \dfrac{f_{p,j}^A}{t_{p,j}^F} \right\}$   (68)

can be determined from Equation (61), where $(\cdot)^+$ stands for the pseudo-inverse ($\Delta F_p^+$ has already been calculated when $q_p^A$ was determined from Equation (42)). The $i$th element of the new approximate $x_{p+1}^B$ can be expressed from the $i$th row of Equation (62):

$\Delta x_{p,i}^A = \dfrac{\Delta x_{p+1,i}}{\Delta x_{p,i}^A}\, \Delta x_{p,i}\, q_{p,i}^B = \dfrac{x_{p+1,i}^B - x_{p+1,i}^A}{\Delta x_{p,i}^A}\, \Delta x_{p,i}\, q_{p,i}^B$,   (69)

and the T-Secant approximate $x_{p+1}^B$ can be expressed as

$x_{p+1,i}^B = x_{p+1,i}^A + \dfrac{(\Delta x_{p,i}^A)^2}{\Delta x_{p,i}\, q_{p,i}^B} = x_{p+1,i}^A - \dfrac{(\Delta x_{p,i}^A)^2}{\sum_{j=1}^m \Delta x_{p,i}\, \Delta F_{p,i,j}^+\, f_{p,j}^A / t_{p,j}^F}$,   (70)

where $\Delta x_{p,i} \neq 0$ and $q_{p,i}^B \neq 0$ $(i = 1, \ldots, n)$. Then, the next iteration continues with the new trial increment vector (iteration stepsize)

$\Delta x_{p+1} = x_{p+1}^B - x_{p+1}^A$   (71)

and with $n + 1$ new interpolation base points $A_{p+1}\{x_{p+1}^A, f_{p+1}^A\}$ and $B_{k,p+1}\{x_{k,p+1}^B, f_{k,p+1}^B\}$ $(k = 1, \ldots, n)$. Figure 1 shows the formulation of a set of new base vectors $x_{k,p+1}^B$ from $x_{p+1}^A$ and $x_{p+1}^B$ in the $n = 3$ case.

Let the ratio $\mu_i$ of the constants $q_{p,i}^A$ and $q_{p,i}^B$ be introduced as

$\mu_i = \dfrac{q_{p,i}^A}{q_{p,i}^B}$.   (72)

Then, it follows from Equations (42), (44), (68), (70), and (71) that the $i$th element of the new trial increment vector is

$\Delta x_{p+1,i} = \dfrac{(\Delta x_{p,i}^A)^2}{\Delta x_{p,i}\, q_{p,i}^B} = \mu_i \dfrac{(\Delta x_{p,i}^A)^2}{\Delta x_{p,i}\, q_{p,i}^A} = \mu_i\, \Delta x_{p,i}^A$.   (73)

The basic equations in the single-variable and multi-variable cases are summarized in Table 1.
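Putting Equations (42), (46), (67), (68), and (70) together, one multi-variable T-Secant update can be sketched as follows (our own illustration; safeguards against small denominators are omitted):

```python
import numpy as np

def t_secant_update(f, xA, dx):
    """One multi-variable T-Secant iteration. Returns the two new
    approximates x_{p+1}^A and x_{p+1}^B."""
    n, fA = xA.size, f(xA)
    # base points B_k: increment one coordinate at a time, Equation (32)
    dF = np.column_stack([f(xA + dx[k] * np.eye(n)[:, k]) - fA
                          for k in range(n)])
    dFp = np.linalg.pinv(dF)
    qA = -dFp @ fA                   # (42)
    dxA = dx * qA                    # (44), since dX = diag(dx)
    xA_new = xA + dxA                # (46)
    tF = f(xA_new) / fA              # (67), assumes f_{p,j}^A != 0
    qB = -dFp @ (fA / tF)            # (68)
    xB_new = xA_new + dxA ** 2 / (dx * qB)   # (70)
    return xA_new, xB_new
```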

5. Geometry

5.1. Single-Variable Case

The T-Secant procedure has been worked out for solving multi-variable problems. However, it can also be applied to solving single-variable ones. The geometrical representation of the latter gives a good view with which to explain the mechanism of the procedure.

Find the scalar root $x^*$ of a nonlinear function $x \mapsto f(x)$, where $x \in \mathbb{R}^1$ and $f: \mathbb{R}^1 \to \mathbb{R}^1$. Let the function $f(x)$ be linearly interpolated through the initial base points $A_p\{x_p^A, f_p^A\}$ and $B_p\{x_p^B, f_p^B\}$, providing a "secant" line $y_p(x)$ as shown in Figure 2, where $f_p^A = f(x_p^A)$ and $f_p^B = f(x_p^B)$ are the corresponding function values. An arbitrary point of the Secant line $y_p(x)$ can be expressed as

$\begin{bmatrix} x \\ y_p(x) \end{bmatrix} = \begin{bmatrix} x_p^A \\ f_p^A \end{bmatrix} + \begin{bmatrix} \Delta x_p \\ \Delta f_p \end{bmatrix} q_p^A$,   (74)

where $q_p^A$ is a scalar multiplier. Let the new approximate $x_{p+1}^A$ be the root of the Secant line $y_p(x)$ and let

$\Delta x_p^A = x_{p+1}^A - x_p^A$   (75)

be the iteration stepsize. It follows from the condition

$y_p(x_{p+1}^A) = 0$   (76)

and from the second row of Equation (74) that

$\Delta f_p q_p^A = -f_p^A$,   (77)

and the scalar multiplier can be determined as

$q_p^A = -\dfrac{f_p^A}{\Delta f_p}$.   (78)

From the first row of Equation (74), the iteration stepsize is given as

$\Delta x_p^A = \Delta x_p q_p^A$,   (79)

and the new approximate can be expressed as

$x_{p+1}^A = x_p^A + \Delta x_p^A$.   (80)

A new base point $A_{p+1}\{x_{p+1}^A, f_{p+1}^A\}$ (see Figure 2) can then be determined for the next iteration. Two out of the three available base points $\{A_p, B_p, A_{p+1}\}$ are used for the next iteration by omitting either $A_p$ or $B_p$ in the case of the traditional Secant method. The decision is not obvious, and it may cause the iteration to become unstable and/or not converge to the solution. Instead, an additional new approximate $x_{p+1}^B$ is determined by the T-Secant procedure as a root of the function $z_p(x)$ near the first Secant approximate $x_{p+1}^A$, and the iteration continues with new base points $A_{p+1}\{x_{p+1}^A, f_{p+1}^A\}$ and $B_{p+1}\{x_{p+1}^B, f_{p+1}^B\}$. An arbitrary point of the function $z_p(x)$ can be expressed as

$\begin{bmatrix} x \\ z_p(x) \end{bmatrix} = \begin{bmatrix} x_p^A \\ f_p^A \end{bmatrix} + \begin{bmatrix} t^x & 0 \\ 0 & t^f \end{bmatrix} \begin{bmatrix} \Delta x_p \\ \Delta f_p \end{bmatrix} q_p^B$,   (81)

where the transformation scalars for $\Delta x_p$ and $\Delta f_p$ at $x = x_p^B$ are

$t_p^x = \dfrac{\Delta x_{p+1}}{\Delta x_p^A} = \dfrac{x_{p+1}^B - x_{p+1}^A}{x_{p+1}^A - x_p^A}$ and $t_p^f = \dfrac{f_{p+1}^A}{f_p^A}$.   (82)

Then, it follows from the condition

$z_p(x_{p+1}^B) = 0$   (83)

and from the second row of Equation (81) that

$t_p^f \Delta f_p q_p^B = -f_p^A$   (84)

and

$q_p^B = -\dfrac{f_p^A}{t_p^f \Delta f_p} = -\dfrac{(f_p^A)^2}{f_{p+1}^A (f_p^B - f_p^A)}$.   (85)

The new approximate $x_{p+1}^B$ can then be expressed from the first row of Equation (81) as

$x_{p+1}^B = x_{p+1}^A + \dfrac{(\Delta x_p^A)^2}{\Delta x_p q_p^B}$.   (86)

The new base point $B_{p+1}\{x_{p+1}^B, f_{p+1}^B\}$ (see Figure 2) can then be determined. Interpolation base points $A_{p+1}$ and $B_{p+1}$ are used for the next iteration. The scalar multiplier $q_p^B$ can be expressed from Equation (86) as

$q_p^B = \dfrac{(x_{p+1}^A - x_p^A)^2}{(x_p^B - x_p^A)(x_{p+1}^B - x_{p+1}^A)}$.   (87)

By substituting it into the second row of Equation (81) and changing $x_{p+1}^B$ to $x$, it turns into a hyperbolic function

$z_p(x) = \dfrac{a_p}{x - x_{p+1}^A} + f_p^A$   (88)

with vertical and horizontal asymptotes $x_{p+1}^A$ and $f_p^A$, where

$a_p = \dfrac{(x_{p+1}^A - x_p^A)^2 (f_p^B - f_p^A)}{x_p^B - x_p^A} \cdot \dfrac{f_{p+1}^A}{f_p^A}$,   (89)

and the root $x_{p+1}^B$ of the function $z_p(x)$ will be in the vicinity of $x_{p+1}^A$ at an "appropriate distance", which is regulated by the function value $f_{p+1}^A$ (see Figure 2). The virtue of the T-Secant procedure is that it ensures an automatic mechanism for keeping the base vectors in general positions through the whole iteration process, providing stable and efficient numerical performance.

5.2. Multi-Variable Case

Find the root $x^*$ of a nonlinear function $x \mapsto f(x)$, where $x \in \mathbb{R}^n$ and $f: \mathbb{R}^n \to \mathbb{R}^m$. Let the function $f(x)$ be linearly interpolated through $n + 1$ base points $A_p\{x_p^A, f_p^A\}$ and $B_{k,p}\{x_{k,p}^B, f_{k,p}^B\}$ in the $\mathbb{R}^{n+m}$ space (the $\{x, f(x)\}$ space) in the $p$th iteration, as shown in Figure 3, where $k = 1, \ldots, n$, given a set of approximates $x_p^A$ and

$x_{k,p}^B = x_p^A + \Delta x_{k,p} d_k$   (90)

in the $\mathbb{R}^n$ space (the $x$ space) with $k = 1, \ldots, n$, where $d_k$ is the $k$th Cartesian unit vector. Let the expression

$\Delta F_p q_p^A = \begin{bmatrix} \Delta f_{1,1,p} & \cdots & \Delta f_{n,1,p} \\ \vdots & & \vdots \\ \Delta f_{1,m,p} & \cdots & \Delta f_{n,m,p} \end{bmatrix} q_p^A$   (91)

represent the linear combination $q_p^A = \{q_{p,k}^A\}^T$ of the $n$ column vectors

$\{\Delta f_{k,j,p}\} = \Delta f_{k,p} = f_{k,p}^B - f_p^A$   (92)

in the $\mathbb{R}^m$ space (the $f$ space), with column index $k = 1, \ldots, n$ and row index $j = 1, \ldots, m$, and let the expression

$\Delta X_p q_p^A = \begin{bmatrix} \Delta x_{1,1,p} & \cdots & \Delta x_{n,1,p} \\ \vdots & & \vdots \\ \Delta x_{1,n,p} & \cdots & \Delta x_{n,n,p} \end{bmatrix} q_p^A$   (93)

represent the same linear combination of the $n$ column vectors

$\{\Delta x_{k,j,p}\} = \Delta x_{p,k} = x_{p,k}^B - x_p^A$,   (94)

with column index $k = 1, \ldots, n$ and row index $j = 1, \ldots, n$. The linear combination $q_p^A$ is determined from Equation (42) in step $S_1$ (see Figure 3), providing a new approximate

$x_{p+1}^A = \{x_{p+1,k}^A\} = x_p^A + \Delta x_p^A$   (95)

for the solution $x^*$, and the corresponding $f_{p+1}^A$ vector is also determined, in step $S_2$ (see Figure 3). The column vectors $\Delta f_{k,p}$ of $\Delta F_p$ are then modified by a non-uniform scaling transformation,

$T_p^F = \mathrm{diag}\left( \dfrac{f_{p+1,j}^A}{f_{p,j}^A} \right)$,   (96)

and a new linear combination $q_p^B = \{q_{p,k}^B\}^T$ is determined from Equation (68) in step $S_3$ (see Figure 3), providing a new approximate $x_{p+1}^B$ for the solution $x^*$ with elements

$x_{p+1,k}^B = x_{p+1,k}^A + \dfrac{(\Delta x_{p,k}^A)^2}{\Delta x_{p,k}\, q_{p,k}^B}$.   (97)

A new set of $n + 1$ approximates $x_{p+1}^A$ and

$x_{k,p+1}^B = x_{p+1}^A + \Delta x_{k,p+1} d_k$   (98)

$(k = 1, \ldots, n)$ can then be generated with the iteration stepsize

$\Delta x_{p+1} = \{\Delta x_{k,p+1}\} = x_{p+1}^B - x_{p+1}^A$

for the next iteration.

5.3. Single-Variable Example

An example is given with the function $x \mapsto f(x)$, where $x \in \mathbb{R}^1$, $f: \mathbb{R}^1 \to \mathbb{R}^1$, and

$f(x) = x^3 - 2x - 5$,   (99)

with root $x^* \approx 2.0945514815423$. Figure 4 and Table 2 summarize the results of the first two iterations (left: $x_1^A$ is the zero of $y_0(x)$, $x_1^B$ is the zero of $z_0(x)$; right: $x_2^A$ is the zero of $y_1(x)$, $x_2^B$ is the zero of $z_1(x)$). Iterations were made with initial approximates $x_0^A = 3.0$ and $x_0^B = 1.0$, providing $f_0^A = 16$ ($p = 0$). The first Secant approximate $x_1^A = 1.545$ is found as the zero of the first Secant line $y_0(x)$, and the first T-Secant approximate $x_1^B = 1.945$ is found as the zero of the first hyperbola function $z_0(x)$ (Figure 4, left). The iteration then goes on ($p = 1$) with new interpolation base points $x_1^A = 1.545$ and $x_1^B = 1.945$, providing $f_1^A = -4.3997$, and new approximates $x_2^A = 2.158$ and $x_2^B = 2.0556$ are found as the zeros of the second Secant line and the second hyperbola function, $y_1(x)$ and $z_1(x)$, respectively (Figure 4, right). The next iteration ($p = 2$) will then continue with interpolation base points $x_2^A = 2.158$ and $x_2^B = 2.0556$ and with $f_2^A = 0.7367$. The iterated values of $f_p^A$, $x_p^A$, and $x_p^B$ are also indicated in the diagrams. Further diagrams for this example are shown in Section 7.3.
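The numbers in Table 2 can be reproduced with a few lines of code; this is our own check using Equations (48) and (59):

```python
f = lambda x: x**3 - 2*x - 5

xA, xB = 3.0, 1.0
for p in range(2):
    fA, fB = f(xA), f(xB)
    xA_next = (xA * fB - xB * fA) / (fB - fA)          # zero of y_p(x), (48)
    fA_next = f(xA_next)
    xB_next = xA_next - (xA_next - xA)**2 * (fB - fA) * fA_next \
              / ((xB - xA) * fA**2)                     # zero of z_p(x), (59)
    print(p, round(xA_next, 4), round(xB_next, 4))
    xA, xB = xA_next, xB_next
# prints approximately: 0 1.5455 1.9454  and  1 2.1582 2.0557  (cf. Table 2)
```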

6. General Formulations

Re-ordering Equation (44) gives the general equation

$\Delta F \Delta X^{-1} \Delta x^A = -f^A$   (100)

of the Secant method. The initial trials are constructed according to Equation (32), providing that $\Delta X$ is a diagonal matrix with elements $\Delta x_i = x_{i,i}^B - x_i^A$ $(i = 1, \ldots, n)$. Let the Jacobian approximate of the Secant method be defined as

$S = \Delta F \Delta X^{-1}$,   (101)

$S = \begin{bmatrix} \dfrac{f_{1,1}^B - f_1^A}{x_{1,1}^B - x_1^A} & \cdots & \dfrac{f_{n,1}^B - f_1^A}{x_{n,n}^B - x_n^A} \\ \vdots & & \vdots \\ \dfrac{f_{1,m}^B - f_m^A}{x_{1,1}^B - x_1^A} & \cdots & \dfrac{f_{n,m}^B - f_m^A}{x_{n,n}^B - x_n^A} \end{bmatrix} = \begin{bmatrix} \dfrac{\Delta f_{1,1}}{\Delta x_1} & \cdots & \dfrac{\Delta f_{n,1}}{\Delta x_n} \\ \vdots & & \vdots \\ \dfrac{\Delta f_{1,m}}{\Delta x_1} & \cdots & \dfrac{\Delta f_{n,m}}{\Delta x_n} \end{bmatrix} = \left\{ \dfrac{\Delta f_{k,j}}{\Delta x_i} \right\}$,   (102)

$i = 1, \ldots, n$, $j = 1, \ldots, m$, $k = 1, \ldots, n$, and

$S^+ = \Delta X \Delta F^+$.   (103)

Then, Equation (100) simplifies to

$S \Delta x^A = -f^A$   (104)

and

$\Delta x^A = -S^+ f^A$.   (105)

The $i$th element of the new approximate $x_{p+1}^A$ in the $p$th iteration will then be

$x_{p+1,i}^A = x_{p,i}^A + \Delta x_{p,i}^A = x_{p,i}^A - \sum_{j=1}^m S_{p,i,j}^+ f_{p,j}^A$,   (106)

$i = 1, \ldots, n$. It follows from the first row of Equation (63) of the T-Secant method, from Equation (61), and from the Definition (103) of $S^+$ that the $p$th iteration stepsize is

$\Delta x_p^A = -T_p^X S_p^+ (T_p^F)^{-1} f_p^A$   (107)

and

$T_p^F S_p (T_p^X)^{-1} \Delta x_p^A = -f_p^A$.   (108)

Let the modified Jacobian approximate of the T-Secant method be defined as

$S_{T,p} = T_p^F S_p (T_p^X)^{-1}$,   (109)

$S_{T,p} = \begin{bmatrix} \dfrac{f_{p+1,1}^A}{f_{p,1}^A} & & 0 \\ & \ddots & \\ 0 & & \dfrac{f_{p+1,m}^A}{f_{p,m}^A} \end{bmatrix} \begin{bmatrix} \dfrac{\Delta f_{p,1,1}}{\Delta x_{p,1}} & \cdots & \dfrac{\Delta f_{p,n,1}}{\Delta x_{p,n}} \\ \vdots & & \vdots \\ \dfrac{\Delta f_{p,1,m}}{\Delta x_{p,1}} & \cdots & \dfrac{\Delta f_{p,n,m}}{\Delta x_{p,n}} \end{bmatrix} \begin{bmatrix} \dfrac{\Delta x_{p,1}^A}{\Delta x_{p+1,1}} & & 0 \\ & \ddots & \\ 0 & & \dfrac{\Delta x_{p,n}^A}{\Delta x_{p+1,n}} \end{bmatrix}$,   (110)

and in condensed form with general matrix elements (without the $p$ index):

$S_T = T^F S (T^X)^{-1} = \mathrm{diag}(t_j^F) \left\{ \dfrac{\Delta f_{k,j}}{\Delta x_i} \right\} \mathrm{diag}\left( \dfrac{1}{t_i^X} \right) = \left\{ \dfrac{t_j^F}{t_i^X} \dfrac{\Delta f_{k,j}}{\Delta x_i} \right\}$,   (111)

$i = 1, \ldots, n$, $j = 1, \ldots, m$, $k = 1, \ldots, n$, and

$S_T^+ = T^X S^+ (T^F)^{-1}$.   (112)

Equations (107) and (108) can then be re-written as

$\Delta x^A = -S_T^+ f^A$   (113)

and

$S_T \Delta x^A = -f^A$   (114)

in a similar form as in the case of the traditional Secant method (Equations (105) and (104)). The $i$th element $x_{p+1,i}^B$ of the second new approximate $x_{p+1}^B$ in the $p$th iteration will then be

$x_{p+1,i}^B = x_{p+1,i}^A + \Delta x_{p+1,i} = x_{p+1,i}^A - \dfrac{(\Delta x_{p,i}^A)^2}{\sum_{j=1}^m S_{p,i,j}^+ f_{p,j}^A / t_{p,j}^F}$,   (115)

where $t_{p,j}^F \neq 0$, $j = 1, \ldots, m$, and $i = 1, \ldots, n$. Note that the T-Secant modification of the Jacobian approximate (102) is made with the multipliers

$t_{p,j}^F = \dfrac{f_{p+1,j}^A}{f_{p,j}^A}$ and $t_{p,i}^x = \dfrac{\Delta x_{p+1,i}}{\Delta x_{p,i}^A}$   (116)

applied to the difference quantities $\Delta f_{p,k,j}$ and $\Delta x_{p,i}$. The basic equations of the Secant method and the T-Secant method are summarized in Table 3; rows 1–4 are the elements (matrix $T$) of the basic equations, rows 5–6 are the explicit basic equations, row 7 depicts the Jacobian-type matrices, and rows 8–9 are the general formulations of the basic equations.

7. Convergence

7.1. Single-Variable Case

As was shown in Section 4 (Equation (60)), the $p$th iteration stepsize of the second new approximate $x_{p+1}^B$ is

$\Delta x_{p+1} = t_p^f \Delta x_p^A$.   (117)

The Secant method is super-linearly convergent, so the new approximate $x_{p+1}^A$ is expected to be a much better approximate to the solution $x^*$ than the previous one ($x_p^A$). Thus,

$|f_{p+1}^A| \ll |f_p^A|$,   (118)

and

$t_p^f = \dfrac{f_{p+1}^A}{f_p^A} \ll 1$   (119)

is expected to be a "small positive number". This means that the T-Secant approximate $x_{p+1}^B$ will always be in the vicinity of the classic Secant approximate $x_{p+1}^A$, and the approximate errors of the new approximates will be of a similar order, providing that the solution $x^*$ will be evenly surrounded by the two new trial approximates $x_{p+1}^A$ and $x_{p+1}^B$.

7.2. Convergence Rate

It is well known that the single-variable Secant method has asymptotic convergence for sufficiently good initial approximates $x^A$ and $x^B$ if $f'(x)$ does not vanish on $[x^A, x^B]$ and $f''(x)$ is continuous at least in a neighborhood of the zero $x^*$. The super-linear convergence property has been proved in different ways, and it is known that the order of convergence is $\alpha_S = (1 + \sqrt{5})/2$ with asymptotic error constant

$C = \left| \dfrac{1}{2} \dfrac{f''(\xi)}{f'(\xi)} \right|^{1/\alpha}$.   (120)

The order of convergence of the T-Secant method is determined in this section. Let $p$ be the iteration counter and let the approximate error in the $p$th iteration be defined as

$e_p = x_p - x^*$.   (121)

It follows from Equation (48) and from Definition (121) that the error $e_{p+1}^A$ of the new Secant approximate $x_{p+1}^A$ can be expressed as

$e_{p+1}^A = \dfrac{e_p^A f_p^B - e_p^B f_p^A}{f_p^B - f_p^A} = \dfrac{x_p^B - x_p^A}{f_p^B - f_p^A} \cdot \dfrac{f_p^B / e_p^B - f_p^A / e_p^A}{x_p^B - x_p^A}\, e_p^A e_p^B$.   (122)

It follows from the mean value theorem that the first factor of the right side of Equation (122) can be replaced by $1 / f'(\eta_p)$, where $\eta_p \in (x_p^A, x_p^B)$, if $f(x)$ is continuously differentiable on $[x_p^A, x_p^B]$ and $f'(\eta_p) \neq 0$. Let the function $f(x)$ be approximated around the root $x^*$ by a second-order Taylor series expansion as

$f_p = f(e_p + x^*) = f(x^*) + e_p f'(x^*) + \dfrac{1}{2} e_p^2 f''(\xi_p)$,   (123)

where $\xi_p \in (x_p^A, x_p^B, x^*)$ in the remainder term. Since $f(x^*) = 0$, it follows from Equation (123) that

$\dfrac{f_p}{e_p} = f'(x^*) + \dfrac{1}{2} f''(\xi_p) e_p$.   (124)

Substituting this expression into Equation (122), and since $e_p^B - e_p^A = x_p^B - x_p^A$, we obtain

$e_{p+1}^A = \dfrac{1}{2} \dfrac{f''(\xi_p)}{f'(\eta_p)} \dfrac{e_p^B - e_p^A}{x_p^B - x_p^A}\, e_p^A e_p^B = C_p\, e_p^A e_p^B$   (125)

with

$C_p = \dfrac{1}{2} \dfrac{f''(\xi_p)}{f'(\eta_p)}$.   (126)

If the series $x_p^A$ converges to $x^*$, then $\xi_p$ and $\eta_p \to x^*$ with increasing iteration counter $p$, and

$C_p \to \dfrac{1}{2} \dfrac{f''(x^*)}{f'(x^*)} = \mathrm{constant}$.   (127)

It follows from Equation (59) with Definition (121) and from the mean value theorem (with $\eta_{p-1} \in (x_{p-1}^A, x_{p-1}^B)$, if $f(x)$ is continuously differentiable on $[x_{p-1}^A, x_{p-1}^B]$) that

$x_p^B = x_p^A - (x_p^A - x_{p-1}^A)^2\, \dfrac{f'(\eta_{p-1})}{(f_{p-1}^A)^2}\, f_p^A$,   (128)

and the error $e_p^B$ of the T-Secant approximate $x_p^B$ can be expressed as

$e_p^B = e_p^A - (e_p^A - e_{p-1}^A)^2\, \dfrac{f'(\eta_{p-1})}{(f_{p-1}^A)^2}\, f_p^A$.   (129)

With the Taylor series expansion (123) for $f_{p-1}^A$ and $f_p^A$, where $\xi_{p-1} \in (x_{p-1}^A, x_{p-1}^B, x^*)$ and $\xi_p \in (x_p^A, x_p^B, x^*)$ in the remainder terms, we obtain

$e_p^B = e_p^A - e_p^A \left( \dfrac{e_p^A - e_{p-1}^A}{e_{p-1}^A} \right)^2 \gamma_p$,   (130)

where

$\gamma_p = \dfrac{\dfrac{f'(x^*)}{f'(\eta_{p-1})} + \dfrac{1}{2} \dfrac{f''(\xi_p)}{f'(\eta_{p-1})}\, e_p^A}{\left( \dfrac{f'(x^*)}{f'(\eta_{p-1})} + \dfrac{1}{2} \dfrac{f''(\xi_{p-1})}{f'(\eta_{p-1})}\, e_{p-1}^A \right)^2}$   (131)

and $f'(\eta_{p-1}) \neq 0$. If the series $x_p^A$ converges to $x^*$, then, with increasing iteration counter $p$, $\xi_p, \xi_{p-1}, \eta_{p-1} \to x^*$ and $e_p^A, e_{p-1}^A \to 0$, which implies that

$\dfrac{f'(x^*)}{f'(\eta_{p-1})} \to \dfrac{f'(x^*)}{f'(x^*)} = 1$   (132)

and $\gamma_p \to 1$. Substituting $e_p^B$ (Equation (130)) into Equation (125) gives

$e_{p+1}^A = C_p e_p^A \left( e_p^A - e_p^A \left( \dfrac{e_p^A - e_{p-1}^A}{e_{p-1}^A} \right)^2 \gamma_p \right)$,   (133)

and re-arranging,

$e_{p+1}^A = -C_p e_p^A \left( \gamma_p\, \dfrac{(e_p^A)^2 - 2 e_{p-1}^A e_p^A}{(e_{p-1}^A)^2} - (1 - \gamma_p) \right) e_p^A$.   (134)

As $x_p^A$ converges to $x^*$, $\gamma_p \to 1$, and the above equation simplifies to

$e_{p+1}^A = C_p e_p^A\, \dfrac{(e_p^A)^2 \left( 2 e_{p-1}^A - e_p^A \right)}{(e_{p-1}^A)^2}$.   (135)

This means that $e_{p+1}^A$ depends on $e_p^A$ and $e_{p-1}^A$, and by assuming asymptotic convergence, a power law relationship

$e_{p+1}^A = C (e_p^A)^\alpha$   (136)

can be established, where $C$ is the asymptotic error constant and $\alpha$ is the convergence rate, also called the "convergence order" of the iterative method. It also follows from Equation (136) that

$e_p^A = C (e_{p-1}^A)^\alpha$   (137)

and

$e_{p-1}^A = \left( \dfrac{e_p^A}{C} \right)^{1/\alpha}$.   (138)

Let $E = e_p^A$ be introduced for simplification purposes; then, it follows from Equations (133) and (136)–(138) that

$E^\alpha = \dfrac{C_p}{C}\, E^3\, \dfrac{2 \left( \dfrac{E}{C} \right)^{1/\alpha} - E}{\left( \dfrac{E}{C} \right)^{2/\alpha}}$,   (139)

where $C_p$ and $C$ are constants and, if the series $x_p^A$ converges to $x^*$, with increasing iteration counter $p$, $E \to 0^+$. Taking the logarithms of both sides of Equation (139) and dividing by $\ln E$ gives

$\alpha = \dfrac{\ln \dfrac{C_p}{C}}{\ln E} + 3 - \dfrac{2}{\alpha} \cdot \dfrac{\ln \dfrac{E}{C}}{\ln E} + \dfrac{\ln \left( 2 \left( \dfrac{E}{C} \right)^{1/\alpha} - E \right)}{\ln E}$.   (140)

If the series $x_p^A$ converges to $x^*$, then, with increasing iteration counter $p$, $E \to 0^+$, $\ln E \to -\infty$, and

$\lim_{E \to 0^+} \dfrac{\ln (C_p / C)}{\ln E} = 0$,   (141)

$\lim_{E \to 0^+} \ln \dfrac{E}{C} = \lim_{E \to 0^+} \left( \ln E - \ln C \right) = \ln E$,   (142)

$\lim_{E \to 0^+} \dfrac{\ln \left( 2 (E/C)^{1/\alpha} - E \right)}{\ln E} = \dfrac{1}{\alpha}$,   (143)

and Equation (140) simplifies to

$\alpha - 3 + \dfrac{1}{\alpha} = 0$,   (144)

with root (the convergence rate of the T-Secant method)

$\alpha_{TS} = \dfrac{3 + \sqrt{5}}{2} \approx 2.618033988 = \alpha_S + 1 = \varphi^2$,   (145)

where $\alpha_S = \varphi \approx 1.618033988$ is the convergence rate of the traditional Secant method, and $\varphi$ is the well-known golden ratio. It follows from Equation (140) that the actual values $\alpha^*$ of $\alpha_{TS}$ depend on the approximate error $E = e^A$. Convergence rates $\alpha^*(E)$ were determined for different $E$ values and are shown in Figure 5. The upper bound $\alpha_{TS} = \alpha_S + 1 \approx 2.618$ at $E \to 0^+$ is also indicated (horizontal dashed red line).

7.3. Single-Variable Example

An example is given for demonstration purposes with the single-variable test function (99) with root $x^* \approx 2.09455$. Iterations were made with initial approximates $x_0^A = 3.5$ and $x_0^B = 2.5$, and the convergence rates $\alpha_S$, $\alpha_N$, and $\alpha_{TS}$ were determined for the traditional Secant method (Table 4, Figure 6), for the Newton–Raphson method (Table 5, Figure 7), and for the T-Secant method (Table 6, Figure 8), respectively. The cumulative numbers of function value ($N_f$) and derivative function value ($N_{f'}$) calculations are also indicated in the tables. The calculated convergence rates agree well with the theoretical values $\alpha_S = 1.62$, $\alpha_N = 2.0$, and $\alpha_{TS} = 2.62$. Figure 9 summarizes the results of iterations with the three different methods (Secant, Newton–Raphson, and T-Secant). Two groups of graphs show the decrease in the absolute approximate error $|e_p^A|$ and the calculated convergence rates $\alpha$ for the three compared methods. The results demonstrate that the convergence rate of the T-Secant method is higher than the convergence rate of the Newton–Raphson method.
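The convergence rates reported in Tables 4–6 can be estimated from consecutive absolute errors; a small helper of our own for this purpose, assuming the power law (136):

```python
import math

def estimated_orders(errors):
    """Estimate the convergence order from consecutive absolute errors,
    alpha ~ ln(e_{p+1}/e_p) / ln(e_p/e_{p-1}), following the power law (136)."""
    return [math.log(errors[p + 1] / errors[p]) / math.log(errors[p] / errors[p - 1])
            for p in range(1, len(errors) - 1)]
```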

7.4. Multi-Variable Convergence

Matrix $S$ (see Equation (102)) corresponds to a divided difference approximation of the Jacobian. It is known (e.g., from Dennis and Schnabel [13]) that these values give a second-order approximation of the derivative at the midpoint. When considering Newton's iteration, it is assumed that the Jacobian is invertible in a neighborhood of $x^*$. If that condition holds, then there is a chance that the approximate Jacobian is also invertible in the same neighborhood.

It is known that the Secant method is locally q-super-linearly convergent, so the new approximate $x_{p+1}^A$ is expected to be a much better approximate to the solution $x^*$ than the previous approximate $x_p^A$. Thus,

$\| f_{p+1}^A \| \ll \| f_p^A \|$,   (146)

and the diagonal elements

$t_{p,j}^F = \dfrac{f_{p+1,j}^A}{f_{p,j}^A} \ll 1$   (147)

of the transformation matrix $T_p^F$ $(j = 1, \ldots, m)$ are expected to be "small numbers". It follows from Equations (68), (70), (73), and (147) that

$\mu_i = \dfrac{\sum_{j=1}^m S_{p,i,j}^+ f_{p,j}^A}{\sum_{j=1}^m S_{p,i,j}^+ f_{p,j}^A / t_{p,j}^F} \ll 1$,   (148)

$i = 1, \ldots, n$, and

$\| \Delta x_{p+1} \| \approx \| \mu \| \, \| \Delta x_p^A \| \ll \| \Delta x_p^A \|$   (149)

(see Figure 10). This means that the T-Secant approximate $x_{p+1}^B$ will always be in the vicinity of the classic Secant approximate $x_{p+1}^A$, and the approximate errors of the new approximates will be of similar order, providing that the solution $x^*$ will be evenly surrounded by the $n + 1$ new trial approximates $x_{p+1}^A$ and $x_{k,p+1}^B$ $(k = 1, \ldots, n)$, and that matrix $S_{p+1}$ will be well-conditioned.

8. Algorithm

Let $p$ be the iteration counter, $\varepsilon^*$ be the error bound for the termination criterion, and

$e_p^A = x_p^A - x^*$   (150)

be the approximate error vector of approximate $x^A$ in the $p$th iteration, with elements $e_{p,i}^A$ $(i = 1, \ldots, n)$. Let the scalar approximate error

$\varepsilon_p = \dfrac{\| e_p^A \|_2}{\sqrt{n}} = \sqrt{\dfrac{\sum_{i=1}^n (e_{p,i}^A)^2}{n}}$   (151)

be defined, where $\| \cdot \|_2$ is the Euclidean norm, and let the iteration be terminated when

$\varepsilon_p < \varepsilon^*$   (152)

holds. Let $x_p^A$ be the initial trial and $\Delta x_p$ be the trial increment (iteration stepsize) in the $p$th iteration. Choose $T_{\min}$ and $T_{\max}$ as lower and upper bounds for $t_{p,j}^F$ $(j = 1, \ldots, m)$, and let $f_{\min}$ and $q_{\min}$ be lower bounds for $f_{p,j}^A$ $(j = 1, \ldots, m)$ and $q_{p,i}^B$ $(i = 1, \ldots, n)$, respectively.
  • Initial step
    Let $p = 0$ and let the initial trial $x_p^A = \{x_{p,1}^A, \ldots, x_{p,n}^A\}$ and the initial trial increment $\Delta x_p = \{\Delta x_{p,1}, \ldots, \Delta x_{p,n}\}$ be given. Calculate the corresponding function values $f_p^A$ and assume that $f_{\min} < |f_{p,j}^A|$ $(j = 1, \ldots, m)$.
  • Step 1: Generate a set of $n$ additional initial trials (interpolation base points)
    $x_{p,k}^B = x_p^A + \Delta x_{p,k} \cdot d_k$   (153)
    and evaluate the function values $f_{p,k}^B$ $(k = 1, \ldots, n)$.
  • Step 2 (Secant): Construct the matrix
    $\Delta F_p = [\Delta f_{p,k}] = [f_{p,k}^B - f_p^A]$,   (154)
    then calculate $q_p^A$ from Equation (42). Let $q_{\min} < |q_{p,i}^A|$, and determine $x_{p+1}^A$ from Equation (46) and $\varepsilon_p$ from Equation (151).
  • Step 3: If $\varepsilon_p < \varepsilon^*$, then terminate the iteration; otherwise, continue with Step 4.
  • Step 4 (T-Secant): Calculate $f_{p+1}^A$ and $T_p^F$ from Equation (67). Let $T_{\min} < |t_{p,j}^F| < T_{\max}$ and determine $q_p^B$ from Equation (68) ($\Delta F_p^+$ has already been calculated when $q_p^A$ was determined from Equation (42)). Let $q_{\min} < |q_{p,i}^B|$. Calculate $x_{p+1}^B$ from Equation (70).
  • Step 5: Let the new initial trial be
    $\{x_{p+1}^A, f_{p+1}^A\}$   (155)
    and the new initial trial increment (iteration stepsize) be
    $\Delta x_{p+1} = x_{p+1}^B - x_{p+1}^A$,   (156)
    and continue the iteration with Step 1.
The iteration constants $\{\delta_{\min}, f_{\min}, q_{\min}, T_{\min}, T_{\max}\}$ are necessary in order to avoid division by zero and to avoid computed values being near numerical precision. If $p_{\max}$ is the number of necessary iterations for satisfying the termination criterion $\varepsilon_p < \varepsilon^*$, and $n$ is the number of unknowns to be determined, then the T-Secant method needs $n + 1$ function evaluations in each iteration, i.e.,

$N_f = p_{\max} (n + 1)$   (157)

function evaluations to reach the desired termination criterion. $p_{\max}$ depends on many circumstances, such as the nature of the function $f(x)$, the termination criteria ($\varepsilon^*$ or others), the distance of the initial trial $x^A$ from the solution $x^*$, and the iteration constants $T_{\min}$, $q_{\min}^A$, etc. A minimal sketch of the whole loop follows.
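The sketch below is our own hedged implementation of Steps 1–5; the safeguard bounds ($f_{\min}$, $q_{\min}$, $T_{\min}$, $T_{\max}$) are omitted for brevity, and the stopping test uses the residual norm instead of the distance to a known root:

```python
import numpy as np

def t_secant(f, x0, dx0, eps=1e-14, p_max=100):
    """T-Secant iteration following Steps 1-5 of Section 8 (sketch)."""
    xA = np.asarray(x0, dtype=float)
    dx = np.asarray(dx0, dtype=float)
    fA, n = f(xA), xA.size
    for p in range(p_max):
        # Step 1: n additional base points (153) and their function values
        dF = np.column_stack([f(xA + dx[k] * np.eye(n)[:, k]) - fA
                              for k in range(n)])        # (154)
        # Step 2 (Secant): q^A from (42), x_{p+1}^A from (46)
        dFp = np.linalg.pinv(dF)
        qA = -dFp @ fA
        xA_new = xA + dx * qA
        fA_new = f(xA_new)
        # Step 3: termination check (here on the residual norm)
        if np.linalg.norm(fA_new) / np.sqrt(n) < eps:
            return xA_new
        # Step 4 (T-Secant): T^F from (67), q^B from (68), x^B from (70)
        tF = fA_new / fA
        qB = -dFp @ (fA / tF)
        xB_new = xA_new + (dx * qA) ** 2 / (dx * qB)
        # Step 5: new trial and trial increment (156)
        xA, fA, dx = xA_new, fA_new, xB_new - xA_new
    return xA
```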

9. Numerical Tests Results

9.1. Rosenbrock Test Function

A variant of the Rosenbrock function [24] has been used to test the numerical performance of the suggested method. The global minimum of the function

$R(x) = \sum_{i=1}^{N-1} \left[ 100 \left( x_{i+1} - x_i^2 \right)^2 + \left( 1 - x_i \right)^2 \right]$   (158)

has to be determined, where $x = (x_1, \ldots, x_N) \in \mathbb{R}^N$ and $N \geq 2$. $R(x)$ has exactly one minimum for $N = 3$ (at $x^* = (1, 1, 1)$) and exactly two minima for $4 \leq N \leq 7$: the global minimum of all ones and a local minimum near $\hat{x} = (-1, 1, \ldots, 1)$. The sum of squares $R(x)$ will be minimal when all terms are zero, so the minimization of the function $R(x)$ is equivalent to finding the zero of a function $x \mapsto f(x)$, where $x \in \mathbb{R}^N$, $f: \mathbb{R}^N \to \mathbb{R}^{2(N-1)}$, and

$f(x) = \begin{bmatrix} f_{2i-1}(x) \\ f_{2i}(x) \end{bmatrix} = \begin{bmatrix} 10 \left( x_{i+1} - x_i^2 \right) \\ 1 - x_i \end{bmatrix}$,   (159)

$i = 1, \ldots, N - 1$. For $N > 7$, the function $R(x)$ has exactly one global minimum and some local minima with some $x_j^*$ near $-1$ and $x_i^* = 1$ for all other unknowns. The results were obtained by least-squares solving of the simultaneous system of nonlinear equations $f(x) = 0$ via the T-Secant method.
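In code, the residual (159) interleaves the two term families; a short sketch of our own:

```python
import numpy as np

def rosenbrock_residual(x):
    """Residual vector (159), f: R^N -> R^{2(N-1)}; its least-squares
    zero is the minimizer of the Rosenbrock sum (158)."""
    x = np.asarray(x, dtype=float)
    r = np.empty(2 * (x.size - 1))
    r[0::2] = 10.0 * (x[1:] - x[:-1] ** 2)   # f_{2i-1}
    r[1::2] = 1.0 - x[:-1]                   # f_{2i}
    return r
```

With the `t_secant` sketch from Section 8, a test run would look like `t_secant(rosenbrock_residual, x0, 0.05 * x0)`, mirroring the initial trial increment (160) used below.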

9.2. N = 2 , N = 3 and N = 10 Examples

In the case of $N = n = m = 2$, the iterations terminated after $N_f = 6$ function evaluations ($p_{\max} = 2$ iterations) in most cases. $f_2(x) = 1 - x_1$ is a linear function, and the first T-Secant iteration ($p = 0$) finds the exact value of $x_1$ in one step; then, $f_1(x) = 10(x_2 - x_1^2)$ also becomes linear. The exact value of $x_2$ was then determined in one additional step.

Let $N = n = 3$ and $m = 4$, $T_{\min} = 0.01$, and $\varepsilon^* = 10^{-14}$. Let $p = 0$ and

$\Delta x_{0,i} = 0.05 \cdot x_{0,i}^A$   (160)

$(i = 1, \ldots, 3)$. The number of necessary function evaluations $N_f$ varied between 20 and 36 within $p_{\max} = 5$–$9$ iterations for different initial trials $x_0^A$. The iteration results are summarized in Table 7 and in Figure 11 with the initial trial $x_0^A = \{x_{0,i}^A\} = (2.0, -1.5, -2.5)$. The termination criterion $\varepsilon_p < \varepsilon^*$ was satisfied after $p_{\max} = 5$ iterations with $N_f = 20$ function evaluations.

Let $N = n = 10$ and $m = 18$. Calculations were made with different, manually constructed initial trials $x_0^A = \{x_{0,i}^A\}$. Figure 12 (left) shows the variation of $x_{p,i}^A$ for the initial trial $x_0^A = (2.0, -1.5, -2.5, 1.5, -1.2, 3.0, -3.5, 2.5, -2.0, 3.5)$. The iterations terminated after $N_f = 154$ function evaluations ($p_{\max} = 14$ iterations) for the $\varepsilon_p < \varepsilon^* = 10^{-14}$ condition. Table 8 shows a set of further initial trials for numerical tests. Test "3" failed, probably due to the large distance from the global optimal solution. Test "4" found a local zero $x^* = (-1, 1, 1, 1, 1, 1, 1, 1, 1, 1)$. Figure 12 (right) summarizes the results of numerical tests "1–6". The graphs show the iteration paths in the $\lg \| e_p^A \|$–$R_p(x_p^A)$ plane. The graphs have an initial part, where the variation of $R_p(x_p^A)$ seems "chaotic", while below $\| e_p^A \| \approx 0.01$ and $R_p(x_p^A) \approx 0.001$, the iterations run on similar paths.

9.3. Large N 200 500 1000 Examples

A series of numerical tests was performed with a large number of unknown variables. The values of the initial trials $x_0^A = \{x_{0,i}^A\}$, $i = 1, \ldots, N$, were generated as

$x_{0,i}^A = x_i^* + \dfrac{L_1}{5} \left( \mathrm{Random} - \dfrac{1}{2} \right) + L_2$,   (161)

where "$\mathrm{Random}$" is a random real number ($0 \leq \mathrm{Random} < 1$), and $L_1$, $L_2$ are parameters regulating the size and location of the interval in which the initial trial values are expected to vary. $x^* = \{x_i^*\} = (1, \ldots, 1)$ $(i = 1, \ldots, N)$ is the known global optimal solution. Table 9 shows the results of T-Secant iterations with $N = 200$ and with initial trials $x_0^A$: $0.1 \leq x_{0,i}^A \leq 19.9$ ($L_1 = 99$, $L_2 = 9$). Figure 13 (left) shows the variation of the variables $x_p^A$ through the T-Secant iterations. The iteration counter $p$ value is indicated below the graphs. Figure 13 (right) shows the decrease in the approximate error $e_p^A = \{e_{p,i}^A\}$ $(i = 1, \ldots, 200)$, with the $p$ iteration counter indicated below the graphs. Table 10 shows the results of iterations with $N = 1000$ and initial trials $x_0^A$: $0.5 \leq x_{0,i}^A \leq 1.5$ ($L_1 = 5$, $L_2 = 0$). Figure 14 summarizes the results of the numerical tests with a large number of unknowns ($N = 200, 500, 1000$). The decrease in the norm $\varepsilon_p$ of the approximate error $e_p^A$ is shown, and the number of function value evaluations $N_f$ is indicated for $N = 200$ (blue), $500$ (red), and $1000$ (green) and for initial trials $x_0^A$: $0.5 \leq x_{0,i}^A \leq 1.5$ (solid line) and $x_0^A$: $0.1 \leq x_{0,i}^A \leq 19.9$ (dashed line).

10. Efficiency

10.1. Single-Variable Case

The efficiency of an algorithm relates to the amount of computational resources used by the algorithm. For better efficiency, it is desirable to minimize resource usage. An algorithm is considered efficient if its resource consumption (computational cost) is below some acceptable level (it runs in a reasonable amount of time or space on an available computer). The efficiency of an algorithm for the solution of nonlinear equations is thoroughly discussed by Traub [25] as follows. Let $p$ be the order of the iteration sequence such that, for the approximate errors $e_i = x_i - x^*$, there exists a nonzero constant $C$ (asymptotic error constant) for which

$\dfrac{|e_{i+1}|}{|e_i|^p} \to C$.   (162)

A natural measure of the information used by an algorithm is the "informational usage" $d$, which is defined as the number of new pieces of information (values of the function and its derivatives) required per iteration (called a "horner" by Ostrowski [26]). Then, the efficiency of the algorithm within one iteration can be measured by the "informational efficiency"

$\mathrm{EFF} = \dfrac{p}{d}$.   (163)

An alternative definition of efficiency is

$\mathrm{EFF}^* = p^{1/d}$,   (164)

called the "efficiency index" by Ostrowski [26]. Another measure of efficiency, called "computational efficiency", takes into account the "cost" of calculating the different derivatives. The concepts of informational efficiency ($\mathrm{EFF}$) and the efficiency index ($\mathrm{EFF}^*$) do not take into account the cost of evaluating $f$ and its derivatives, nor do they take into account the total number of pieces of information needed to achieve a certain accuracy in the root of the function. If $f$ is composed of elementary functions, then the derivatives are also composed of elementary functions; thus, the cost of evaluating the derivatives is merely the cost of combining the elementary functions. Table 11 compares the efficiencies of the classic (Secant, Newton) and improved (T-Secant, T-Newton) algorithms.

10.2. Multi-Variable Case

Very limited data are available to compare the performance of the T-Secant method with other methods, especially in cases with a large number of unknowns. Broyden [27] suggested the mean convergence rate

$L = \dfrac{1}{N_f} \ln \dfrac{R(x_0^A)}{R(x_{p_{\max}}^A)}$   (165)

as a measure of the efficiency of an algorithm for solving a particular problem, where $N_f$ is the total number of function evaluations, $x_0^A$ is the initial trial, and $x_{p_{\max}}^A$ is the last trial for the solution $x^*$ when the termination criterion is satisfied after $p_{\max}$ iterations. $R(x)$ is the Euclidean norm of $f(x)$. Efficiency results were given by Broyden [27] for the Rosenbrock function for $N = 2$ and for $x_0^A = (-1.2, 1.0)$. The calculated convergence rates for the two Broyden method variants [27], for Powell's method [28], for the adaptive coordinate descent method [29], and for the Nelder–Mead simplex method [30] were compared with the calculated values for the T-Secant method in Table 12. Rows 1–5 are data from the referenced papers, rows 6–8 are T-Secant results with the referenced initial trials, and rows 9–15 are calculated data for $N > 2$.

Results show that the mean convergence rate $L$ (Equation (165)) for $N = 2$ is much higher for the T-Secant method ($5.5$–$6.9$) than for the other listed methods ($0.1$–$0.6$); however, it is obvious that the mean convergence rate values decrease rapidly with increasing $N$ values (more unknowns need more function evaluations). A modified convergence rate,

$L_N = N \cdot L = \dfrac{N}{N_f} \ln \dfrac{R(x_0^A)}{R(x_{p_{\max}}^A)}$,   (166)

can be used as an $N$-independent measure of efficiency (see Table 12). The values of $L$ and $L_N$ are at least 10 times larger for the T-Secant method than for the referenced classic methods for $N = 2$ (see Table 12). Note that the efficiency measures ($L$ and $L_N$) are also dependent on the initial conditions (the distance of the initial trial set from the optimal solution and the termination criterion). Results from a large number of numerical tests indicate an average $L_N$ value of around $7.4$, with standard deviation $3.7$, for the T-Secant method, even for large $N$ values. It has to be noted that if the value of $R(x_{p_{\max}}^A)$ is zero, then the mean convergence rates ($L$ and $L_N$) are not computable (zero in the denominator). A substitute value of $10^{-25}$ was used when iterations ended with $R(x_{p_{\max}}^A) = 0$ in the sample examples.
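For reference, the two efficiency measures (165) and (166) in code (our own helper):

```python
import numpy as np

def mean_convergence_rates(R0, R_final, n_feval, N):
    """Broyden's mean convergence rate L (165) and the size-normalized
    variant L_N = N * L (166); R is the Euclidean norm of f."""
    L = np.log(R0 / R_final) / n_feval
    return L, N * L
```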

11. Discussions

11.1. General

The suggested procedure needs the usual approximate $x_{p+1}^A$ to be determined by any of the classic quasi-Newton iterative methods (Wolfe–Popper Secant, Broyden, etc.). By using the "information" $f_{p+1}^A$, an additional and independent approximate $x_{p+1}^B$ is determined, which provides the possibility for a full-rank update of the approximate derivatives ($S_p$ for Secant or $B_p$ for Broyden). Results and experience show that the suggested procedure considerably accelerates the convergence and the efficiency of the classic methods, and the full-rank update technique increases the stability of the iterative procedure. In the multi-variable case, it follows from Equation (107) that

$(T_p^X)^{-1} \Delta x_p^A = -S_p^+ (T_p^F)^{-1} f_p^A$,   (167)

and in explicit form after re-arrangement:

$\dfrac{(\Delta x_{p,i}^A)^2}{x_{p+1,i}^B - x_{p+1,i}^A} = -\sum_{j=1}^m S_{p,i,j}^+ \dfrac{f_{p,j}^A}{t_{p,j}^F}$.   (168)

Then, the $i$th element of the new approximate $x_{p+1}^B$ can be expressed from the $i$th row of the above equation as

$x_{p+1,i}^B = x_{p+1,i}^A - \dfrac{(\Delta x_{p,i}^A)^2}{\sum_{j=1}^m S_{p,i,j}^+ f_{p,j}^A / t_{p,j}^F}$.   (169)

The mechanism of the procedure resembles the mechanism of an engine's turbocharger, which is powered by the flow of exhaust gases (analogous to $f_{p+1}^A$ or $t_{p,j}^F$).

11.2. Newton Method

Matrix S in the general formula (104) gives a direct connection between the Secant and Newton methods, as differences go to differentials,
S = f k , j x i = S i , j J = f k , j x i = J i , j
where J is the Jacobian matrix of the function f : R n R m ( m n ) with k and i column and with j row indexes, respectively. It follows from formula (111) of matrix S T that the suggested full-rank update procedure can also be applied to the Newton method as
S T = t j F t i X f k , j x i J T = t j F t i x f k , j x i ,
where J T is the modified Jacobian matrix of the “T-Newton” method. In the single-variable case, with approximate x p A in the p th iteration, with function value f p A = f x p A and with derivative function value f p A = f x p A , the new Newton–Raphson approximate can be expressed as
x p + 1 A = x p A x p f p f p A = x p A f p A f p A
and the iteration stepsize is
x p A = x p + 1 A x p A
with the hyperbolic function (Equation (88))
z p ( x ) = a p x x p + 1 A + f p A .
where
a p = x p + 1 A x p A 2 f p A f p + 1 A f p A
( f p / x p is replaced by f p A ), the new “T-Newton” approximate is
x p + 1 B = x p + 1 A x p A 2 f p A f p + 1 A f p A 2
( f p / x p is again replaced by f p A ), similar to Equation (59) in case of the T-secant method. It can be seen from Table 13 and Table 14 that the convergence rate is be improved from α N = 2 to α T N = 3 . In the multi-variable case, it follows from Equation (107) ( S p + is replaced by J p + ) that
$$\left(\mathbf{T}_p^X\right)^{-1} \Delta\mathbf{x}_p^A = -\,\mathbf{J}_p^{+}\left(\mathbf{T}_p^F\right)^{-1}\mathbf{f}_p^A,$$
and in explicit form after re-arrangement:
$$\frac{\left(\Delta x_{p,i}^A\right)^2}{x_{p+1,i}^B - x_{p+1,i}^A} = -\sum_{j=1}^{m} J_{p,i,j}^{+}\,\frac{f_{p,j}^A}{t_{p,j}^F}.$$
Then, the $i$-th element of the new "T-Newton" approximate $\mathbf{x}_{p+1}^B$ can be expressed from the $i$-th row of the above equation as
$$x_{p+1,i}^B = x_{p+1,i}^A - \frac{\left(\Delta x_{p,i}^A\right)^2}{\sum_{j=1}^{m} J_{p,i,j}^{+}\, f_{p,j}^A / t_{p,j}^F},$$
similar to Equation (70) in the case of the T-Secant method. Thus, the "hyperbolic" approximation accelerates the convergence of the Newton–Raphson method at the cost of only one additional function evaluation per iteration.
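The single-variable "T-Newton" iteration is short enough to be sketched completely. The following hedged example (Python) assumes that test function (99) is $f(x) = x^3 - 2x - 5$, whose root $2.09455148\ldots$ matches the values in Tables 13 and 14; with $x_0^A = 4.5$, it reproduces the $x_{p+1}^B$ column of Table 14:

```python
def f(x):  return x**3 - 2.0 * x - 5.0   # assumed test function; root = 2.0945514815...
def df(x): return 3.0 * x**2 - 2.0       # analytic derivative

x_A = 4.5                                # initial approximate x_0^A
for p in range(5):
    f_A = f(x_A)
    x_next_A = x_A - f_A / df(x_A)       # classic Newton-Raphson step
    f_next_A = f(x_next_A)               # the one additional function evaluation
    dx_A = x_next_A - x_A
    # hyperbolic ("T-Newton") correction from the formula above
    x_B = x_next_A - dx_A**2 * df(x_A) * f_next_A / f_A**2
    print(p, x_B)                        # 2.830..., 2.1776..., 2.09486, ...
    x_A = x_B                            # the iteration continues from x_{p+1}^B
```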
Table 14. T-Newton method iteration and computed convergence rate, $\alpha_{TN}$ (see Figure 15).

| $p$ | $x_p^A$ | $x_{p+1}^B$ | $e_{p+1}^B$ | $\alpha_{TN}$ | $N_f$ | $N_{f'}$ |
|---|---|---|---|---|---|---|
| 0 | 4.5 | 2.830 | $7.4 \times 10^{-1}$ | – | 2 | 1 |
| 1 | 2.830 | 2.17760 | $8.3 \times 10^{-2}$ | – | 4 | 2 |
| 2 | 2.17760 | 2.09486 | $3.1 \times 10^{-4}$ | 1.84 | 6 | 3 |
| 3 | 2.09486 | 2.09455148 | $1.9 \times 10^{-11}$ | 2.56 | 8 | 4 |
| 4 | 2.09455148 | 2.09455148154233 | $3.6 \times 10^{-15}$ | 2.97 | 9 | 5 |

11.3. Broyden’s Method

Broyden’s method is a special case of the Secant method. In the single-variable case, the derivative of the function is approximated as
$$f_p' \approx B_p = B_{p-1} + \frac{\Delta f_p - B_{p-1}\,\Delta x_p}{\Delta x_p^2}\,\Delta x_p$$
in the $p$-th iteration step, and with
$$\frac{\Delta x_p}{\Delta x_p^2} = \frac{1}{\Delta x_p},$$
it is simplified as
$$B_p = B_{p-1} + \frac{\Delta f_p - B_{p-1}\,\Delta x_p}{\Delta x_p}.$$
The next Broyden approximate is then determined as
$$x_{p+1}^A = x_p^A - \frac{f_p^A}{B_p}.$$
The convergence can be improved by the new hyperbolic approximation procedure in the same way as in the cases of the Secant and Newton methods. An additional new approximate
$$x_{p+1}^B = x_{p+1}^A - \frac{\left(\Delta x_p^A\right)^2 B_p\, f_{p+1}^A}{\left(f_p^A\right)^2}$$
can be determined, and the iteration continues with this value. Figure 16 demonstrates the effect of the hyperbolic approximation applied to the classic Broyden method. Not surprisingly, the convergence rate is improved from $\alpha_B = \varphi \approx 1.618$ to $\alpha_{TB} = \varphi^2 \approx 2.618$, as in the case of the Secant method. In the multi-variable case, the $i$-th element of the new "T-Broyden" approximate $\mathbf{x}_{p+1}^B$ can be expressed as
$$x_{p+1,i}^B = x_{p+1,i}^A - \frac{\left(\Delta x_{p,i}^A\right)^2}{\sum_{j=1}^{m} B_{p,i,j}^{+}\, f_{p,j}^A / t_{p,j}^F},$$
similar to Equation (179) for the T-Newton method, with $J_{p,i,j}^{+}$ replaced by $B_{p,i,j}^{+}$. The new approximate $\mathbf{B}_{p+1}$ to the Jacobian matrix can then be fully updated in a similar way as in the case of the T-Secant method.
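In the single-variable case, the whole T-Broyden iteration can be sketched in a few lines. The example below (Python) again assumes the test function $f(x) = x^3 - 2x - 5$ and uses two starting points to form $B_0$; the ordering of the rank-one update relative to the hyperbolic correction is one reasonable choice, not taken verbatim from the paper:

```python
def f(x): return x**3 - 2.0 * x - 5.0          # assumed test function, as above

x_prev, x_A = 4.0, 4.5                         # two starting points give B_0
B = (f(x_A) - f(x_prev)) / (x_A - x_prev)      # initial slope approximate B_0
f_A = f(x_A)
for p in range(5):
    x_next_A = x_A - f_A / B                   # classic Broyden step
    f_next_A = f(x_next_A)
    dx_A = x_next_A - x_A
    # hyperbolic ("T-Broyden") correction with the same B_p used for the step
    x_B = x_next_A - dx_A**2 * B * f_next_A / f_A**2
    B = B + (f_next_A - f_A - B * dx_A) / dx_A # rank-one (here scalar) update
    x_A, f_A = x_B, f(x_B)                     # continue from the improved value
    print(p, x_A)
```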

12. Conclusions

A completely new iteration strategy has been worked out for solving simultaneous nonlinear equations:
$$\mathbf{f}\left(\mathbf{x}\right) = \mathbf{0},$$
where $\mathbf{x} \in \mathbb{R}^n$ and $\mathbf{f}: \mathbb{R}^n \to \mathbb{R}^m$ ($m \geq n$). It replaces the Jacobian matrix with finite-difference approximations. The stepsize $\Delta\mathbf{x}_{p+1}$ was determined as the difference between two new approximates:
$$\mathbf{x}_{p+1}^A = \mathbf{x}_p^A + \Delta\mathbf{x}_p^A$$
and $\mathbf{x}_{p+1}^B$ with elements
$$x_{p+1,i}^B = x_{p+1,i}^A + \Delta x_{p+1,i} = x_{p+1,i}^A - \frac{\left(\Delta x_{p,i}^A\right)^2}{\sum_{j=1}^{m} S_{p,i,j}^{+}\, f_{p,j}^A / t_{p,j}^F}$$
($i = 1, \ldots, n$) as
$$\Delta\mathbf{x}_{p+1} = \mathbf{x}_{p+1}^B - \mathbf{x}_{p+1}^A.$$
The first one is a classic quasi-Newton approximate with stepsize $\Delta\mathbf{x}_p^A$, while the second one was determined from a hyperbolic approximation governed by $\mathbf{x}_{p+1}^A$ and $\mathbf{f}_{p+1}^A$, such that the classic Secant equation
$$\mathbf{S}\,\Delta\mathbf{x}^A = -\,\mathbf{f}^A$$
was modified by a non-uniform scaling transformation
$$\mathbf{T} = \begin{bmatrix} \mathbf{T}^X & \mathbf{0} \\ \mathbf{0} & \mathbf{T}^F \end{bmatrix}$$
with diagonal elements $t_j^F$ ($j = 1, \ldots, m$) and $t_i^X$ ($i = 1, \ldots, n$) as
$$\mathbf{S}^T\,\Delta\mathbf{x}^A = -\,\mathbf{f}^A,$$
where
$$\mathbf{S} = \left[\frac{\Delta f_{k,j}}{\Delta x_i}\right] \quad \text{and} \quad \mathbf{S}^T = \left[\frac{t_j^F}{t_i^X}\,\frac{\Delta f_{k,j}}{\Delta x_i}\right]$$
($k = 1, \ldots, n$). It was shown that the new stepsize $\Delta\mathbf{x}_{p+1}$ is much smaller than the stepsize $\Delta\mathbf{x}_p^A$ of the classic quasi-Newton approximate, so that $\mathbf{x}_{p+1}^B$ will always be in the vicinity of $\mathbf{x}_{p+1}^A$. Having two new approximates, a set of $n + 1$ new independent trial approximates $\mathbf{x}_{p+1}^A$ and $\mathbf{x}_{k,p+1}^B$ ($k = 1, \ldots, n$) was constructed (see Equation (32)), ensuring that the new trial approximates are always in general positions, and thus the stable behavior of the iteration. According to the geometrical representation in the single-variable case, the suggested procedure corresponds to finding the root of a hyperbolic function with vertical and horizontal asymptotes $x_{p+1}^A$ and $f_p^A$. It was shown in Section 7 that the suggested method has super-quadratic convergence with a rate of $\alpha_{TS} = \varphi^2 \approx 2.618$ (where $\varphi \approx 1.618$ is the well-known golden ratio) in the single-variable case.
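For completeness, one full multi-variable T-Secant iteration can be sketched as follows (Python/NumPy; a hedged reconstruction from the equations of Table 1, with the trial points taken along unit directions $\mathbf{d}_k$ as in Equation (32), the pseudoinverse computed with np.linalg.pinv, nonzero residual components assumed, and the stabilizing bounds $T_{\min}$, $T_{\max}$ used in Table 7 omitted for brevity):

```python
import numpy as np

def t_secant_iteration(f, x_A, dx):
    """One T-Secant iteration (sketch following Table 1; m >= n assumed).

    f   : residual function R^n -> R^m
    x_A : current approximate x_p^A, shape (n,)
    dx  : current trial increments Delta x_{p,i}, shape (n,)
    Returns the two new approximates x_{p+1}^A and x_{p+1}^B.
    """
    n = x_A.size
    f_A = f(x_A)
    # n trial points x_{p,k}^B = x_p^A + dx_k d_k and difference matrix Delta F_p
    F = np.column_stack([f(x_A + dx[k] * np.eye(n)[k]) - f_A for k in range(n)])
    F_plus = np.linalg.pinv(F)
    q_A = -F_plus @ f_A                  # classic Secant coordinates q_p^A
    x_next_A = x_A + dx * q_A            # x_{p+1}^A = x_p^A + DX_p q_p^A
    f_next_A = f(x_next_A)               # the additional "information" f_{p+1}^A
    t_F = f_next_A / f_A                 # scaling factors t_{p,j}^F
    q_B = -F_plus @ (f_A / t_F)          # modified (T-Secant) coordinates q_p^B
    x_next_B = x_next_A + (x_next_A - x_A)**2 / (dx * q_B)
    return x_next_A, x_next_B

# the next iteration would continue with x_next_A and dx = x_next_B - x_next_A
```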
The suggested method needs two function evaluations in each iteration in the single-variable case and $n + 1$ evaluations in multi-variable cases. The efficiency of the proposed method was studied in Section 10 in the multi-variable case and compared with other classic low-rank-update and line-search methods on the basis of available data. The results show that the efficiency of the suggested full-rank-update procedure is considerably better than that of the other referenced methods. A Rosenbrock test function (Equations (158) and (159)) with up to $n = 1000$ variables was used to demonstrate this efficiency in Section 9.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

A considerable part of the research work was conducted between the years 1988 and 1992 at the Technical University of Budapest (Hungary), at the TNO-BOUW Structural Division (The Netherlands), and at the Technical High-school of Lulea (Sweden). This work has been sponsored by the Technical University of Budapest (Hungary), by the Hungarian Academy of Sciences (Hungary), by TNO-BOUW (The Netherlands), by Sandvik Rock Tools (Sweden), by CP Test a/s (Denmark), and by Óbuda University (Hungary). The valuable discussions and personal support from Géza Petrasovits, György Popper, Peter Middendorp, Rikard Skov, Bengt Lundberg, Mario Martinez, and Csaba Hegedűs are greatly appreciated.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Rüth, B.; Uekermann, B.; Mehl, M.; Birken, P.; Monge, A.; Bungartz, H.-J. Quasi-Newton waveform iteration for partitioned surface-coupled multiphysics applications. Int. J. Numer. Methods Eng. 2021, 122, 5236–5257.
  2. Barnafi, N.A.; Pavarino, L.F.; Scacchi, S. Parallel inexact Newton–Krylov and quasi-Newton solvers for nonlinear elasticity. Comput. Methods Appl. Mech. Eng. 2022, 400, 115557.
  3. Ryu, J.; Jae, M. A quantification methodology of Seismic Probabilistic Safety Assessment for nuclear power plant. Ann. Nucl. Energy 2021, 159, 108296.
  4. Yahaya, M.M.; Kumam, P.; Awwal, A.M.; Aji, S. A structured quasi-Newton algorithm with nonmonotone search strategy for structured NLS problems and its application in robotic motion control. J. Comput. Appl. Math. 2021, 395, 113582.
  5. Schröter, M.; Sauer, O. Quasi-Newton Algorithms for Medical Image Registration. In Proceedings of the World Congress on Medical Physics and Biomedical Engineering, Munich, Germany, 7–12 September 2009; Dössel, O., Schlegel, W.C., Eds.; IFMBE Proceedings; Springer: Berlin/Heidelberg, Germany, 2009; Volume 25/4.
  6. Ludwig, A. The Gauss–Seidel–quasi-Newton method: A hybrid algorithm for solving dynamic economic models. J. Econ. Dyn. Control 2007, 31, 1610–1632.
  7. Wülfingen, G.B. On some advantages of the application of Newton's method for the solution of nonlinear economic models. In Proceedings of the IFAC Dynamic Modelling, Warsaw, Poland, 16–19 June 1980; pp. 339–347.
  8. Schaefer, B.; Ghasemi, S.A.; Roy, S.; Goedecker, S. Stabilized quasi-Newton optimization of noisy potential energy surfaces. J. Chem. Phys. 2015, 142, 034112.
  9. Kemeny, J.G.; Snell, J.L. Mathematical Models in the Social Sciences; Introduction to Higher Mathematics; Blaisdell Publishing Company, A Division of Ginn and Company: New York, NY, USA; Toronto, ON, Canada; London, UK, 1963; Volume VII, 145p.
  10. Beregi, S.; Barton, D.A.W.; Rezgui, D.; Nield, S.A. Real-Time Hybrid Testing Using Iterative Control for Periodic Oscillations. arXiv. Available online: https://arxiv.org/abs/2312.06362 (accessed on 15 December 2023).
  11. Barnafi, N.A.; Pavarino, L.F.; Scacchi, S. Parallel Inexact Newton–Krylov and Quasi-Newton Solvers for Nonlinear Elasticity. arXiv. Available online: https://arxiv.org/abs/2203.05610 (accessed on 15 December 2023).
  12. Ortega, J.M.; Rheinboldt, W.C. Iterative Solution of Nonlinear Equations in Several Variables; Academic Press: New York, NY, USA, 1970.
  13. Dennis, J.E., Jr.; Schnabel, R.B. Numerical Methods for Unconstrained Optimization and Nonlinear Equations; Prentice-Hall: Englewood Cliffs, NJ, USA, 1983.
  14. Stoer, J.; Bulirsch, R. Introduction to Numerical Analysis, 3rd ed.; Springer: Berlin/Heidelberg, Germany, 2002.
  15. Martinez, J.M.; Qi, L. Inexact Newton methods for solving nonsmooth equations. J. Comput. Appl. Math. 1995, 60, 127–145.
  16. Dembo, R.S.; Eisenstat, S.C.; Steihaug, T. Inexact Newton methods. SIAM J. Numer. Anal. 1982, 19, 400–408.
  17. Birgin, E.G.; Krejic, N.; Martinez, J.M. Globally convergent inexact quasi-Newton methods for solving nonlinear systems. Numer. Algorithms 2003, 32, 249–260.
  18. Martínez, J.M. Practical quasi-Newton methods for solving nonlinear systems. J. Comput. Appl. Math. 2000, 124, 97–121.
  19. Wolfe, P. The Secant Method for Simultaneous Nonlinear Equations. Commun. ACM 1959, 2, 12–13.
  20. Popper, G. Numerical method for least square solving of nonlinear equations. Period. Polytech. 1985, 29, 67–69.
  21. Berzi, P. Model investigation for pile bearing capacity prediction. In Proceedings of the Euromech (280) Symposium on Identification of Nonlinear Mechanical Systems from Dynamic Tests, Ecully, France, 29–31 October 1991.
  22. Berzi, P. Pile–Soil Interaction due to Static and Dynamic Load. In Proceedings of the XIII ICSMFE, New Delhi, India, 5–10 January 1994; pp. 609–612.
  23. Berzi, P.; Beccu, R.; Lundberg, B. Identification of a percussive drill rod joint from its response to stress wave loading. Int. J. Impact Eng. 1994, 18, 281–290.
  24. Rosenbrock, H.H. An automatic Method for finding the Greatest or Least Value of a Function. Comput. J. 1960, 3, 175–184.
  25. Traub, J.F. Iterative Methods for the Solution of Equations, 1st ed.; Prentice-Hall: Englewood Cliffs, NJ, USA, 1964.
  26. Ostrowski, A.M. Solution of Equations and Systems of Equations; Academic Press: New York, NY, USA, 1966.
  27. Broyden, C.G. A class of Methods for Solving Nonlinear Simultaneous Equations. Math. Comput. 1965, 19, 577–593.
  28. Powell, M.J.D. An efficient method for finding the minimum of a function of several variables without calculating derivatives. Comput. J. 1964, 7, 155–162.
  29. Loshchilov, I.; Schoenauer, M.; Sebag, M. Adaptive Coordinate Descent. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), Dublin, Ireland, 12–16 July 2011; ACM Press: New York, NY, USA, 2011; pp. 885–892.
  30. Nelder, J.A.; Mead, R. A simplex method for function minimization. Comput. J. 1965, 7, 308–313.
Figure 1. Formulation of a new set of base vectors ($n = 3$): $\mathbf{x}^A$, $\mathbf{x}_1^B$, $\mathbf{x}_2^B$, $\mathbf{x}_3^B$ and interpolation base points $A$, $B_1$, $B_2$, and $B_3$ from new approximate $\mathbf{x}^A$ and from new trial increment (iteration stepsize) $\Delta\mathbf{x} = \mathbf{x}^B - \mathbf{x}^A = \left[\Delta x_1\ \Delta x_2\ \Delta x_3\right]^T$.
Figure 2. Geometrical representation of the Secant method in a single-variable case: (A) classic Secant method; (B) T-Secant modification.
Figure 3. Vector space description of the T-Secant method in the multi-variable case ($k = 1, \ldots, n$).
Figure 4. T-Secant iterations with test function (99) with initial approximates $x_0^A = 3.0$ and $x_0^B = 1.0$ (Left: $x_1^A$ is the root of $y_0(x)$, $x_1^B$ is the root of $z_0(x)$; Right: $x_2^A$ is the root of $y_1(x)$, $x_2^B$ is the root of $z_1(x)$) (see also Table 2).
Figure 5. $\alpha^*$ convergence rate variation with decreasing $E_0^+$ (dashed red lines indicate the $\alpha = \alpha_S + 1 \approx 2.618$ level, where $\alpha_S \approx 1.618$ is the convergence rate of the traditional Secant method).
Figure 6. Secant iteration with test function (99) with initial approximates $x_0^A = 3.5$ and $x_0^B = 2.5$ (Left: $p = 0, 1, 2$; Right: $p = 2, 3, 4$; see data in Table 4).
Figure 7. Newton iteration with test function (99) with initial approximate $x_0^A = 3.5$ (Left: $p = 0, 1$; Right: $p = 2$; see data in Table 5).
Figure 8. T-Secant iteration with test function (99) with initial approximates $x_0^A = 3.5$ and $x_0^B = 2.5$ (Left: $p = 0$ (with interpolation base points $A_0$, $B_0$) and $p = 1$ ($A_1$, $B_1$); Right: $p = 2$ ($A_2$, $B_2$); see data in Table 6).
Figure 9. Absolute approximate error $e_p^A$ decrease (dashed lines) and computed convergence rates ($\alpha$, solid lines) of different methods (Broyden (brown line), Secant (black lines), Newton–Raphson (blue lines), and T-Secant (red lines)).
Figure 10. Geometrical representation of the T-Secant method convergence in the multi-variable case (analogous to the convergence proof figure of Dennis and Schnabel [13], p. 180).
Figure 11. (Left) Variables $x_{p,i}^A$ and (Right) absolute approximate errors $\lg e_p^A\left(x_{p,i}^A\right)$ ($i = 1, \ldots, 3$) variation for initial trial $\mathbf{x}_0^A = \left[2.0\ 1.5\ 2.5\right]$.
Figure 12. (Left) Variation of $x_{p,i}^A$ for $\mathbf{x}_0^A = \left[2.0\ 1.5\ 2.5\ 1.5\ 1.2\ 3.0\ 3.5\ 2.5\ 2.0\ 3.5\right]$ through iterations ($N = 10$, $n = 10$, $m = 18$) with $p_{\max} = 15$ and $N_f = 165$. (Right) The absolute approximate errors $\lg e_p^A\left(x_{p,i}^A\right)$ ($i = 1, \ldots, 10$) and the $R\left(\mathbf{x}_p^A\right)$ function variation through iterations for different initial trials ($N = 10$, $n = 10$, $m = 18$) (see Table 8).
Figure 13. (Left) Variation of variables $\mathbf{x}_p^A$ through iterations. (Right) Decrease in approximate error $\lg e_p^A$ through iterations, $N = 200$ (with iteration counter $p$ value indicated below the graphs).
Figure 14. Number of function evaluations for $N = 200$ (blue), $N = 500$ (red), and $N = 1000$ (green) with initial trials $\mathbf{x}_0^A$: $0.5 \leq x_{0,i}^A \leq 1.5$ (solid line) and $\mathbf{x}_0^A$: $0.1 \leq x_{0,i}^A \leq 19.9$ (dashed line).
Figure 15. T-Newton iterations with test function (99) with initial approximate $x_0^A = 4.5$. Left: $x_1^B$ is the root of the tangent line through $f_0^A$, $x_1^A$ is the root of $z_0(x)$. Right: $x_2^B$ is the root of the tangent line through $f_1^A$; $x_2^A$ is the root of $z_1(x)$ (see data in Table 14).
Figure 16. Broyden (Left) and T-Broyden (Right) iterations with test function (99) with initial approximate $x_0^A = 4.5$.
Table 1. Summary of the basic equations (single- and multi-variable cases).

| # | Single-Variable ($m = n = 1$) | Multi-Variable ($m \geq n > 1$) | Equations |
|---|---|---|---|
| 1 | $x_p^A$ | $\mathbf{x}_p^A$ | |
| 2 | $x_p^B$ | $\mathbf{x}_{p,k}^B = \mathbf{x}_p^A + \Delta x_{p,k}\,\mathbf{d}_k$ | (32) |
| 3 | $\Delta x_p = x_p^B - x_p^A$ | $\Delta\mathbf{X}_p = \left[\mathbf{x}_{p,k}^B - \mathbf{x}_p^A\right] = \mathrm{diag}\left(\Delta x_{p,i}\right)$ | (36), (38) |
| 4 | $\Delta f_p = f_p^B - f_p^A$ | $\Delta\mathbf{F}_p = \left[\Delta f_{p,k,j}\right]$, $k = 1, \ldots, n$, $j = 1, \ldots, m$ | (34), (37), (39) |
| 5 | $\Delta f_p\, q_p^A = -f_p^A$ | $\Delta\mathbf{F}_p\,\mathbf{q}_p^A = -\mathbf{f}_p^A$ | (49), (41) |
| 6 | $q_p^A = -f_p^A / \Delta f_p$ | $\mathbf{q}_p^A = -\Delta\mathbf{F}_p^{+}\,\mathbf{f}_p^A$ | (47), (42) |
| 7 | $x_{p+1}^A = x_p^A + \Delta x_p\, q_p^A$ | $\mathbf{x}_{p+1}^A = \mathbf{x}_p^A + \Delta\mathbf{X}_p\,\mathbf{q}_p^A$ | (48), (46) |
| 8 | $\Delta x_p^A = x_{p+1}^A - x_p^A$ | $\Delta\mathbf{x}_p^A = \mathbf{x}_{p+1}^A - \mathbf{x}_p^A$ | (54) |
| 9 | $t_p^f = f_{p+1}^A / f_p^A$ | $\mathbf{T}_p^F = \mathrm{diag}\left(f_{p+1,j}^A / f_{p,j}^A\right)$ | (50), (67) |
| 10 | $t_p^f\,\Delta f_p\, q_p^B = -f_p^A$ | $\mathbf{T}_p^F\,\Delta\mathbf{F}_p\,\mathbf{q}_p^B = -\mathbf{f}_p^A$ | (51), (61) |
| 11 | $q_p^B = q_p^A / t_p^f = -\left(f_p^A\right)^2 / \left(f_{p+1}^A\,\Delta f_p\right)$ | $\mathbf{q}_p^B = -\Delta\mathbf{F}_p^{+}\left(\mathbf{T}_p^F\right)^{-1}\mathbf{f}_p^A$ | (52), (68) |
| 12 | $t_p^x = \dfrac{x_{p+1}^B - x_{p+1}^A}{x_{p+1}^A - x_p^A} = \dfrac{\Delta x_{p+1}}{\Delta x_p^A}$ | $\mathbf{T}_p^X = \mathrm{diag}\left(\Delta x_{p+1,i} / \Delta x_{p,i}^A\right)$ | (55), (65) |
| 13 | $\Delta x_p^A = t_p^x\,\Delta x_p\, q_p^B$ | $\Delta\mathbf{x}_p^A = \mathbf{T}_p^X\,\Delta\mathbf{X}_p\,\mathbf{q}_p^B$ | (56), (62) |
| 14 | $x_{p+1}^B = x_{p+1}^A + \dfrac{\left(\Delta x_p^A\right)^2}{\Delta x_p\, q_p^B}$ | $x_{p+1,i}^B = x_{p+1,i}^A + \dfrac{\left(\Delta x_{p,i}^A\right)^2}{\Delta x_{p,i}\, q_{p,i}^B}$ | (59), (70) |
| 15 | $\Delta x_{p+1} = t_p^f\,\Delta x_p^A$ | $\Delta x_{p+1,i} = \mu_i\,\Delta x_{p,i}^A$ | (60), (73) |
Table 2. T-Secant iteration results with $x_0^A = 3.0$ and $x_0^B = 1.0$ (see also Figure 4 for $p = 0, 1, 2$).

| | $p = 0$ | $p = 1$ | $p = 2$ | $p = 3$ | $p = 4$ |
|---|---|---|---|---|---|
| $x_p^A$ | 3.0 | 1.545 | 2.158 | 2.093 | 2.09455149745 |
| $\Delta x_p$ | 2.0 | 0.400 | 0.103 | 0.0014 | $1.6 \times 10^{-8}$ |
| $f_p^A$ | 16.0 | 4.400 | 0.737 | 0.015 | $1.8 \times 10^{-7}$ |
| $q_p^A$ | 0.727 | 1.532 | 0.634 | 1.015 | 0.999 |
| $x_{p+1}^A$ | 1.545 | 2.158 | 2.093 | 2.09455150 | 2.0945514815423 |
| $e_{p+1}^A$ | 0.549 | 0.064 | 0.0014 | $1.6 \times 10^{-8}$ | $2.7 \times 10^{-14}$ |
| $f_{p+1}^A$ | 4.400 | 0.737 | 0.015 | $1.8 \times 10^{-7}$ | – |
| $t_p^F$ | 0.275 | 0.167 | 0.021 | $1.2 \times 10^{-5}$ | – |
| $q_p^B$ | 2.645 | 9.149 | 30.329 | 88077 | – |
| $x_{p+1}^B$ | 1.945 | 2.056 | 2.09453 | 2.09455148153 | – |
Table 3. Summary of the basic equations of the multi-variable Secant and T-Secant methods.

| # | Secant Method | T-Secant Method | Equations |
|---|---|---|---|
| 1 | $\Delta\mathbf{X}_p = \left[\mathbf{x}_{p,k}^B - \mathbf{x}_p^A\right] = \mathrm{diag}\left(\Delta x_{p,i}\right)$ | $\Delta\mathbf{F}_p = \left[\mathbf{f}_{p,k}^B - \mathbf{f}_p^A\right] = \left[\Delta f_{p,k,j}\right]$ | (36), (37) |
| 2 | | $\mathbf{T}_p = \begin{bmatrix}\mathbf{T}_p^X & \mathbf{0} \\ \mathbf{0} & \mathbf{T}_p^F\end{bmatrix}$ | (64) |
| 3 | | $\mathbf{T}_p^X = \mathrm{diag}\left(t_{p,i}^x\right) = \mathrm{diag}\left(\Delta x_{p+1,i} / \Delta x_{p,i}^A\right)$ | (65) |
| 4 | | $\mathbf{T}_p^F = \mathrm{diag}\left(t_{p,j}^F\right) = \mathrm{diag}\left(f_{p+1,j}^A / f_{p,j}^A\right)$ | (66), (67) |
| 5 | $\Delta\mathbf{F}\,\mathbf{q}^A = -\mathbf{f}^A$ | $\mathbf{T}^F\Delta\mathbf{F}\,\mathbf{q}^B = -\mathbf{f}^A$ | (41), (61) |
| 6 | $\Delta\mathbf{F}\,\Delta\mathbf{X}^{-1}\Delta\mathbf{x}^A = -\mathbf{f}^A$ | $\mathbf{T}^F\Delta\mathbf{F}\,\Delta\mathbf{X}^{-1}\left(\mathbf{T}^X\right)^{-1}\Delta\mathbf{x}^A = -\mathbf{f}^A$ | (100), (108) |
| 7 | $\mathbf{S} = \Delta\mathbf{F}\,\Delta\mathbf{X}^{-1} = \left[\Delta f_{k,j} / \Delta x_i\right]$ | $\mathbf{S}^T = \mathbf{T}^F\,\mathbf{S}\left(\mathbf{T}^X\right)^{-1} = \left[\dfrac{t_j^F}{t_i^x}\,\dfrac{\Delta f_{k,j}}{\Delta x_i}\right]$ | (102), (111) |
| 8 | $\mathbf{S}\,\Delta\mathbf{x}^A = -\mathbf{f}^A$ | $\mathbf{S}^T\Delta\mathbf{x}^A = -\mathbf{f}^A$ | (104), (114) |
| 9 | $\Delta\mathbf{x}^A = -\mathbf{S}^{+}\,\mathbf{f}^A$ | $\Delta\mathbf{x}^A = -\left(\mathbf{S}^T\right)^{+}\mathbf{f}^A$ | (105), (113) |
Table 4. Secant method iteration and computed convergence rate, $\alpha_S$ (see Figure 6).

| $p$ | $x_p^A$ | $x_p^B$ | $x_{p+1}^A$ | $e_{p+1}^A$ | $\alpha_S$ | $N_f$ |
|---|---|---|---|---|---|---|
| 0 | 3.5 | 2.5 | 2.2772 | $1.8 \times 10^{-1}$ | – | 2 |
| 1 | 2.5 | 2.2772 | 2.1282 | $3.4 \times 10^{-2}$ | – | 3 |
| 2 | 2.2772 | 2.1282 | 2.0977 | $3.2 \times 10^{-3}$ | 0.64 | 4 |
| 3 | 2.1282 | 2.0977 | 2.094611 | $5.9 \times 10^{-5}$ | 2.12 | 5 |
| 4 | 2.0977 | 2.094611 | 2.094552 | $1.1 \times 10^{-7}$ | 1.39 | 6 |
| 5 | 2.094611 | 2.09455216 | 2.09455148 | $3.6 \times 10^{-12}$ | 1.69 | 7 |
| 6 | 2.0945516 | 2.09455148 | 2.09455148154233 | $2.7 \times 10^{-14}$ | 1.59 | 8 |
| 7 | 2.09455148 | 2.09455148154233 | 2.09455148154233 | $2.7 \times 10^{-14}$ | 1.63 | 9 |
Table 5. Newton method iteration and computed convergence rate, $\alpha_N$ (see Figure 7).

| $p$ | $x_p^A$ | $x_{p+1}^A$ | $e_{p+1}^A$ | $\alpha_N$ | $N_f$ | $N_{f'}$ |
|---|---|---|---|---|---|---|
| 0 | 3.5 | 2.61 | $5.2 \times 10^{-1}$ | – | 1 | 1 |
| 1 | 2.61 | 2.200 | $1.1 \times 10^{-1}$ | – | 2 | 2 |
| 2 | 2.200 | 2.10037 | $5.8 \times 10^{-3}$ | 1.58 | 3 | 3 |
| 3 | 2.10037 | 2.09457 | $1.9 \times 10^{-5}$ | 1.82 | 4 | 4 |
| 4 | 2.09457 | 2.09455148 | $2.0 \times 10^{-10}$ | 1.97 | 5 | 5 |
| 5 | 2.09455148 | 2.09455148154233 | $2.7 \times 10^{-14}$ | 2.00 | 6 | 6 |
Table 6. T-Secant method iteration and computed convergence rate, $\alpha_{TS}$ (see Figure 8).

| $p$ | $x_p^A$ | $x_p^B$ | $x_{p+1}^A$ | $e_{p+1}^A$ | $\alpha_{TS}$ | $N_f$ |
|---|---|---|---|---|---|---|
| 0 | 3.5 | 2.5 | 2.28 | $1.8 \times 10^{-1}$ | – | 2 |
| 1 | 2.28 | 2.1879 | 2.1032 | $8.6 \times 10^{-3}$ | – | 4 |
| 2 | 2.1032 | 2.0957112 | 2.0945571 | $5.6 \times 10^{-6}$ | 1.50 | 6 |
| 3 | 2.0945571 | 2.09455151 | 2.09455148154242 | $1.2 \times 10^{-13}$ | 2.41 | 8 |
| 4 | 2.09455148154242 | 2.09455148154233 | 2.09455148154233 | $2.7 \times 10^{-14}$ | 2.40 | 10 |
Table 7. Iteration results: $\mathbf{x}_0^A = \left[2.0\ 1.5\ 2.5\right]$, $T_{\min} = 0.01$, $T_{\max} = 1.5$.

| | $p = 0$ | $p = 1$ | $p = 2$ | $p = 3$ |
|---|---|---|---|---|
| $\mathbf{x}_p^A$ | (2, 1.5, 2.5) | (1.253, 0.938, 5.248) | (1.026, 0.990, 0.980) | (1.00004, 0.99998, 0.99994) |
| $\Delta\mathbf{x}_p$ | (0.1, 0.075, 0.125) | (0.046, 0.061, 0.026) | (0.0217, 0.0079, 0.063) | ($3 \times 10^{-4}$, $1 \times 10^{-4}$, $2 \times 10^{-4}$) |
| $\mathbf{f}_p^A$ | (55, 1, 47.5, 2.5) | (6.320, 0.253, 61.28, 0.062) | (0.621, 0.026, 0.005, 0.010) | (0.00102, 0.00004, 0.00021, 0.00002) |
| $\mathbf{q}_p^A$ | (7.47, 32.5, 22.0) | (4.915, 0.846, 243.9) | (1.184, 1.269, 0.307) | (0.160, 0.201, 0.327) |
| $\mathbf{x}_{p+1}^A$ | (1.253, 0.938, 5.248) | (1.026, 0.990, 0.980) | (1.00004, 0.99998, 0.99994) | (0.9, 1.0, 1.0) |
| $e_{p+1}^A$ | (0.253, 0.062, 6.248) | (0.026, 0.010, 0.020) | ($4 \times 10^{-5}$, $2 \times 10^{-5}$, $6 \times 10^{-5}$) | ($3 \times 10^{-9}$, $2 \times 10^{-9}$, $5 \times 10^{-9}$) |
| $R\left(\mathbf{x}_{p+1}^A\right)$ | $6.2 \times 10^{1}$ | $6.2 \times 10^{-1}$ | $1.0 \times 10^{-3}$ | $9.0 \times 10^{-8}$ |
| $\varepsilon_p$ | $2.1 \times 10^{0}$ | $1.1 \times 10^{-2}$ | $2.6 \times 10^{-5}$ | $2.2 \times 10^{-9}$ |
| $\mathbf{f}_{p+1}^A$ | (6.32, 0.253, 61.3, 0.062) | (0.621, 0.026, 0.005, 0.010) | (0.00102, 0.00004, 0.00021, 0.00002) | (0.0, 0.0, 0.0, 0.0) |
| $\mathbf{t}_p^F$ | (0.115, 0.253, 1.290, 0.025) | (0.098, 0.102, 0.01, 0.163) | (0.01, 0.01, 0.044, 0.01) | (0.01, 0.01, 0.01, 0.01) |
| $\mathbf{q}_p^B$ | (120, 1298, 2365) | (51.6, 5.52, 240000) | (118, 127, 32) | (16.0, 20.1, 32.7) |
| $\mathbf{x}_{p+1}^B$ | (1.299, 0.999, 5.273) | (1.004, 0.998, 0.917) | (0.99978, 1.00008, 1.00013) | (1.0, 0.9, 0.9) |
| $\Delta\mathbf{x}_{p+1}$ | (0.046, 0.061, 0.026) | (0.0217, 0.0079, 0.063) | ($3 \times 10^{-4}$, $1 \times 10^{-4}$, $2 \times 10^{-4}$) | ($4 \times 10^{-7}$, $2 \times 10^{-7}$, $6 \times 10^{-7}$) |
Table 8. Initial trial vectors ($N = 10$, $n = 10$, $m = 18$, $\mathbf{x}^* = \left[1\ \cdots\ 1\right]$).

| # | $\mathbf{x}_0^A$ | $p_{\max}$ | $N_f$ |
|---|---|---|---|
| 1 | (1.3, 1.5, 2.1, 1.1, 1.3, 1.8, 1.8, 1.7, 2.0, 2.1) | 15 | 165 |
| 2 | (3.1, 2.1, 4.3, 1.2, 2.4, 3.6, 1.6, 2.7, 4.2, 2.2) | 21 | 231 |
| 3 | (4.1, 1.1, 6.3, 3.2, 4.4, 1.6, 3.6, 5.7, 2.2, 3.2) | – | – |
| 4 | (3.0, 3.1, 2.3, 4.2, 2.4, 1.6, 3.6, 2.7, 2.2, 4.2) | – | – |
| 5 | (2.1, 3.1, 1.3, 2.2, 3.4, 1.6, 2.6, 1.7, 2.2, 3.2) | 16 | 176 |
| 6 | (3.1, 3.1, 4.3, 2.2, 3.4, 2.6, 1.6, 4.7, 2.2, 2.2) | 20 | 220 |
Table 9. Iteration results ($N = 200$, $L_1 = 99.9$, $L_2 = 9$) with initial trials $0.1 \leq x_{0,i}^A \leq 19.9$ (dashed blue line in Figure 14).

| $p$ | $\varepsilon_p$ | $R\left(\mathbf{x}_p^A\right)$ | $N_f$ |
|---|---|---|---|
| 0 | 10.6925833405791 | 24123.43773726327 | 1 |
| 1 | 5.45917411911925 | 6895.1103569982861 | 201 |
| 2 | 2.13338434746463 | 1247.4064173528971 | 402 |
| 3 | 0.714305712736892 | 20.36900527956962 | 603 |
| 4 | 0.1635116390312993 | 2.621494717337107 | 804 |
| 5 | 0.0145616620270659 | 2.4077509738413969 | 1005 |
| 6 | 0.000197003511771894 | 0.026366233831030046 | 1206 |
| 7 | 0.000000084768909602 | 0.000007982826913871 | 1407 |
| 8 | 0.000000000032791210 | 0.000000003114429023 | 1608 |
| 9 | 0.000000000000013862 | 0.000000000001333830 | 1809 |
| 10 | 0.000000000000000546 | 0.000000000000104185 | 2010 |
Table 10. Iteration results ($N = 1000$, $L_1 = 5$, $L_2 = 0$) with initial trials $0.5 \leq x_{0,i}^A \leq 1.5$ (solid green line in Figure 14).

| $p$ | $\varepsilon_p$ | $R\left(\mathbf{x}_p^A\right)$ | $N_f$ |
|---|---|---|---|
| 0 | 0.287800987765134 | 212.38512786560364 | 1 |
| 1 | 0.1212194036436955 | 57.87378211356512 | 1001 |
| 2 | 0.0396263348376487 | 13.743840511211417 | 2002 |
| 3 | 0.0298060844365720 | 9.6618077142097238 | 3003 |
| 4 | 0.0120370539008435 | 5.9465782106406841 | 4004 |
| 5 | 0.000705489922936629 | 0.42465246853444877 | 5005 |
| 6 | 0.000002762586723754 | 0.001324115254348589 | 6006 |
| 7 | 0.000000000990421380 | 0.000000388965253003 | 7007 |
| 8 | 0.000000000000433209 | 0.000000000155930410 | 8008 |
| 9 | 0.000000000000000860 | 0.000000000000363149 | 9009 |
Table 11. Efficiencies of classic and improved algorithms.

| Method | $d_p$ | $\alpha$ | EFF [25] | EFF* [26] |
|---|---|---|---|---|
| Secant | 1 | 1.618 | 1.618 | 1.618 |
| Newton | 2 | 2.0 | 1.0 | 1.414 |
| T-Secant | 2 | 2.618 | 1.309 | 1.618 |
| T-Newton | 3 | 3.0 | 1.0 | 1.442 |
Table 12. Calculated values of the mean convergence rates ($L$ and $L_N$) for the Rosenbrock function (¹: a substitute value $10^{-25}$ was used when $R\left(\mathbf{x}_{p_{\max}}^A\right) = 0$).

| # | $N$ | Method | $R\left(\mathbf{x}_0^A\right)$ | $R\left(\mathbf{x}_{p_{\max}}^A\right)$ | $p_{\max}$ | $N_f$ | $L$ | $L_N$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 2 | Broyden 1. [27] | 4.9193 | $4.73 \times 10^{-10}$ | – | 59 | 0.391 | 0.78 |
| 2 | 2 | Broyden 2. [27] | 4.9193 | $2.55 \times 10^{-10}$ | – | 39 | 0.607 | 1.22 |
| 3 | 2 | Powell [28] | 4.9193 | $7.00 \times 10^{-10}$ | – | 151 | 0.150 | 0.30 |
| 4 | 2 | ACD [29] | 130.062 | $1.00 \times 10^{-10}$ | – | 325 | 0.086 | 0.17 |
| 5 | 2 | Nelder–Mead [30] | 2.0000 | $1.36 \times 10^{-10}$ | – | 185 | 0.127 | 0.25 |
| 6 | 2 | T-Secant [27,28] | 4.9193 | $1.0 \times 10^{-25}$ ¹ | 3 | 9 | 6.573 ¹ | 13.15 ¹ |
| 7 | 2 | T-Secant [29] | 130.06 | $1.0 \times 10^{-25}$ ¹ | 3 | 9 | 6.937 ¹ | 13.87 ¹ |
| 8 | 2 | T-Secant [30] | 2.0000 | $6.66 \times 10^{-15}$ | 2 | 6 | 5.556 | 11.11 |
| 9 | 3 | T-Secant | 72.722 | $1.41 \times 10^{-14}$ | 5 | 20 | 1.809 | 5.43 |
| 10 | 3 | T-Secant | 32.466 | $1.0 \times 10^{-25}$ ¹ | 4 | 16 | 3.815 ¹ | 11.45 ¹ |
| 11 | 5 | T-Secant | 93.528 | $1.34 \times 10^{-14}$ | 8 | 48 | 0.760 | 3.80 |
| 12 | 5 | T-Secant | 7.193 | $5.90 \times 10^{-14}$ | 4 | 24 | 1.351 | 6.76 |
| 13 | 10 | T-Secant | 202.62 | $1.0 \times 10^{-25}$ ¹ | 14 | 154 | 0.408 ¹ | 4.08 ¹ |
| 14 | 200 | T-Secant | 92.778 | $9.00 \times 10^{-15}$ | 10 | 2010 | 0.042 | 8.44 |
| 15 | 1000 | T-Secant | 212.39 | $3.63 \times 10^{-13}$ | 6 | 6006 | 0.006 | 5.66 |
Table 13. Newton method iteration and computed convergence rate, $\alpha_N$.

| $p$ | $x_p^A$ | $x_{p+1}^A$ | $e_{p+1}^A$ | $\alpha_N$ | $N_f$ | $N_{f'}$ |
|---|---|---|---|---|---|---|
| 0 | 4.5 | 3.187 | $1.1 \times 10^{0}$ | – | 1 | 1 |
| 1 | 3.187 | 2.44965 | $3.6 \times 10^{-1}$ | – | 2 | 2 |
| 2 | 2.44965 | 2.14996 | $5.5 \times 10^{-2}$ | 1.42 | 3 | 3 |
| 3 | 2.14996 | 2.096188 | $1.6 \times 10^{-3}$ | 1.66 | 4 | 4 |
| 4 | 2.096188 | 2.094552 | $1.5 \times 10^{-6}$ | 1.89 | 5 | 5 |
| 5 | 2.094552 | 2.09455148 | $1.3 \times 10^{-12}$ | 1.99 | 6 | 6 |
| 6 | 2.09455148 | 2.09455148154233 | $3.6 \times 10^{-15}$ | 2.00 | 7 | 7 |