Article

Dimensionality of Iterative Methods: The Adimensional Scale Invariant Steffensen (ASIS) Method

by
Vicente F. Candela
1,2
1
Departament de Matemàtiques, Universitat de València, 46100 Burjassot, Spain
2
ESI International Chair@CEU-UCH, Universidad Cardenal Herrera-CEU, CEU Universities San Bartolomé 55, 46115 Alfara del Patriarca, Spain
Mathematics 2022, 10(6), 911; https://doi.org/10.3390/math10060911
Submission received: 30 December 2021 / Revised: 4 March 2022 / Accepted: 8 March 2022 / Published: 12 March 2022
(This article belongs to the Special Issue New Trends and Developments in Numerical Analysis)

Abstract

The dimensionality of parameters and variables is a fundamental issue in physics but is mostly ignored from a mathematical point of view. Difficulties arising from dimensional inconsistency are overcome by scaling analysis and, often, both concepts, dimensionality and scaling, are confused. In the particular case of iterative methods for solving nonlinear equations, dimensionality and scaling affect their robustness: While some classical methods, such as Newton’s, are adimensional and scale independent, some other iterations such as Steffensen’s are not; their convergence depends on scaling, and their evaluation needs a dimensional congruence. In this paper, we introduce the concept of an adimensional form of a function in order to study the behavior of iterative methods, thus correcting, if possible, some pathological features. From this adimensional form, we will devise an adimensional and scale invariant method based on Steffensen’s, which we will call the ASIS method.

1. Introduction

The main difference between absolute and relative errors in approximation is that while absolute errors share the dimension of the values they approximate, relative errors are adimensional and, therefore, independent from scaling. This trivial idea is often ignored when the parameter we want to approximate is the solution of an equation. From a numerical point of view, scaling is a fundamental feature, but, often, dimensional congruence is ignored. As a consequence of disregarding dimensionality, some methods devised originally for scalar, real, or complex equations are not useful or are hard to adapt to general vector Banach spaces. This is a reason why many iterative methods are not used in optimization (of course, storage and computational costs are also important).
In [1], a nice introduction to dimensional and scaling analysis is found. Although dimensional analysis goes back as far as Fourier in 1822, it was boosted in the second decade of last century, especially since the works of E. Buckingham ([2], for example) and most of them are from a physical rather than mathematical point of view. In fact, dimensionality is a fundamental concept in physics in order to determine the consistency of equations and magnitudes, but it is not so important in mathematics. The simple example of coefficients of polynomials illustrates this assertion.
Given $p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$, if x and $p(x)$ have dimensions denoted by $[x]$ and $[p]$, respectively, then the coefficients must have dimension $[a_k] = [p][x]^{-k} = [p^{(k)}(x)]$. This obvious remark implies that $a_k$ is a constant magnitude but not a constant number (a change of scale in x or in $p(x)$ modifies its value). Numerically, this means that we must be careful with scaling (if x is a length, a change from centimeters to inches, for instance, implies a change of coefficients), and one must take into account that $a_k = p^{(k)}(0)/k!$. Dimensional considerations do not come from the physical units we use but from the kind of magnitude we are measuring (let us say time, length, weight, etc.). To put it shortly, dimensionality is inherent to the variable while scaling is accidental and subject to changes.
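As a minimal numerical illustration of this remark (the numbers are hypothetical, not taken from the paper), the following Python sketch rescales the variable of a quadratic from centimeters to inches and shows that each coefficient $a_k$ must be multiplied by $c^k$, where c is the scale factor, for the value of the polynomial (the magnitude) to stay unchanged:

# Hypothetical illustration: p(x) = a0 + a1*x + a2*x**2 with x measured in centimeters.
a = [2.0, -3.0, 0.5]                               # a_k has dimension [p][x]^(-k)

c = 2.54                                           # centimeters per inch: x_cm = c * x_in
a_in = [ak * c**k for k, ak in enumerate(a)]       # coefficients when x is measured in inches

x_cm = 10.0                                        # the same physical point in the two scales
x_in = x_cm / c

p_cm = sum(ak * x_cm**k for k, ak in enumerate(a))
p_in = sum(ak * x_in**k for k, ak in enumerate(a_in))
print(p_cm, p_in)                                  # identical values: the magnitude is unchanged, the numbers a_k are not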
In general, dimensional analysis spans over different and apparently unrelated topics in mathematics. It reaches its strength in nonlinear equations, where dimensionality and scaling are clearly separated. The purpose of this paper is the application of dimensional analysis of iterative methods for nonlinear equations.
We start by some basic notions about convergent sequences. It is well known [3] that a convergent sequence, { x k } k 0 , in a Banach space ( E , · ) with limit x E has Q-order of convergence at least p if there exists a constant K > 0 such that the following is the case.
$\dfrac{\|x_{n+1} - x^*\|}{\|x_n - x^*\|^{p}} \le K$
Therefore, K is not an adimensional parameter when $p \ne 1$: it has dimension $[K] = [x]^{1-p}$. Any change of scale $y = cx$ changes the value of K. This fact, which could become dangerous, is annoying because there exists another, weaker definition, the R-order of convergence [3], that is scale invariant: the sequence has an R-order of at least $p > 1$ if there exist a constant K and an $n_0 > 0$ such that the following is the case.
$\limsup_{n \to \infty} \|x_n - x^*\|^{1/p^{n}} \le K$
In this definition, K is adimensional ( [ K ] = [ 1 ] ), and there are no issues derived from scales or dimensions. In addition to the evident differences between both definitions, there is a conceptual motivation: Q-order controls the evolution of every term n, while R-order focuses on the global (limit) behavior of the sequence. From this point of view, it is not surprising that the Q-order is a more restrictive definition than that of R-order.
Here, we need to establish an order independent from dimensionality, as it happens with R-order, but that is still able to analyze convergence for any step n as Q-order. Thus, we will introduce an adimensional Q-order, which is scale independent and adimensional and is weaker than the Q-order but we retain its essential core. As a consequence, we will obtain an adimensional R-order, which is, in fact, a reformulation of the traditional R-order.
Now, we are ready to focus on iterative methods for solving nonlinear equations such as F ( x ) = 0 , when F : E V , E, and V Banach spaces. The idea behind these methods is to correct a previous approximation x k to the solution x by the following:
$x_{k+1} = x_k + \Delta_k = x_k \cdot (1 + \delta_k)$
[ Δ k ] = [ x ] . Therefore, Δ k , the absolute error, depends on x: Any change of scale or dimension in x leads to a similar change on Δ k . The relative error, δ k , is adimensional, and δ k = Δ k / x k relates both definitions. As we said above, it is not surprising that, numerically, the first option is preferred, although δ k is free from any changes of magnitude.
One of the simplest and best known methods is the first-order one obtained when $\Delta_k = -c\,F(x_k)$, under certain restrictions on c. Numerically, these restrictions are due to scale: $\|I - c\,F'(x)\| < 1$ for x in the domain. From a dimensional point of view, the following is the case.
$[\Delta_k] = [c][F] \;\Rightarrow\; [c] = [x][F]^{-1}$
Restrictions on the size of c are a consequence of its dimension, considering that c dependent of F is more clarifying than thinking of c as a constant.
On the other hand, $\Delta_k = -F'(x_k)^{-1}F(x_k)$ yields Newton's method, which is dimensionally consistent.
$[\Delta_k] = [F'^{\,-1}][F] = [x] \quad (\text{because } [F'] = [F][x]^{-1}).$
Another second order method such as Steffensen’s is defined as follows:
$\Delta_k = -F[x_k + F(x_k),\, x_k]^{-1} F(x_k)$
where F [ · , · ] is the divided difference operator (bracket notation should not be confused with the dimension). Scaling is indeed a concern, but the most important feature is the inconsistency of dimensions. The sum x + F ( x ) in the operator only makes sense if E = V , because [ x ] and [ F ] are not equal in general. This is one of the reasons why, although competitive with respect to Newton’s, Steffensen’s is less popular.
However, Steffensen’s has a clear advantage over Newton’s method: it is not necessary to evaluate Jacobians or gradients for Steffensen’s, because divided differences are usually obtained by the following interpolating condition:
$F[x,y]\,(x - y) = F(x) - F(y)$
which does not require explicit derivatives.
The rest of this study will focus on two main goals. The first one is the concept of the adimensional form of a function, a tool we introduce, which keeps all information of the function, but it is invariant to changes of scales and dimensions of the variables. We will use these adimensional functions in a theoretical manner to obtain semilocal convergent results and, in practical applications, to obtain versions of nonrobust methods in order to make them scale independent. The second part of this paper deals with the analysis, development, and features of these modified methods.
In particular, we will prove the power of adimensional functions in the particular case of Steffensen’s method. We will devise an adimensional and scale invariant (ASIS) correction of Steffensen’s method and we will settle some sufficient semilocal conditions of convergence, from which optimal estimates of errors will be obtained. As a difference with previous published results, sufficient conditions of convergence will not be more restrictive than those of Newton’s. Furthermore, so far, optimality has not been obtained for Steffensen’s method in the literature. Thus, our estimates improve previous works in some aspects.
This paper is organized as follows: The next section is divided into two subsections in which we review some basic and classical results on dimensionality and we introduce the adimensional form of functions. Section 3 focuses on iterative methods for nonlinear equations in Banach spaces; there, we also point out a fundamental theorem relating both worlds (dimensionality and iterative methods). In Sections 4 and 5, we apply the adimensional form of polynomials and functions to the semilocal convergence analysis of iterative methods. In Section 6, we devise new variations of classic scale-dependent iterative methods, such as Steffensen's, in order to eliminate their scale and dimensional dependence. Section 7 illustrates, by means of examples, the theoretical results in this paper. Finally, we draw conclusions in Section 8.

2. Basic Concepts on Dimensionality

Dimension of a variable is a physical concept related to the units in which that variable is measured. It is one of the concepts that is easier to understand than to define. A physical quantity may represent time, length, weight, or any other magnitude. Equations in physics relate one or more of these dimensions and they must show consistency: dimensions in both sides of the equation must be equal. Nevertheless, this trivial remark is often ignored because either there is no evident method to obtain the dimension of any term or, as it happens in mathematics, good scaling may overcome the potential risks of dimensionality.
However, dimensionality and scale are not always equivalent. For example, changing the measure of a length from, say, centimeters to inches is a change of scale (and very dangerous indeed), but their dimensions are the same: length. Inconsistency happens when there are different measures involved in each term of the equation (time versus length, for instance). The problems increase when the equation is not scalar but is defined in Banach spaces. In this case, it is easy to confuse dimensionality and dimension (two vectors of the same dimension may belong to different Banach spaces and have different dimensionality even if there exists an isomorphism between both), leading to some problems in areas such as optimization.
In the rest of this section, we look over some aspects of dimensional analysis in Section 2.1, which will drive us to introduce the adimensional form of functions in Section 2.2.

2.1. Dimensionality

From this point onward, we will use the classical notation $[x]$ for the dimension of a variable x. If z is an adimensional parameter, we will denote it by $[z] = [1]$. Unlike the classic dot notation in physics, we will denote the derivatives of a function $f(x)$ by $f'(x)$, $f''(x)$, …, as is customary in mathematics. We will omit the variables unless the context requires making them explicit.
For vectors and Banach spaces in general, the functions will be denoted by capital letters ($F(x)$, $G(x)$, …), and the vectors x, y, … will use the same notation as for scalars (there shall not be any confusion). $F'(x)$ and $F''(x)$ will denote the Jacobian and Hessian of $F(x)$, respectively.
F : E V , where ( E , · ) and ( V , · ) are Banach spaces. Their norms, although different, will be denoted with the same notation (once again, it shall not be confusing).
It is not intuitive to introduce dimensionality in nonscalar Banach spaces. When the space is finite dimensional, we define [ x ] as a vector where every component has the dimension of the corresponding element of the base. Thus, if x = ( x 1 , , x m ) , then [ x ] = ( [ x 1 ] , , [ x m ] ) . We must consider other operations in addition to the sum and multiplication by scalars in the context of nonlinear equations. Hence, nonlinear operations must be carried pointwise (component by component). For example, x o y = ( x 1 o y 1 , , x m o y m ) , and [ x o y ] = ( [ x 1 o y 1 ] , , [ x m o y m ] ) , for any operation o .
For infinite dimensional Banach spaces, dimensionality loses its physical sense, and it is a more complicated concept. For the purpose of this paper, we consider dimensionality in a similar manner as finite dimensional vector spaces: here, [ x ] is an infinite vector, and its operations are also performed pointwise. There is no misunderstanding when all the bases have the same dimensionality although, in general, the dimensional vectors may not be homogeneous.
Elementary rules of dimensionality apply.
$[x \cdot y] = [x][y]; \qquad [x/y] = [x][y]^{-1}; \qquad [x^n] = [x]^n$
Rules for differentiation and integration are particularly interesting.
$[F'] = [F]/[x] = [F][x]^{-1}; \qquad \left[\textstyle\int F(x)\,dx\right] = [F][x]$
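A tiny bookkeeping sketch (an illustration only, not part of the paper) makes these rules concrete: if a dimension is stored as a set of exponents over base magnitudes, the product, quotient, power, derivative, and integral rules above become simple exponent arithmetic.

# Minimal dimensional bookkeeping: a dimension is a dict of exponents over base magnitudes.
def dmul(a, b, sign=1):
    keys = set(a) | set(b)
    out = {k: a.get(k, 0) + sign * b.get(k, 0) for k in keys}
    return {k: v for k, v in out.items() if v != 0}

def ddiv(a, b):
    return dmul(a, b, sign=-1)

def dpow(a, n):
    return {k: n * v for k, v in a.items()}

x = {"L": 1}                       # [x]: a length (illustrative base magnitudes)
F = {"M": 1, "T": -2}              # [F]: some other magnitude, e.g. mass per time squared
print(ddiv(F, x))                  # [F'] = [F][x]^(-1)
print(dmul(F, x))                  # [ integral of F dx ] = [F][x]
print(dpow(x, 3))                  # [x^3] = [x]^3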

2.2. Adimensional Polynomials and Functions

Dimensionality may affect the evaluation of variables, functions, and their derivatives. Any equation with terms involving them must be carefully managed in order not to lose physical meaning. For example, terms such as $x + f(x)$ or $c\,f(x) + f'(x)$ can only make sense, from a dimensional point of view, if the terms in the sums are dimensionally homogeneous.
One of the consequences of careless handling of dimensionality is that stability of equations may be affected by rescaling, if the terms do not share the same dimensionality.
In order to avoid possible difficulties due to dimensionality, we propose confronting adimensional equations: Any function and variable will be reduced to adimensional forms by linear transforms; all the terms will have dimensionality [1]. Thus, dimensional and scaling issues will appear only in linear transforms and not in the resolution of the equation.
The following definitions will clear the concept and process of the above proposal.
Definition 1.
Given a (real or complex) scalar nonlinear function $f(x)$ such that both $f(0) \ne 0$ and $f'(0) \ne 0$, the adimensional normalized form of $f(x)$ is a function $g(y)$ such that $g(0) = 1$, $g'(0) = -1$, and $y = -\dfrac{f'(0)}{f(0)}\,x$.
Then, given $f(x)$ such that $f(0) \ne 0$ and $f'(0) \ne 0$, the following is the case.
$f(x) = f(0)\; g\!\left(-\dfrac{f'(0)}{f(0)}\,x\right)$
We notice that $[f(0)][g] = [f]$ and, therefore, $[g] = [1]$ and $[y] = [f'][f]^{-1}[x] = [1]$. Thus, both y and $g(y)$ are adimensional and scale independent, which is the goal of this definition.
Some remarks on this definition are provided here. In the first place, it is not necessary to normalize the function at $x_0 = 0$. If $f(0) = 0$, $f'(0) = 0$, or it is convenient not to use $x_0 = 0$, any other value of $x_0$ can be used. On the other hand, the choice of the minus sign on $g'(0)$ and y is not relevant, but it simplifies developments in the rest of the paper. Finally, let us say that normalization is not a strict requirement. Approximate normalization ($g(0) \approx 1$, $g'(0) \approx -1$) is a realistic setting when errors appear in finite precision systems. Furthermore, in some cases, normalization will not be required at all.
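The following Python sketch builds the adimensional normalized form of Definition 1 numerically, approximating $f'(0)$ by a central difference; the step size and the test function are illustrative assumptions.

# Sketch: build the adimensional normalized form g of a scalar f (Definition 1),
# approximating f'(0) by a central difference so no symbolic derivative is needed.
import math

def adimensional_form(f, h=1e-6):
    f0 = f(0.0)
    df0 = (f(h) - f(-h)) / (2.0 * h)     # approximate f'(0)
    def g(y):
        x = -f0 / df0 * y                # invert y = -(f'(0)/f(0)) x
        return f(x) / f0
    return g

f = lambda x: math.exp(x - 1.0) - 1.0    # the scalar example used later in Section 7
g = adimensional_form(f)
print(g(0.0))                            # close to 1
print((g(1e-6) - g(-1e-6)) / 2e-6)       # close to -1, i.e. g'(0) = -1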
For general Banach spaces, the definition above can be easily generalized.
Definition 2.
Let E, V be Banach spaces, $U \subseteq E$, $F: U \to V$ a differentiable function, and $x_0 \in U$ such that $F'(x_0)$ is invertible. The adimensional normalized form of $F(x)$ is the following function:
$G(y) = \dfrac{F(x)}{\|F(x_0)\|}$
where the following is the case:
$y = -\dfrac{F'(x_0)}{\|F(x_0)\|}\,x$
and the following is obtained.
$y_0 = -\dfrac{F'(x_0)}{\|F(x_0)\|}\,x_0$
It is verified that $\|G(y_0)\| = 1$ and $G'(y_0) = -I$.
The remarks we pointed out in the scalar case are also valid here.
Let us notice that the change of variables from x to y is also a change of space from E to an adimensional space $\tilde{V}$ homeomorphic to V. $\tilde{V}$ is an example showing that dimension and dimensionality are different: it has the same dimension as V but, for any $y \in \tilde{V}$, $[y] = [1]$. As $G: \tilde{V} \to \tilde{V}$ is an endomorphism, $y + G(y)$ is a consistent sum in $\tilde{V}$, as opposed to the general case. As we do not have to worry about dimensionality or scales, adimensional functions may speed up and regularize the computations.
Adimensional functions are not only useful for iterative approximations. For example, the well-known pathological zig-zag optimization problem ([4], p. 348), consisting of minimizing the function $H(x,y) = (x^2 + b\,y^2)/2$, amounts to solving, via steepest descent (a first-order method similar to (6)), the system $F(x,y) = (F_1(x,y), F_2(x,y)) = (x,\, b\,y) = (0,0)$, with very slow convergence when b is small. Actually, its rate of convergence is as follows.
$H(x_{n+1}, y_{n+1}) = \left(\dfrac{1 - b}{1 + b}\right)^{2} H(x_n, y_n)$
However, its adimensional form is $G(u,v) = (x,\, b\,y)/\|F(x_0, y_0)\| = (u, v)$, where $u = x/\|F(x_0, y_0)\|$ and $v = b\,y/\|F(x_0, y_0)\|$, and the iteration converges in only one step.
Another example of how adimensionality can help improve the performance of some algorithms is the evaluation of matrix functions. In particular, Newton's iteration for approximating the square root of a matrix A, from an initial invertible guess $X_0$, provides the sequence of approximations $\{X_k\}_{k \ge 0}$.
$X_{k+1} = \dfrac{1}{2}\left(X_k + X_k^{-1} A\right)$
Although, theoretically, this sequence converges to A 1 / 2 , it is well known ([5]) that it may not converge computationally if the condition number of A is greater than 9. If X 0 commutes with A, the adimensional form of the sequence becomes the following.
$Y_{k+1} = \dfrac{1}{2}\left(Y_k + Y_k^{-1} X_0^{2} A\right)$
$X_k = X_0^{-1} Y_k$
It is seen that this adimensional recurrence is equivalent to a preconditioning of the matrix A by multiplying it by $X_0^{2}$. Adimensioning is equivalent to regularization, and some preconditioning techniques appearing in ([6]) can be considered as particular cases of adimensional matrices. Furthermore, in the adimensional case, it is not necessary to consider $X_0$ as the regularizer. Any other matrix C can be considered, and the following is also a valid adimensional recurrence.
$Y_{k+1} = \dfrac{1}{2}\left(Y_k + Y_k^{-1} C^{2} A\right)$
$X_k = C^{-1} Y_k$
If A is symmetric and C is diagonal, commutativity is given. If commutativity is not granted, a Lyapunov-type of equation may arise and some deeper analysis must be performed, although, even in such noncommutative case, adimensionality translates also as a preconditioning of A.
Similar reasoning follows for the other matrix functions.
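Under the reconstruction given above (the recurrence for $Y_k$ with a general regularizer C and the back-transformation $X_k = C^{-1}Y_k$), the following NumPy sketch checks that both recurrences reproduce $A^{1/2}$; the concrete matrix A and the choice $C \approx A^{-1/2}$, which makes the preconditioned matrix $C^2 A$ close to the identity, are illustrative assumptions and not taken from the paper.

import numpy as np

A = np.diag([1.0, 400.0])                      # an illustrative SPD matrix
C = np.diag(1.0 / np.sqrt(np.diag(A)))         # a cheap approximation of A^(-1/2); here C^2 A = I
X = np.eye(2)                                  # classic Newton iteration from X_0 = I
Y = C @ X                                      # Y_0 = C X_0, so that X_k = C^(-1) Y_k
M = C @ C @ A                                  # the preconditioned matrix C^2 A

for _ in range(30):
    X = 0.5 * (X + np.linalg.solve(X, A))      # X_{k+1} = (X_k + X_k^{-1} A)/2
    Y = 0.5 * (Y + np.linalg.solve(Y, M))      # Y_{k+1} = (Y_k + Y_k^{-1} C^2 A)/2

sqrtA = np.diag(np.sqrt(np.diag(A)))
print(np.linalg.norm(X - sqrtA))                        # classic iterates reach A^(1/2)
print(np.linalg.norm(np.linalg.solve(C, Y) - sqrtA))    # so do the back-transformed C^(-1) Y_k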
Before proceeding, we will review some basics on iterative methods. This classic topic has been treated in most elementary books on iterative methods ([7,8], for instance).

3. Iterative Methods for Nonlinear Equations

Iterative methods for solving a nonlinear scalar equation f ( x ) = 0 are based on the simple idea stated in the Introduction: We start by approximating value x such that f ( x ) = 0 by an initial guess x 0 , and we correct the approximations by recurrence, x k + 1 = x k + Δ k . Thus, a sequence { x n } n 0 is obtained. If that sequence converges to a limit x , we obtain the solution of the equation.
Of course, this idea becomes more and more complicated in practice. In order to obtain convergence, the corrections Δ k must be well devised (depending on function f ( x ) and, often, its derivatives, if they exist), and some restrictions must apply on the own function and the initial guess. On the other side, a stopping criterion is needed in order to obtain a good approximation relative to the root.
The accuracy of the method depends not only on the design (consistence) of the recurrence but on the stability (sensitivity to errors) and the speed of convergence. There are different methods to define the speed of a sequence, but we point out Q-order and R-order defined in (1) and (2), respectively.
Perhaps the easiest iterative method is the bisection. Its approximates not values but intervals. However, a point-wise version of this method can be obtained if we consider the middle point of the interval where the function f ( x ) switches signs. As the intervals are divided every step, a choice (an if) must be performed every step. No other information about f ( x ) is needed but the sign. When there is a change of sign, it has first-order convergence.
$|x_{n+1} - x^*| \le \dfrac{1}{2}\,|x_n - x^*|$
The more we know about $f(x)$, the more sophisticated the methods that can be obtained. By carefully choosing a constant $c \in \mathbb{R}$, the recurrence
$\Delta_k = -c \cdot f(x_k)$
is a first-order method.
As we saw in the introduction, we can eliminate the dimensionality of the parameter c by introducing this first-order method:
$x_{n+1} = x_n - \lambda\,\dfrac{f(x_n)}{f'(x_0)}$
where λ is adimensional, but some scale requirements are still needed: $0 < \lambda\,\dfrac{f'(x)}{f'(x_0)} < 2$. The improvement comes from the fact that, once λ is fixed, any rescaling of x or $f(x)$ does not affect λ.
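A short sketch (with a hypothetical test equation) shows the announced scale independence: once λ is fixed, rescaling f, together with the corresponding value of $f'(x_0)$, leaves the iterates unchanged.

# Sketch of the scale-free first-order iteration x_{n+1} = x_n - lambda * f(x_n) / f'(x_0).
def first_order(f, x0, dfx0, lam=1.0, steps=40):
    x = x0
    for _ in range(steps):
        x = x - lam * f(x) / dfx0
    return x

f = lambda x: x**2 - 2.0                                   # illustrative test equation
print(first_order(f, 1.0, 2.0))                            # f'(x0) = 2 at x0 = 1: converges to sqrt(2)
print(first_order(lambda x: 100.0 * f(x), 1.0, 200.0))     # rescaled f and f'(x0): identical iterates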
However, the main method around which all theory has evolved is Newton’s, where the following is the case.
$\Delta_k = -\dfrac{f(x_k)}{f'(x_k)}$
This needs more requirements than the other two ($f(x)$ must be differentiable and its derivative must be invertible), and its convergence is usually local, but when it works, it is a second-order method. More importantly, it provides insight on this topic. A great number of papers has been published on Newton's method. Since Kantorovich settled conditions for convergence of this method in Banach spaces ([9]), a lot of papers have appeared that analyzed this method, and some of them are classics ([10,11,12] among others), while new and different variants and analyses are also frequent ([13,14,15,16], to cite just a few). In [14], an updated bibliography can also be found.
In the cited bibliography, some variants of Newton provided in the literature can be found: quasi-Newton in optimization [17] and approximated Newton in computational analysis are some of these versions [18,19].
Let us focus on quasi-Newton methods. They are based on the approximation of the derivative by means of divided differences. For scalar functions, a divided difference operator in two nodes x and y, denoted by f [ x , y ] , is defined as follows:
$f[x,y] = \dfrac{f(x) - f(y)}{x - y}$
where f [ x , y ] verifies the interpolatory condition.
$f[x,y]\,(x - y) = f(x) - f(y)$
If $f(x)$ is at least twice differentiable, divided differences approximate the derivative of $f(x)$:
$f[x,y] = f'(x) + \dfrac{1}{2}\,f''\big(x + \xi\,(y - x)\big)\,(y - x)$
for some $\xi \in [0,1]$.
Thus, an adequate choice of the nodes is a good strategy to obtain an approximation to Newton’s method when derivatives are unknown or costly.
By choosing x k 1 , x k as nodes, the secant method is obtained.
$\Delta_k = -f[x_{k-1}, x_k]^{-1} f(x_k)$
In spite of a small reduction in the order of convergence (it is a superlinear method with order of convergence $(1 + \sqrt{5})/2$) and an increase in memory storage (a previous approximation must be kept), the facts that derivatives are not explicitly needed and that it only requires one additional evaluation of $f(x)$ per iteration are reasons why this is a very popular method.
On the other side, the dimension of $f[x_{n-1}, x_n]$ is $[f][x]^{-1} = [f']$, and the secant method does not introduce issues of scales and dimensions.
Finally, our goal in this paper is addressed to Steffensen’s method. It is a quasi-Newton method in which the nodes are x and x + f ( x ) .
$\Delta_k = -f[x_k + f(x_k),\, x_k]^{-1} f(x_k)$
This method is second order, as Newton’s, and it requires only two evaluations of f ( x ) , but the derivative is not needed. However, in spite of the amount of literature it has generated (as in [20,21,22,23,24]), it is not very popular and is often dismissed as an alternative to other classical methods. The reasons, even though not always explicit, are related to dimensionality.
As opposed to the other methods we have outlined in this section (except perhaps (6)), Steffensen’s is scale dependent. The sum x + f ( x ) depends strongly on the scales of both x and f ( x ) . A change of scale on x and c x or f ( x ) and k f ( x ) transforms x + f ( x ) in c x + k f ( x ) , which is not a change of scale in the node.
Some variations have been proposed in order to adjust this method. One of the most rediscovered is the modification of node x + f ( x ) by shrinking (or dilating) the function times a constant α : x + α f ( x ) . The idea behind this is the same useful resource after many scaling problems: a small value of | α | provides a better approximation of f [ x + α f ( x ) , x ] to the exact derivative f ( x ) (see [20]).
While being a good alternative to the method, the problem is deeper. It is not a scaling but a dimensional problem: [ x ] and [ f ] are usually different, and their sum is inconsistent.
Part of the problem can be solved by considering α not as an adimensional factor but as $[\alpha] = [x][f]^{-1}$ (that is, with the same dimension as $f'(x)^{-1}$). In this case, difficulties may arise from the definition of α, which can vary greatly if the derivative also varies. A possibility is to use the same idea we used in (7).
$f\left[x,\; x + \lambda\,f(x)/f'(x_0)\right]$
Here, λ is an adimensional constant, and it is not influenced by a change of scale.
To conclude, bisection and Newton’s and secant methods are robust due to their scale independence, while the first-order one (6) and Steffensen’s are not and some adjustments must be performed in order to keep them practical.
Emphasis on the importance of Newton’s is not a subjective appreciation. The following theorem shows a dimensional aspect of this method, which deserves to be highlighted:
Theorem 1.
Under the notation in this paragraph, if f ( x ) is at least once differentiable, then, any iterative method in the form (3) must verify the following dimensional relation.
$[\Delta_k] = \left[\dfrac{f(x_k)}{f'(x_k)}\right]$
As a consequence, any iterative method can be represented as follows:
$\Delta_k = -h(x_k)\,\dfrac{f(x_k)}{f'(x_k)}$
where h ( x ) is an adimensional function, [ h ] = [ 1 ] .
The proof is evident because $[\Delta_k] = [x] = [f/f']$. However, some of its consequences are not evident: for instance, it is not trivial to think of bisection as an adimensional modification of Newton's method.
Moreover, taking into account that $[f/f'] = [x]$, the derivative
$\left(\dfrac{f(x)}{f'(x)}\right)' = 1 - \dfrac{f(x)\,f''(x)}{f'(x)^{2}} = 1 - L_f(x)$
is adimensional. $L_f(x)$ is called the degree of logarithmic convexity, and (12) can be expressed as follows.
$\Delta_k = -h\big(L_f(x_k)\big)\,\dfrac{f(x_k)}{f'(x_k)}$
By differentiating (13) and canceling derivatives, we obtain this well-known characterization of second and third-order algorithms [25]:
Corollary 1.
An iterative method such as (13) is second order if $h(0) = 1$ and third order if, in addition, $h'(0) = 1/2$.
From Theorem 1, higher-order characterizations can be obtained, depending on higher derivatives of L f ( x ) .
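A quick numerical check of Corollary 1 (an illustration under our own choices, not an experiment from the paper): with $h(L) \equiv 1$ the iteration (13) is Newton's method and is second order, while $h(L) = 1/(1 - L/2)$, which satisfies $h(0) = 1$ and $h'(0) = 1/2$ (Halley's method), gives third order. In the output, the number of correct digits roughly doubles per step in the first case and roughly triples in the second.

# Sketch: iterate x_{k+1} = x_k - h(L_f(x_k)) f(x_k)/f'(x_k) for two choices of h.
import math

def iterate(h, f, df, d2f, x0, steps=4):
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        L = f(x) * d2f(x) / df(x)**2           # degree of logarithmic convexity L_f(x)
        xs.append(x - h(L) * f(x) / df(x))
    return xs

f, df, d2f = (lambda x: math.exp(x) - 2.0), math.exp, math.exp   # illustrative equation
root = math.log(2.0)
for name, h in (("Newton", lambda L: 1.0), ("Halley", lambda L: 1.0 / (1.0 - L / 2.0))):
    errs = [abs(x - root) for x in iterate(h, f, df, d2f, 1.0)]
    print(name, ["%.1e" % e for e in errs])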

4. Adimensional Semilocal Analysis for Iterative Methods

Previous sections provide some insight on the relation between iterative methods, scaling, and dimension. How do linear changes in x or f ( x ) affect iterations?
As we stated above, Newton and secant are independent of the scale.
If $y = cx$, $g(y) = k\,f(x)$, and $y_k = c\,x_k$, then the following is the case.
$y_{k+1} = y_k - \dfrac{g(y_k)}{g'(y_k)} = c\left(x_k - \dfrac{f(x_k)}{f'(x_k)}\right) = c\,x_{k+1}$
Rescaling f ( x ) does not modify the iteration in the sense that a change of scale in variable x produces the same rescaling on the iterates.
Similar behavior shows the secant method and even bisection. However, as we saw above, first order (6) and Steffensen’s are different, and we must redefine the variables or the parameters in order to be rid of scaling troubles.
In order to analyze Steffensen’s method, some preliminary operations must be performed in the estimation of the errors of iterative methods.
There are two principles to ensure the convergence of the methods in Banach spaces. One of them is the majorizing sequence and the other one is the majorizing function.
Definition 3.
Given a sequence $\{x_n\}_{n \ge 0} \subset E$ (E a Banach space), an increasing sequence $\{t_n\}_{n \ge 0} \subset \mathbb{R}$ is a majorizing sequence if, for all $k \ge 0$, the following is the case.
$\|x_{k+1} - x_k\| \le t_{k+1} - t_k$
Convergence of $\{t_n\}_n$ implies convergence of $\{x_n\}_n$ and, if $t^*$ and $x^*$ are their limits, $\|x^* - x_k\| \le t^* - t_k$.
Majorizing sequences are often obtained by applying the iterative method to majorizing functions, which are defined as follows.
Definition 4.
Given $F: E \to V$, a real function $f(t)$ is a majorizing function of $F(x)$ if, for an $x_0 \in E$ and any x in a neighborhood of $x_0$, $\|F(x)\| \le |f(\|x - x_0\|)|$.
Newton's method ($x_{n+1} = x_n - F'(x_n)^{-1}F(x_n)$) illustrates semilocal convergence analysis. Kantorovich established conditions on the initial guess $x_0$ in order to obtain convergence of Newton's method. These conditions were related to the boundedness of the following quantities.
Theorem 2.
Let $F: U \to V$ be a differentiable function defined in a neighborhood $U \subseteq E$, and let $x_0 \in U$ be such that $F'(x_0)$ is invertible and there exist three positive constants $K_2, \eta, B > 0$ verifying the following:
1. $\|F''(x)\| \le K_2$, for all $x \in U$;
2. $\|F'(x_0)^{-1}\| \le B$;
3. $\|F'(x_0)^{-1}F(x_0)\| \le \eta$.
Then, if $K_2 B \eta \le 1/2$, Newton's method from $x_0$ converges.
The proof can be found in classical textbooks [7,8]. For our purposes, we point out a key step in this proof:
$p(t) = \dfrac{K_2}{2}\,t^{2} - \dfrac{1}{B}\,t + \dfrac{\eta}{B}$
which is a majorizing function when starting from $t_0 = 0$. The condition $K_2 B \eta \le \frac{1}{2}$ means that the discriminant of $p(t)$ is greater than or equal to zero; therefore, $p(t)$ has two simple positive roots or a double positive root.
Dimensional analysis simplifies this polynomial. The adimensional form of p ( t ) is the following:
$q(s) = \dfrac{a}{2}\,s^{2} - s + 1$
where $a = K_2 B \eta$, $s = t/\eta$, and $p(t) = \dfrac{\eta}{B}\,q(s)$.
As q ( s ) is invariant from scales, any change of scale implies changes on K 2 , B, or η but not on a and, hence, on q ( s ) . Thus, control of the convergence of the method depends on parameter a.
For third-order iterative methods, such as Halley’s, Chebyshev’s, or Euler’s, it is useful to consider the cubic majorizing polynomial of the following ([26,27]):
$p(t) = \dfrac{K_3}{6}\,t^{3} + \dfrac{K_2}{2}\,t^{2} - \dfrac{1}{B}\,t + \dfrac{\eta}{B}$
assuming that the third-order derivative is bounded, $\|F'''(x)\| \le K_3$, for all x in the domain.
The adimensional form of $p(t)$ is as follows: $q(s) = \dfrac{b}{6}\,s^{3} + \dfrac{a}{2}\,s^{2} - s + 1$
($b = K_3 B \eta^{2}$, $a = K_2 B \eta$).
Convergence is obtained, as for Newton’s, when q ( s ) has two simple positive roots or one positive double root (there is always a negative root).
In [26,27], a different method to study convergence is introduced. The idea is to obtain a system of sequences, two of them being fundamental because they control both iterates x k and their values F ( x k ) and some auxiliary sequences controlling different aspects of the iteration, if needed. The main advantage of these systems is that the sequences are adimensional, and the error estimates do not depend on the scales. We will use that technique in the next section applied to Steffensen’s method.

5. A System of a Priori Error Estimates for Newton’s Method

We start by illustrating the use of systems of bounds in the last paragraph and applying it to Newton’s method. We refer the interested reader to [26] for details. Let us consider the following system of sequences.
Let $a > 0$, $a_0 = 1$, and $d_0 = 1$. By recurrence, we define the following sequences for all $n \ge 0$.
$(\mathrm{i})\;\; a_{n+1} = \dfrac{a_n}{1 - a\,a_n d_n} \qquad (\mathrm{ii})\;\; d_{n+1} = \dfrac{a}{2}\,a_{n+1}\,d_n^{2}$
This system is said to be positive if $a_n > 0$ for all $n \ge 0$. It is easy to prove that the system is positive if and only if $0 < a \le 1/2$. Positivity and convergence of $\sum_{n \ge 0} d_n$ are equivalent.
It can be checked that, for all n 0 , the following is the case.
$\left(\dfrac{1}{a_n}\right)^{2} - 2a\,\dfrac{d_n}{a_n} = 1 - 2a$
Therefore, a n depends on d n , and the following is the case.
$d_{n+1} = \dfrac{a}{2}\,\dfrac{d_n^{2}}{\sqrt{(a\,d_n)^{2} + (1 - 2a)}}$
This function relating d n and d n + 1 is called the rate of convergence in [28].
Thus, we obtain the following.
$\sum_{n=0}^{+\infty} d_n = \dfrac{1 - \sqrt{1 - 2a}}{a} =: s^*$
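A small sketch (with an illustrative value of a, not from the paper) generates the system (i)–(ii) and checks that the partial sums of $d_n$ indeed approach $s^*$, the smallest root of $q(s) = (a/2)s^2 - s + 1$.

# Sketch: generate the Newton error system and compare the sum of d_n with s*.
import math

def newton_system(a, n=8):
    an, dn = 1.0, 1.0
    seq = [(an, dn)]
    for _ in range(n):
        an = an / (1.0 - a * an * dn)     # (i)  a_{n+1}
        dn = 0.5 * a * an * dn**2         # (ii) d_{n+1}
        seq.append((an, dn))
    return seq

a = 0.4                                   # illustrative value, a <= 1/2
s_star = (1.0 - math.sqrt(1.0 - 2.0 * a)) / a
partial = sum(d for _, d in newton_system(a))
print(partial, s_star)                    # the partial sums approach s*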
Now, this theorem provides semilocal conditions for the convergence of Newton’s method in Banach spaces ( ( E , · ) , ( V , · ) ).
Theorem 3.
Given a function $F: E \to V$ that is at least twice differentiable and $x_0 \in E$ such that $F'(x_0)$ is invertible, assume there exist positive constants $K_2, B, \eta$ verifying:
$(\mathrm{i})\; a = K_2 B \eta \le 1/2 \qquad (\mathrm{ii})\; \|F''(x)\| \le K_2 \;\text{ for all } x \in E \qquad (\mathrm{iii})\; \|F'(x_0)^{-1}\| \le B \qquad (\mathrm{iv})\; \|F'(x_0)^{-1}F(x_0)\| \le \eta$
Then, Newton's method ($x_{n+1} = x_n - F'(x_n)^{-1}F(x_n)$) converges to a root $x^*$ of $F(x)$ and the following inequalities hold:
1. $\|F'(x_n)^{-1}\| \le a_n B$, for all $n \ge 0$;
2. $\|x_{n+1} - x_n\| \le d_n\,\eta$, for all $n \ge 0$;
3. $\|x^* - x_0\| \le s^*\eta$;
4. $\|x^* - x_n\| \le \left(s^* - \sum_{k=0}^{n-1} d_k\right)\eta$, for all $n \ge 1$.
$\{a_n, d_n\}$ is the system generated by (14) from a. Furthermore, $x^*$ is the only root of $F(x)$ in the ball $B(x_0, s^{**}\eta)$, where the following is the case.
$s^{**} := \dfrac{1 + \sqrt{1 - 2a}}{a}$
The proof of this theorem is a direct consequence of the adimensional polynomial $q(s) = (a/2)\,s^{2} - s + 1$ related to the majorizing one, $p(t) = (K_2/2)\,t^{2} - t/B + \eta/B$. Under the hypothesis of the theorem, $q(s)$ has two positive roots, $s^* \le s^{**}$, and Newton's sequence $s_0 = 0$, $s_{n+1} = s_n - q(s_n)/q'(s_n)$ converges monotonically to $s^*$ and verifies $s_{n+1} - s_n = d_n$ and $-1/q'(s_n) = a_n$.
The adimensional polynomial also explains why (15) and (17) are verified: the value $q'(s)^{2} - 2\,q''(s)\,q(s)$ is an invariant for any quadratic polynomial, and its value is the discriminant, which in this case is $1 - 2a$. Moreover, $s^* = \lim\{s_n\} = \sum_n d_n$.
On the other side, it is easy to check that, if $a \ne 0$:
$\sum_{k=0}^{n-1} d_k = \dfrac{1}{a}\left(1 - \dfrac{1}{a_n}\right)$
given that $1/a_n = -q'(s_n) = 1 - a\,s_n$ and $s_n = \sum_{k=0}^{n-1} d_k$.
As we said in the Introduction, asymptotic constants of convergence from the classical Q- and R-orders of convergence depend on the dimensionality and scaling of the equation. In order to analyze the speed of convergence, adimensional functions and sequences provide a new characterization of the order of convergence, which is stronger than the classical R-order but weaker than the Q-order, with the advantage that its definition is adimensional and independent from scales.
Definition 5.
Let us assume that $(E, \|\cdot\|)$ is a Banach space and $\{x_n\}_n \subset E$. Then, p is the adimensional Q-order (respectively, adimensional R-order) of $\{x_n\}_n$ if there exist an adimensional scalar sequence $\{d_n\}_n$ with Q-order (respectively, R-order) p and a constant η with $[\eta] = [x_0]$ such that the following is the case.
$\|x_{n+1} - x_n\| \le d_n\,\eta$
As expected, R-order and adimensional R-order (AR-order) are equivalent (R-order itself is always adimensional), but adimensional Q-order (AQ-order) lies between classic Q and R-order (Q-order implies AQ-order, and AQ-order implies R-order). In fact, AQ-order behaves as classic Q-order but is applied to relative instead of absolute errors.
From (5) and 2., it is deduced that Newton’s method has second AR-order when the root is simple ( a < 1 / 2 ), and first AR-order when a = 1 / 2 . Actually, discriminant 1 2 a marks the boundary between fast and slow convergence [28].
This is an example illustrating how adimensional polynomials and dimensional analysis make easier the study of the iterative method. In the next paragraph, we proceed further in order to devise robust methods.

6. Steffensen’s Method

As it was defined above, Steffensen’s method for a scalar function f ( x ) is defined, from x 0 , by the following iteration.
$x_{n+1} = x_n - f[x_n + f(x_n),\, x_n]^{-1} f(x_n) \quad \text{for all } n \ge 0$
We consider nodes $x_n$ and $x_n + f(x_n)$. The key idea is that, if the root $x^*$ is simple, by the mean value theorem $f(x_n) = O(x_n - x^*)$; therefore, $f[x_n, x_n + f(x_n)]$ is a first-order approximation to $f'(x_n)$. In fact, when it converges to a simple root, Steffensen's is a second-order method. In addition to sharing with the secant method the fact that no derivatives are needed, Steffensen's method has the advantages of a higher order of convergence and no additional storage requirement.
As we said before, a large number of papers have appeared that are related to Steffensen’s method, but it is not widely used because of its scale dependence and dimension inconsistency. From a dimensional point of view, [ x ] and [ f ( x ) ] are different, and it is not a good idea to add both terms. Even in those cases where [ x ] = [ f ] , a different rescaling of the variable c x and the function k f makes the sum c x + k f ( x ) unrelated to the original x + f ( x ) . In Section 3, some of the best known proposals in the literature to overcome possible troubles can be found.
Adimensional functions provide a different point of view, which will be explained in the rest of this paper. We begin by finding a system of bounds for Steffensen’s, which is similar to that of Newton’s.

6.1. A System of a Priori Error Estimates for Steffensen’s Method

Let us try to understand the behaviour of Steffensen's method with a simple example. We consider the adimensional quadratic polynomial $q(s) = (a/2)\,s^{2} - s + 1$, and we remind the reader that this polynomial does not have scale or dimensional difficulties. For these polynomials, Steffensen's converges at least as fast as Newton's, and both are of second order.
Theorem 4.
If $0 \le a \le 1/2$, $s_0 = t_0 = 0$, and we denote the sequence of Steffensen's approximations as follows:
$s_{n+1} = s_n - q[s_n,\, s_n + q(s_n)]^{-1} q(s_n)$
then $\{s_n\}$ converges monotonically to $s^*$. Furthermore, if we denote Newton's approximations by
$t_{n+1} = t_n - \dfrac{q(t_n)}{q'(t_n)}$
it is verified that, for all $n \ge 0$, $0 \le t_n \le s_n \le s^*$.
Proof. 
First, we observe that Steffensen's iteration is well defined. For all $0 \le s < s^*$, we have the following:
$q[s,\, s + q(s)] = \dfrac{q(s + q(s)) - q(s)}{q(s)} = q'(s) + \dfrac{a}{2}\,q(s) =: g(s)$
where $g(s)$ is an increasing function in $[0, s^*]$.
$g'(s) = q''(s) + \dfrac{a}{2}\,q'(s) = a + \dfrac{a}{2}\,(a s - 1) = \dfrac{a}{2}\,(1 + a s) > 0$
Moreover, $-1 + \frac{a}{2} = g(0) \le g(s) < g(s^*) = q'(s^*) \le 0$.
Therefore, $q[s,\, s + q(s)] \ne 0$, and it is invertible.
On the other side, denoting by $\Delta_S$ Steffensen's correction and by $\Delta_N$ Newton's one, we have the following.
$\Delta_S(s) = -q[s,\, s + q(s)]^{-1} q(s) = \dfrac{-q(s)}{q'(s) + \frac{a}{2}\,q(s)} \ge \dfrac{-q(s)}{q'(s)} = \Delta_N(s)$
$\Delta_S(s_0) = \Delta_S(0) = \dfrac{1}{1 - \frac{a}{2}} \le \dfrac{1 - \sqrt{1 - 2a}}{a} = s^* = s^* - s_0$
$\Delta_S(s^*) = 0$
Differentiating $\Delta_S(s)$ with respect to s, we obtain the following.
$\Delta_S'(s) = -\dfrac{q'(s)^{2} - a\,q(s)}{\left(q'(s) + \frac{a}{2}\,q(s)\right)^{2}}$
As the following is the case:
$q'(s)^{2} - a\,q(s) = \left(q'(s)^{2} - 2a\,q(s)\right) + a\,q(s) = (1 - 2a) + a\,q(s) \ge 0$
$\Delta_S(s)$ is decreasing and positive, and if $0 \le s \le s^*$, then $s + \Delta_S(s) \le s^*$.
As $\Delta_S(s) \ge 0$, the sequence $\{s_n\}_n$ is increasing and bounded; hence, it converges. The limit is, obviously, $s^*$.
On the other side, monotony and (19) prove 0 t n s n for all n 0 . □
However, the same adimensional polynomial can come from problematic scalings. If, for instance, we take $\eta > 2/K_2$ (equivalently, $B < a/2$) with $a = K_2 B \eta \le 1/2$, the polynomial $p(t) = (K_2/2)\,t^{2} - t/B + \eta/B$ still has the adimensional form $q(s) = (a/2)\,s^{2} - s + 1$, but $p[0, p(0)] > 0$ and, if $t_0 = 0$, then $t_1 < t_0$ and the sequence of Steffensen's iterates does not converge. If $\eta = 2/K_2$, it becomes worse, because $p[0, p(0)] = 0$ is not invertible; the sequence is not even defined.
These phenomena do not happen in scale and dimensional independent methods such as Newton’s or secant, but they may become a real problem in Steffensen’s.
We now construct a system of estimates for Steffensen's method applied to adimensional polynomials. Given $a > 0$ and $c_0 = a_0 = 1$, $r_0 = 0$, we define the following for all $n \ge 0$.
$(\mathrm{i})\; b_n = \dfrac{a_n}{1 - \frac{a}{2}\,a_n c_n} \qquad (\mathrm{ii})\; d_n = b_n c_n \qquad (\mathrm{iii})\; a_{n+1} = \dfrac{a_n}{1 - a\,a_n d_n} \qquad (\mathrm{iv})\; c_{n+1} = \dfrac{a^{2}}{2}\,d_n^{2}\left(r_n + \dfrac{1}{2}\,c_n\right) = \left(1 - \dfrac{1}{a_n} + \dfrac{a}{2}\,c_n\right)\dfrac{a}{2}\,d_n^{2} \qquad (\mathrm{v})\; r_{n+1} = r_n + d_n$
Now, our system consists of five sequences: two of them fundamental, { a n } n and { c n } n , and the other three are auxiliaries only for simplification.
The following theorem states that the optimality of the system holds.
Theorem 5.
Given an adimensional polynomial $q(s)$ and the sequence $\{s_n\}_{n \ge 0}$ generated by $s_0 = 0$, $s_{n+1} = s_n - q[s_n,\, s_n + q(s_n)]^{-1} q(s_n)$, $n \ge 0$, we have the following.
$(\mathrm{I})\; |q'(s_n)^{-1}| = a_n. \quad (\mathrm{II})\; |q[s_n,\, s_n + q(s_n)]^{-1}| = b_n. \quad (\mathrm{III})\; |q(s_n)| = c_n. \quad (\mathrm{IV})\; |s_{n+1} - s_n| = d_n. \quad (\mathrm{V})\; s_n = r_n.$
Proof. 
By induction, if (III) and (IV) are valid for n, then the following is the case.
$(\mathrm{I})$ $q'(s_{n+1}) = q'(s_n) + a\,(s_{n+1} - s_n)$ implies
$\left|\dfrac{1}{q'(s_{n+1})}\right| = \dfrac{1}{\left|q'(s_n) + a\,(s_{n+1} - s_n)\right|} = \dfrac{1}{\frac{1}{a_n} - a\,d_n} = \dfrac{a_n}{1 - a\,a_n d_n} = a_{n+1}$
$(\mathrm{II})$ $q[s_n,\, s_n + q(s_n)] = q'(s_n) + \dfrac{a}{2}\,q(s_n) = -\dfrac{1}{a_n} + \dfrac{a}{2}\,c_n$
Then, we obtain the following.
$\left|q[s_n,\, s_n + q(s_n)]^{-1}\right| = \dfrac{a_n}{1 - \frac{a}{2}\,a_n c_n} = b_n$
$(\mathrm{III})$ $q(s_{n+1}) = q(s_n) + q'(s_n)(s_{n+1} - s_n) + \dfrac{a}{2}\,(s_{n+1} - s_n)^{2}$
The following expression must be 0.
$q(s_n) + q[s_n,\, s_n + q(s_n)]\,(s_{n+1} - s_n) = q(s_n) + q'(s_n)(s_{n+1} - s_n) + \dfrac{a}{2}\,q(s_n)(s_{n+1} - s_n)$
We have the following.
$q(s_{n+1}) = \dfrac{a}{2}\,d_n\,(d_n - c_n)$
From (20), $q(s_n) = -q'(s_n)\,d_n - \dfrac{a}{2}\,q(s_n)\,d_n$.
Moreover, (21) becomes the following:
$q(s_{n+1}) = \dfrac{a}{2}\,d_n^{2}\left(1 + q'(s_n) + \dfrac{a}{2}\,q(s_n)\right) = \dfrac{a}{2}\,d_n^{2}\left(\big(q'(s_n) - q'(0)\big) + \dfrac{a}{2}\,c_n\right)$
because $q'(0) = -1$. Taking into account that $s_n = r_n$, we finally reach the following.
$q(s_{n+1}) = \dfrac{a^{2}}{2}\,d_n^{2}\left(s_n + \dfrac{1}{2}\,c_n\right) = c_{n+1}$
The rest of the theorem is obvious. □
System (I)–(V) shows that the sequence $\{d_n\}_n$ (similarly, $\{c_n\}_n$) has second AQ-order when $1 - 2a > 0$ and first AQ-order when $2a = 1$, as happened with Newton's method. The rate of convergence, however, is not so simple. The derivation is long and tedious but involves only elementary algebraic manipulations, and the invariance of $(1/a_n)^{2} - 2a\,c_n = 1 - 2a$ provides the following.
$d_{n+1} = \dfrac{a}{2}\cdot\dfrac{1 + \frac{a}{2}\,d_n - \alpha_n}{\left(\alpha_n - a\,d_n\right)\left(1 + \frac{a}{2}\,d_n\right) - \frac{a^{2}}{4}\,d_n^{2}\left(1 + \frac{a}{2}\,d_n - \alpha_n\right)}\cdot d_n^{2}$
with $\alpha_n = \dfrac{a\,d_n}{1 + \frac{a}{2}\,d_n} + \sqrt{\left(\dfrac{a\,d_n}{1 + \frac{a}{2}\,d_n}\right)^{2} + (1 - 2a)}\;\left(= \dfrac{1}{a_n}\right)$.
As in Newton's case, $1 - 2a$ is the boundary separating low and high speed of convergence. However, while this region of high-order convergence is reached through $(a\,d_n)^{2}$ in Newton's method, in Steffensen's it is reached through $\left(\dfrac{a\,d_n}{1 + \frac{a}{2}\,d_n}\right)^{2}$. This shows a faster (although with the same order) convergence of Steffensen's versus Newton's method.
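The following sketch generates the system (i)–(v) and checks, for an illustrative value of a, that the bounds (I)–(V) of Theorem 5 are attained for the adimensional polynomial $q(s) = (a/2)s^2 - s + 1$: each printed pair of numbers coincides.

# Sketch: run Steffensen on q(s) and, in parallel, the system (i)-(v); the bounds are attained.
a = 0.5
q  = lambda s: 0.5 * a * s**2 - s + 1.0
dq = lambda s: a * s - 1.0

an, cn, rn, s = 1.0, 1.0, 0.0, 0.0
for _ in range(5):
    bn = an / (1.0 - 0.5 * a * an * cn)                 # (i)
    dn = bn * cn                                        # (ii)
    s_next = s - q(s) / ((q(s + q(s)) - q(s)) / q(s))   # Steffensen step on q
    # optimality: each printed pair coincides (|1/q'(s_n)| = a_n, |q(s_n)| = c_n, |s_{n+1}-s_n| = d_n)
    print(abs(1.0 / dq(s)), an, abs(q(s)), cn, abs(s_next - s), dn)
    an = an / (1.0 - a * an * dn)                       # (iii)
    cn = 0.5 * a**2 * dn**2 * (rn + 0.5 * cn)           # (iv)
    rn = rn + dn                                        # (v)
    s = s_next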
Now, we are ready to introduce the system of error bounds for Steffensen’s method in Banach spaces. First, we must remind the reader which kind of operator is the divided differences in Banach spaces. As opposed to a scalar, there is no unique divided difference operator for vectors. The characterization is based on interpolation.
Definition 6.
Given $F: E \to V$ and $x, y \in E$, a linear operator $H: E \to V$ is a divided difference of F with nodes x, y if $H(x - y) = F(x) - F(y)$. We denote $F[x,y] := H$.
If F is twice differentiable, with $\|F''(\xi)\| \le K_2$ for every $\xi \in E$, the following is verified.
$\|F[x,y] - F'(x)\| \le K_2\,\|x - y\|$
Further insight on divided differences in Banach spaces can be found in [28]. There, we can find one example of differentiable divided difference as follows.
$F[x,y] = \displaystyle\int_{0}^{1} F'\big(x + \theta\,(y - x)\big)\,d\theta$
In practice, this integral is not explicitly evaluated, and divided differences are obtained from scalar-divided differences.
The key point of the rest of this paragraph is to consider Steffensen’s method applied to G, the adimensional form of F, defined in Section 2.2.
As the sum y + G ( y ) is not only consistent but also independent from scales and dimensions, two of the main concerns about Steffensen’s are avoided. We can consider G as a preprocessing step of the nonlinear operator and, certainly, its performance is similar to that of preprocessing of matrices to solve linear systems: it reduces computational costs after small additional work before starting the method.
Instead of obtaining a root of F, we must find a root of G. Any method is admissible. Even if the method is scale and dimensionally independent, such as Newton's, adimensioning the function may be useful to regularize the system, as we saw in the square root example in Section 2.2. For Steffensen's, adimensioning is indeed very useful. For example, it is easier to ensure convergence. In addition, the error bounds are more accurate in the sense that they are optimal (optimality does not always guarantee better estimates, but under certain conditions they cannot be improved). Optimal bounds have not been found in general for the secant method. We will prove that the system of estimates defined above provides optimal error bounds for Steffensen's method applied to adimensional equations.

6.2. Adimensional Scale Invariant Steffensen’s Method (ASIS Method) and Its Convergence

ASIS (adimensional scale invariant Steffensen's) is the method obtained by applying Steffensen's method to the adimensional form of the function.
By the linear transformation from F to G explained in (4), with $y_0$ as in (5), we define the following for $n \ge 0$.
$y_{n+1} = y_n - G[y_n,\, y_n + G(y_n)]^{-1} G(y_n)$
$x_n = -\|F(x_0)\|\,F'(x_0)^{-1}\,y_n$
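A minimal sketch of the ASIS iteration in $\mathbb{R}^m$ follows. It assumes a componentwise divided difference satisfying the interpolating condition, approximates $F'(x_0)$ by forward differences, and uses an illustrative test system; none of these concrete choices are prescribed by the paper.

# A minimal ASIS sketch in R^m: Steffensen's iteration applied to the adimensional form G of F.
import numpy as np

def divided_difference(G, u, v):
    # componentwise divided difference satisfying DD (v - u) = G(v) - G(u)
    m, cols, w = len(u), [], u.copy()
    for j in range(m):
        w_next = w.copy()
        w_next[j] = v[j]
        cols.append((G(w_next) - G(w)) / (v[j] - u[j]))
        w = w_next
    return np.column_stack(cols)

def asis(F, x0, steps=8, h=1e-7):
    m = len(x0)
    Fx0 = F(x0)
    J0 = np.column_stack([(F(x0 + h * e) - Fx0) / h for e in np.eye(m)])  # approximate F'(x_0)
    nrm = np.linalg.norm(Fx0)
    G = lambda y: F(np.linalg.solve(J0, -nrm * y)) / nrm   # adimensional form, y = -F'(x_0) x / ||F(x_0)||
    y = -J0 @ x0 / nrm
    for _ in range(steps):
        Gy = G(y)
        if np.linalg.norm(Gy) < 1e-10:
            break
        DD = divided_difference(G, y, y + Gy)              # G[y, y + G(y)]
        y = y - np.linalg.solve(DD, Gy)
    return np.linalg.solve(J0, -nrm * y)                   # back to the original variable

F = lambda v: np.array([v[0]**2 + v[1]**2 - 4.0, v[0] * v[1] - 1.0])   # illustrative test system
x = asis(F, np.array([1.8, 0.6]))
print(x, F(x))                                             # residual close to zero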
Theorem 6.
Let us assume that $G''$ exists in $\tilde{V}$, that $\|G''(\xi)\| \le a$ for $\xi \in \tilde{V}$, and that $a \le 1/2$. Then, with the system of bounds (i)–(v) in the last subsection, we have the following.
$(\mathrm{a})\; \|G'(y_n)^{-1}\| \le a_n. \quad (\mathrm{b})\; \|G[y_n,\, y_n + G(y_n)]^{-1}\| \le b_n. \quad (\mathrm{c})\; \|G(y_n)\| \le c_n. \quad (\mathrm{d})\; \|y_{n+1} - y_n\| \le d_n. \quad (\mathrm{e})\; \|y_{n+1} - y_0\| \le r_{n+1}.$
Proof. 
It is similar to that of Theorem 5, by using the Banach invertibility criterion.
First, we note that $\|G''(y)\| \le a$ for all y.
By induction, if (a) and (c) hold for a given $n \ge 0$, we have the following:
$\|G'(y_n) - G[y_n,\, y_n + G(y_n)]\| \le \dfrac{a}{2}\,\|G(y_n)\| < \dfrac{1}{\|G'(y_n)^{-1}\|}$
where the last inequality follows from the induction hypothesis. Then, by the Banach invertibility criterion, $G[y_n,\, y_n + G(y_n)]$ is invertible and the following is the case:
$\left\|G[y_n,\, y_n + G(y_n)]^{-1}\right\| \le \dfrac{\|G'(y_n)^{-1}\|}{1 - \frac{a}{2}\,\|G(y_n)\|\cdot\|G'(y_n)^{-1}\|}$
and (b) is verified.
and (b) is verified.
(d) and (e) are evident.
By the same Banach criterion, the following is the case.
$\|G'(y_{n+1}) - G'(y_n)\| \le a\,\|y_{n+1} - y_n\| < \dfrac{1}{\|G'(y_n)^{-1}\|}$
Thus, $G'(y_{n+1})$ is invertible and (a) is verified because the following is the case.
$\|G'(y_{n+1})^{-1}\| \le \dfrac{\|G'(y_n)^{-1}\|}{1 - a\,\|G'(y_n)^{-1}\|\cdot\|y_{n+1} - y_n\|}$
Finally, from the following:
$G(y_n) + G[y_n,\, y_n + G(y_n)]\,(y_{n+1} - y_n) = 0$
$G[y_n,\, y_n + G(y_n)] = G'(y_n) + E_n, \qquad \|E_n\| \le \dfrac{a}{2}\,\|G(y_n)\|$
and the following:
$\left\|G(y_{n+1}) - G(y_n) - G'(y_n)(y_{n+1} - y_n)\right\| \le \dfrac{a}{2}\,\|y_{n+1} - y_n\|^{2}$
we obtain the following result.
$\|G(y_{n+1})\| \le \dfrac{a}{2}\,\|y_{n+1} - y_n\|\cdot\left\|(y_{n+1} - y_n) - G(y_n)\right\| \le \dfrac{a}{2}\,d_n^{2}\left(\|I + G'(y_n)\| + \dfrac{a}{2}\,\|G(y_n)\|\right)$
Moreover, since $G'(y_0) = -I$,
$\|I + G'(y_n)\| = \|G'(y_n) - G'(y_0)\| \le a\,\|y_n - y_0\| \le a\,r_n$
hence $\|G(y_{n+1})\| \le \dfrac{a^{2}}{2}\,d_n^{2}\left(r_n + \dfrac{1}{2}\,c_n\right) = c_{n+1}$, and (c) is verified. □
The estimates of this theorem are optimal, because they are attained for the adimensional polynomial $q(s) = (a/2)\,s^{2} - s + 1$.
As a consequence of Theorem 6, the following main result proves the convergence of Steffensen’s method in Banach spaces.
Theorem 7.
Let $F: E \to V$ be a twice differentiable nonlinear function and $x_0 \in E$ such that $F(x_0) \ne 0$ and $F'(x_0)$ is invertible. Let $K_2, B, \eta > 0$ verify $\|F(x_0)\| \le \eta/B$, $\|F'(x_0)^{-1}\| \le B$, and, for all $x \in E$, $\|F''(x)\| \le K_2$.
If $a = K_2 B \eta \le 1/2$, then the sequence generated by the ASIS method converges to a root $y^*$, $G(y^*) = 0$. Furthermore, if, for any $n \ge 0$, $x_n = -(\eta/B)\,F'(x_0)^{-1}\,y_n$, the following inequalities hold.
$\|x_{n+1} - x_n\| \le d_n\,\eta$
$\|x^* - x_n\| \le (s^* - s_n)\,\eta$
$F'(x_n)\ \text{is invertible and}\ \|F'(x_n)^{-1}\| \le a_n B$
This theorem improves previous results. So far, sufficient conditions in the literature implied the existence of a strictly positive constant $h > 0$, depending on $K_2, B, \eta$, such that $a\,(1 + h) \le 1/2$. These conditions are more restrictive than those of Newton's method and, hence, than those we propose in Theorem 7, where $a \le 1/2$ suffices to ensure convergence.
We remark that, for the classic Steffensen's method with $a \le 1/2$, not only the invertibility but even the existence of $F[x_n, x_n + F(x_n)]$ is not ensured in general; the node $x_n + F(x_n)$ may not even be well defined. However, the divided differences of the adimensional form do exist, and they are invertible.

7. Numerical Examples

Our first example is a simple scalar equation:
$f_1(x) = \exp(x - 1) - 1 = 0$
with the obvious solution $x^* = 1$.
We want to compare the performance of Newton’s, Steffensen’s, and ASIS methods. Both x and f ( x ) are real.
This is a case where the Kantorovich conditions are more restrictive than the actual Newton iteration. Kantorovich ($a \le 1/2$) guarantees convergence for $x_0 \ge 1 - \log\left(\frac{1 + \sqrt{3}}{2}\right) \approx 0.6881$. However, the convexity of $f_1(x)$ enlarges the region of convergence, as we will see.
We start Newton’s and Steffensen’s methods from x 0 = 0 . This initial guess does not verify Kantorovich conditions, but both methods converge. In the figure, we represent the logarithms of the errors, where it can be seen that Newton’s converges faster than Steffensen’s. Although both are second order, Steffensen’s first iterations are clearly slower, because it takes some steps to enter into the second-order region. Steffensen’s errors are not only greater than those of Newton’s, but it requires some more iterations in order to obtain computational accuracy.
The setting varies for ASIS. The adimensional version of f ( x ) is as follows:
$g(s) = \dfrac{1}{\exp(-1) - 1}\left(\exp\big((e - 1)\,s - 1\big) - 1\right)$
and $x = (e - 1)\,s$; therefore, $s_0 = 0$. In Figure 1, we represent the logarithms of the errors $|1 - (e - 1)\,s_n|$. It can be seen that these errors are smaller than Newton's. That is, in this case, ASIS not only improves on Steffensen's but also on Newton's, as expected from the theoretical results above.
A second example shows the consequences of rescaling the equation. We consider the following.
$f_2(x) = e^{2x - 1} - 1 = 0$
This function is obtained as $f_2(x) = f_1(2x)$. The root of $f_2(x)$, $x^* = \frac{1}{2}$, is $\frac{1}{2}$ times the root of $f_1(x)$. Hence, Newton's errors for $f_2$ are one-half the errors for $f_1$, due to the invariance with respect to scales in the variable. The adimensional version of $f_2(x)$ is exactly the same $g(s)$ as for $f_1(x)$, and the errors of ASIS are also one-half of the errors obtained for $f_1(x)$. As a consequence, ASIS and Newton's method behave in the same manner as in the first example (ASIS converges faster than Newton's).
However, Steffensen's method applied to $f_2(x)$ needs 3705 iterations to obtain an error less than 0.5. We remark that 0.5 is the initial error. Although it proceeds faster after that (iteration 3716 provides an approximation error less than $10^{-16}$, double precision accuracy), it is clearly an impractical method in this case. In Figure 2, we show the logarithms of the errors of classic Steffensen's (not adimensional): in the first 3700 iterations, the errors look almost constant, and after that, the method behaves as second order. Its region of quadratic convergence is not so small, but it takes a long time to reach it from the initial guess. It can become worse for finer rescalings: the method can blow up due to roundoff errors for $f(x) = f_1(cx)$ with large c. This behavior is typical for Steffensen's method: its convergence and its speed of convergence, when it converges, depend strongly on the scaling of both the variable and the function itself.
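The scalar experiments of this section can be reproduced with the short sketch below (Newton's, classic Steffensen's, and ASIS on $f_1$ and on the rescaled $f_2$); the stopping tolerance and iteration caps are our own choices, and the printed iteration counts are whatever the script measures, not quoted figures.

# Sketch: compare iteration counts of Newton, classic Steffensen and ASIS on f1 and f2 = f1(2x).
import math

def newton(f, df, x, tol=1e-14, maxit=10000):
    for k in range(maxit):
        if abs(f(x)) < tol:
            return x, k
        x = x - f(x) / df(x)
    return x, maxit

def steffensen(f, x, tol=1e-14, maxit=10000):
    for k in range(maxit):
        fx = f(x)
        if abs(fx) < tol:
            return x, k
        x = x - fx / ((f(x + fx) - fx) / fx)      # f[x + f(x), x] from the interpolating condition
    return x, maxit

def asis(f, df, x0, tol=1e-14, maxit=10000):
    # Steffensen applied to the adimensional form g(s) = f(x0 - f(x0) s / f'(x0)) / f(x0)
    g = lambda s: f(x0 - f(x0) / df(x0) * s) / f(x0)
    s, k = steffensen(g, 0.0, tol, maxit)
    return x0 - f(x0) / df(x0) * s, k

f1  = lambda x: math.exp(x - 1.0) - 1.0
df1 = lambda x: math.exp(x - 1.0)
f2  = lambda x: f1(2.0 * x)
df2 = lambda x: 2.0 * df1(2.0 * x)

for f, df in ((f1, df1), (f2, df2)):
    print(newton(f, df, 0.0)[1], steffensen(f, 0.0)[1], asis(f, df, 0.0)[1])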
The last example in this section is a system of two nonlinear equations with two variables.
$F_1(x, y) = -4x\,(y - x^{2} + 2) - 2\,(1 - x) = 0 \qquad F_2(x, y) = 2\,(y - x^{2} + 2) = 0$
This system is obtained from an optimization problem by unrestricted minimization of the following function:
$F(x, y) = (y - x^{2} + 2)^{2} + (1 - x)^{2}$
which is the Euclidean norm of a quadratic curve.
Starting from x 0 = y 0 = 0 , both Newton and Steffensen converge. As it happens with our first one dimensional equation, Steffensen’s is clearly slower than Newton. Among the possible choices of the divided difference operator, we used the one in [28].
Once again, we try the adimensional version of the system, which is normalized via the Euclidean norm, and we obtain the following:
$g_1(s, t) = \dfrac{2}{3}\,s\left(\sqrt{5}\,t + \dfrac{5}{9}\,s^{2} - 2\right) + \dfrac{s}{3} - \dfrac{1}{\sqrt{5}} = 0 \qquad g_2(s, t) = -t - \dfrac{\sqrt{5}}{9}\,s^{2} + \dfrac{2}{\sqrt{5}} = 0$
with the following change of variables.
$s = \dfrac{3}{\sqrt{5}}\,x; \qquad t = -\dfrac{1}{\sqrt{5}}\,y$
The fact that $F'(0,0)$ is diagonal simplifies the change of variables (which becomes a diagonal transformation). On the other side, some good features of the original system, such as the symmetry of the gradient, are lost in the adimensional system (therefore, the divided difference operator is no longer quasi-symmetric).
Its behavior is similar to that of the scalar case: ASIS is faster than Newton’s, and Steffensen’s is clearly slower, as observed in Figure 1.
As systems with greater order behave in a similar manner (ASIS is competitive with Newton’s, even sometimes with better performance, and often overcomes stability and convergence problems arising in classic Steffensen’s method), the last example was included as a representative for the sake of simplicity.
However, for really large systems, other issues appear, such as conditioning and dependence among variables and the advantages of ASIS method are not so evident (advantages of Newton’s method are not so evident either). There, ASIS can be a supporting method for solving equations. For instance, the BFGS formula for updating gradients in quasi-Newton methods is a strategy where ASIS improves the performance of the secant method, which is traditionally used in optimization.
The increase in the computational cost of ASIS versus classical Steffensen's is due to the evaluation of the preconditioner $F'(x_0)^{-1}$. Since normalization of the adimensional form of F is not strictly necessary, using a simpler matrix $B \approx F'(x_0)^{-1}$ such that $[B] = [F'(x_0)^{-1}]$ is an option. A good choice of B can simplify the problem, as we saw in the square root example in Section 2.2.

8. Conclusions

This paper was motivated by the study of quasi-Newton methods in optimization. It is somewhat surprising that, despite Steffensen's method being well known (not as popular as Newton's or the secant method, but well known) and implementable without explicit knowledge of derivatives, it is not used or even proposed as a feasible method in this field. Some insight makes it clear that the weaknesses of Steffensen's method in scalar equations increase when dealing with equations in several variables. Nevertheless, most of these weaknesses are related to dimensional troubles. Independence from dimensions makes Steffensen's method (and other methods) more robust and reliable.
After some considerations about dimensionality, in this paper, we introduced the adimensional form of functions and polynomials in order to analyze the convergence of iterative methods. These adimensional forms make possible the construction of methods based on classical ones that are adapted from scale dependent methods and overcome all concerns due to dimensionality and scaling. Here, we produced the ASIS method based on the classic Steffensen’s method.
Semilocal conditions are settled in order to ensure convergence of ASIS. The ASIS method has the same convergence conditions as Newton's method. This is a really interesting result because, in general, the conditions known so far are more restrictive for classic Steffensen's than for Newton's. The obtained estimates are optimal in the sense that they are attained for adimensional polynomials. This optimality is also an important improvement with respect to other estimates in the literature, which are not optimal. Optimality improves the theoretical region of convergence. In fact, ASIS is at least as fast as Newton's for polynomials.
These semilocal conditions are introduced not only for theoretical knowledge but also for practical reasons: hybrid methods are often devised that combine different algorithms, slower but more robust ones for the initial iterations, with the goal of driving the approximations into the region of convergence, and faster ones once the approximations are close enough to the exact solution (see the sketch below). In order to obtain good performance, well-fitted estimators are needed.
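As an illustration of such a hybrid strategy, the following sketch bisects until an estimator certifies local convergence and then switches to the fast method. The classical Newton–Kantorovich test $M\,|f(x)|/f'(x)^2 \le 1/2$ stands in for the sharper ASIS estimators, which are not reproduced here; the function, bracket, and second-derivative bound $M$ are illustrative assumptions.

```python
def hybrid_solve(f, df, a, b, M, tol=1e-12, max_iter=100):
    # phase 1: slow but robust bisection until the Kantorovich-type test
    # M*|f(x)|/f'(x)^2 <= 1/2 certifies that the fast local method will converge
    x = 0.5 * (a + b)
    for _ in range(max_iter):
        if M * abs(f(x)) / df(x) ** 2 <= 0.5:
            break
        if f(a) * f(x) < 0.0:
            b = x
        else:
            a = x
        x = 0.5 * (a + b)
    # phase 2: fast local method (Newton here; an ASIS step would fit as well)
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# hypothetical example: f(x) = x^3 - 2 on [0, 4], with |f''| <= 24 on the interval
print(hybrid_solve(lambda x: x**3 - 2.0, lambda x: 3.0 * x**2, 0.0, 4.0, 24.0))
```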
The price to pay in order to obtain adimensional functions is an increase in computational cost due to the change of variables that must be implemented. However, this cost can be reduced because, in general, an exact evaluation of $F'(x_0)^{-1}$ is not required, although the analysis of such softened conditions is outside the scope of this paper. It is remarkable that even random numerical errors have dimension: randomness is a numerical condition, but it affects the measurement of the parameters.
Finally, we remark that some conditions in the theorems of this paper can be relaxed. Not only the exact evaluation of the inverse of $F'(x_0)$ but also the conditions on the second derivative can be weakened, because it is enough to require Lipschitz continuity of the first derivative or some other milder conditions often appearing in the literature.
Last but not least, ASIS is a good alternative to quasi-Newton methods in optimization because of its interpolating conditions (no need to evaluate derivatives), robustness, and speed of convergence.
Some features of the method, such as the difficulties arising from ill conditioning or from large systems, are currently under research, and the results so far are promising.
The techniques developed here can be extended and generalized to other kinds of algorithms (numerical solution of differential and integral equations, regression, interpolation, etc.).
Summing up: although it is often disregarded, dimensional analysis is a powerful tool that improves the performance of algorithms and helps in obtaining a better understanding of their theoretical properties. Adimensional functions rid us of the drawbacks induced by heterogeneous data.

Funding

Funded by ESI International Chair@CEU-UCH, Universidad Cardenal Herrera-CEU.

Acknowledgments

First and foremost, this paper and its author are greatly indebted to the comments, suggestions, and encouragement provided by Antonio Marquina, whose help improved both the paper and the author.

Conflicts of Interest

The author declares no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

1. Meinsma, G. Dimensional and scaling analysis. SIAM Rev. 2019, 61, 159–184.
2. Buckingham, E. On physically similar systems: Illustrations of the use of dimensional equations. Phys. Rev. 1914, 4, 345–376.
3. Potra, F.A. On Q-order and R-order of convergence. J. Optim. Theory Appl. 1989, 63, 415–431.
4. Strang, G. Linear Algebra and Learning from Data; Wellesley-Cambridge Press: Wellesley, MA, USA, 2019.
5. Higham, N.J. Newton's method for the matrix square root. Math. Comp. 1986, 46, 537–549.
6. Higham, N.J. Stable iterations for the matrix square root. Numer. Algorithms 1997, 15, 227–242.
7. Ortega, J.M.; Rheinboldt, W.C. Iterative Solution of Nonlinear Equations in Several Variables; Academic Press: New York, NY, USA, 1970.
8. Ostrowski, A.M. Solution of Equations in Euclidean and Banach Spaces, 3rd ed.; Academic Press: New York, NY, USA, 1970.
9. Kantorovich, L.V.; Akilov, G.P. Functional Analysis in Normed Spaces; Pergamon: New York, NY, USA, 1964.
10. Dennis, J.E. On the Kantorovich hypothesis for Newton's method. SIAM J. Numer. Anal. 1969, 6, 493–507.
11. Gragg, W.B.; Tapia, R.A. Optimal error bounds for the Newton–Kantorovich theorem. SIAM J. Numer. Anal. 1974, 11, 10–13.
12. Tapia, R.A. The Kantorovich theorem for Newton's method. Am. Math. Mon. 1971, 78, 389–392.
13. Argyros, I.K.; González, D. Extending the applicability of Newton's method for k-Fréchet differentiable operators in Banach spaces. Appl. Math. Comput. 2014, 234, 167–178.
14. Ezquerro, J.A.; Hernández, M.A. Mild Differentiability Conditions for Newton's Method in Banach Spaces; Birkhäuser: Basel, Switzerland, 2020.
15. Ozban, A.Y. Some new variants of Newton's method. Appl. Math. Lett. 2004, 17, 677–682.
16. Weerakoon, S.; Fernando, T.G.I. A variant of Newton's method with accelerated third-order convergence. Appl. Math. Lett. 2000, 13, 87–93.
17. Nocedal, J.; Wright, S. Numerical Optimization; Springer: Berlin/Heidelberg, Germany, 2006.
18. Dembo, R.S.; Eisenstat, S.C.; Steihaug, T. Inexact Newton methods. SIAM J. Numer. Anal. 1982, 19, 400–408.
19. Ye, H.; Luo, L.; Zhang, Z. Approximate Newton methods and their local convergence. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; Volume 70, pp. 3931–3933.
20. Amat, S.; Busquier, S.; Candela, V. A class of quasi-Newton generalized Steffensen methods on Banach spaces. J. Comput. Appl. Math. 2002, 149, 397–406.
21. Argyros, I.K.; Hernández-Verón, M.A.; Rubio, M.J. Convergence of Steffensen's method for non-differentiable operators. Numer. Algorithms 2017, 75, 229–244.
22. Chen, D. On the convergence of a class of generalized Steffensen's iterative procedures and error analysis. Int. J. Comput. Math. 1990, 31, 195–203.
23. Jain, P. Steffensen type methods for solving non-linear equations. Appl. Math. Comput. 2007, 194, 527–533.
24. Johnson, L.W.; Scholz, D.R. On Steffensen's method. SIAM J. Numer. Anal. 1968, 5, 296–302.
25. Gander, W. On Halley's iteration method. Am. Math. Mon. 1985, 92, 131–134.
26. Candela, V.; Marquina, A. Recurrence relations for rational cubic methods I: The Halley method. Computing 1990, 44, 169–184.
27. Candela, V.; Marquina, A. Recurrence relations for rational cubic methods II: The Chebyshev method. Computing 1990, 45, 355–367.
28. Potra, F.A.; Pták, V. Nondiscrete Induction and Iterative Processes; Pitman: Boston, MA, USA, 1984.
Figure 1. Logarithms of errors: Newton, blue solid line, red ‘o’; Steffensen, black ‘.-.’, red ‘+’; ASIS, red ‘–’, green ‘*’. Example 1 (scalar) (left); Example 3 (system) (right).
Figure 2. Example 2: Logarithms of Steffensen’s method for f 2 ( x ) , red solid line, blue ‘*’ (left); iterations from 3700 to 3716, blue solid line, red ‘*’ (right).