Article

On the Rate of Convergence of Greedy Algorithms

Vladimir Temlyakov 1,2,3,4
1 Steklov Mathematical Institute of Russian Academy of Sciences, 117312 Moscow, Russia
2 Department of Mechanics and Mathematics, Lomonosov Moscow State University, 119991 Moscow, Russia
3 Moscow Center of Fundamental and Applied Mathematics, 119333 Moscow, Russia
4 Department of Mathematics, University of South Carolina, Columbia, SC 29208, USA
Mathematics 2023, 11(11), 2559; https://doi.org/10.3390/math11112559
Submission received: 8 May 2023 / Revised: 27 May 2023 / Accepted: 31 May 2023 / Published: 2 June 2023
(This article belongs to the Special Issue Fourier Analysis, Approximation Theory and Applications)

Abstract
In this paper, a new criterion for evaluating the theoretical efficiency of a greedy algorithm is suggested. Using this criterion, we prove some results on the rate of convergence of greedy algorithms which provide expansions. We consider both the case of Hilbert spaces and the more general case of Banach spaces. The new component of this paper is that we bound the error of approximation by the product of two norms: the norm of $f$ and the $A_1$-norm of $f$. Typically, only the $A_1$-norm of $f$ is used. In particular, we establish that some greedy algorithms (the Pure Greedy Algorithm (PGA) and its modifications) are as good as the Orthogonal Greedy Algorithm (OGA) in this new sense of the rate of convergence, while it is known that the PGA is much worse than the OGA in the standard sense. Our new results provide better bounds for the accuracy than known results in the case of small $\|f\|$.

1. Introduction

This paper is devoted to the theoretical study of the efficiency of some greedy algorithms. Greedy algorithms are very useful in applications: adaptive methods are used in PDE solvers, and sparse approximation is used in image/signal/data processing, in the design of neural networks, and in convex optimization. This fact has motivated a deep theoretical study of a variety of greedy algorithms. In this paper, we study the two most popular greedy algorithms, the Pure Greedy Algorithm (PGA) and the Orthogonal Greedy Algorithm (OGA), and their natural modifications. The reader can find other important greedy algorithms in the book [1] and in the papers [2] (Regularized Orthogonal Matching Pursuit), [3] (Compressive Sampling Matching Pursuit), [4] (Subspace Pursuit), [5] (Rescaled Pure Greedy Algorithm), and [6] (Biorthogonal Greedy Algorithm). The reader can find some results on the application of greedy algorithms in convex optimization in the papers [7,8,9,10,11,12,13,14].
There are different criteria for the theoretical efficiency of greedy algorithms. All of them are based on the accuracy of the algorithm (the error after the $m$th iteration) and take different forms. One of these criteria uses the worst error over a given class of inputs (the elements which we approximate by the algorithm). We discuss a variant of this criterion in this paper. There is another, more delicate criterion, which is based on Lebesgue-type inequalities for individual elements. We do not discuss this criterion here; the reader can find a survey of the corresponding results in [15], Ch. 8. We now proceed to a detailed discussion of our results.
Let us begin with a general description of the problem. Let $X$ be a Banach space with norm $\|\cdot\|_X$ and let $Y \subset X$ be a subspace of $X$ equipped with a stronger norm: $\|f\|_Y \ge \|f\|_X$, $f \in Y$. Consider a homogeneous approximation operator (linear or nonlinear) $G: Y \to X$, $G(af) = aG(f)$, $f \in Y$, $a \in \mathbb{R}$, and the error of approximation:
$$e(B_Y, G)_X := \sup_{f \in B_Y} \|f - G(f)\|_X, \qquad B_Y := \{f : \|f\|_Y \le 1\}.$$
Then, for any $f \in Y$, we have:
$$\|f - G(f)\|_X \le e(B_Y, G)_X \, \|f\|_Y. \qquad (1)$$
The characteristic $e(B_Y, G)_X$ plays an important role in approximation theory, with many classical examples of spaces $X$ and $Y$; for instance, $X = L_p$ and $Y$ is one of the smoothness spaces, such as a Sobolev, Nikol'skii, or Besov space.
In this paper, we focus on the following version of inequality (1): find the best $\gamma(\alpha, G, X, Y)$ such that the inequality:
$$\|f - G(f)\|_X \le \gamma(\alpha, G, X, Y) \, \|f\|_X^{1-\alpha} \|f\|_Y^{\alpha}, \qquad \alpha \in [0,1], \qquad (2)$$
holds for all $f \in Y$. Clearly, $\gamma(1, G, X, Y) = e(B_Y, G)_X$. Additionally, it is clear that under the assumption $\|f - G(f)\|_X \le \|f\|_X$, $f \in Y$, we obtain the trivial bound:
$$\gamma(\alpha, G, X, Y) \le e(B_Y, G)_X^{\alpha}.$$
In this paper, we discuss greedy approximation with respect to a given dictionary and prove some nontrivial inequalities for $\gamma(\alpha, G, X, Y)$, both in the case of $X$ being a Hilbert space and $X$ being a Banach space. In particular, we establish that some greedy algorithms (the Pure Greedy Algorithm (PGA) and its generalizations) are as good as the Orthogonal Greedy Algorithm (OGA) in the sense of inequality (2), while it is known that the PGA is much worse than the OGA in the sense of inequality (1) (for definitions and precise formulations, see below).
Let $H$ be a real Hilbert space with inner product $\langle\cdot,\cdot\rangle$ and norm $\|\cdot\|$. We say that a set of elements (functions) $\mathcal{D}$ from $H$ is a dictionary (symmetric dictionary) if each $g \in \mathcal{D}$ has norm one ($\|g\| = 1$) and $\overline{\operatorname{span}}\,\mathcal{D} = H$. In addition, we assume for convenience the property of symmetry:
$$g \in \mathcal{D} \quad \text{implies} \quad -g \in \mathcal{D}.$$
We define the Pure Greedy Algorithm (PGA). We describe this algorithm for a general dictionary $\mathcal{D}$. If $f \in H$, we let $g(f) \in \mathcal{D}$ be an element of $\mathcal{D}$ which maximizes $\langle f, g\rangle$. We assume for simplicity that such a maximizer exists; if not, suitable modifications of the algorithm that follows are necessary (see the Weak Greedy Algorithm below). We define:
$$G(f, \mathcal{D}) := \langle f, g(f)\rangle \, g(f) \quad \text{and} \quad R(f, \mathcal{D}) := f - G(f, \mathcal{D}).$$
Pure Greedy Algorithm (PGA). We define $f_0 := f$ and $G_0(f, \mathcal{D}) := 0$. Then, for each $m \ge 1$, we inductively define:
$$G_m(f, \mathcal{D}) := G_{m-1}(f, \mathcal{D}) + G(f_{m-1}, \mathcal{D}),$$
$$f_m := f - G_m(f, \mathcal{D}) = R(f_{m-1}, \mathcal{D}).$$
Note that for a given element $f$, the sequence $\{G_m(f, \mathcal{D})\}$ may not be unique.
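To make the definition concrete, here is a minimal numerical sketch of the PGA for a finite dictionary in $\mathbb{R}^n$; the matrix representation of the dictionary and the function name are assumptions of this illustration, not part of the paper's setting.

```python
import numpy as np

def pga(f, D, m):
    """Pure Greedy Algorithm: m iterations against a finite dictionary.

    f : (n,) target vector; D : (n, N) matrix whose columns are unit-norm
    dictionary elements. The symmetric partner -g of each column is handled
    implicitly through the sign of the inner product.
    Returns the residual f_m and the approximant G_m(f, D).
    """
    residual = f.astype(float).copy()
    approximant = np.zeros_like(residual)
    for _ in range(m):
        inner = D.T @ residual              # <f_{k-1}, g> for every column g
        j = int(np.argmax(np.abs(inner)))   # greedy step: maximize <f_{k-1}, g>
        step = inner[j] * D[:, j]           # <f_{k-1}, g(f_{k-1})> g(f_{k-1})
        residual -= step
        approximant += step
    return residual, approximant
```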
This algorithm is well studied from the point of view of convergence and rate of convergence. The reader can find the corresponding results and historical comments in [1], Ch. 2. In this paper, we focus on the rate of convergence. Typically, in approximation theory, we define the rate of convergence for specific classes. In classical approximation theory, these are smoothness classes. Clearly, in the general setting with arbitrary $H$ and $\mathcal{D}$, we do not have a concept of smoothness similar to the classical smoothness of functions. It turns out that the geometrically defined class, namely, the closure of the convex hull of $\mathcal{D}$, which we denote by $A_1(\mathcal{D})$, is a very natural class. With each $f \in H$, we associate the following norm:
$$\|f\|_{A_1(\mathcal{D})} := \inf\{M > 0 : f/M \in A_1(\mathcal{D})\}.$$
Clearly, $\|f\| \le \|f\|_{A_1(\mathcal{D})}$. Then, the problem of the rate of convergence of the PGA can be formulated as follows (see [1], p. 95). Find the order of decay of the sequence:
$$\gamma_m(H) := \sup_{\{G_m(f,\mathcal{D})\},\, f,\, \mathcal{D}} \frac{\|f - G_m(f, \mathcal{D})\|}{\|f\|_{A_1(\mathcal{D})}},$$
where the supremum is taken over all possible choices of $\{G_m(f,\mathcal{D})\}$, over all elements $f \in H$, $f \ne 0$, with $\|f\|_{A_1(\mathcal{D})} < \infty$, and over all dictionaries $\mathcal{D}$. This problem is a central theoretical problem in greedy approximation in Hilbert spaces, and it is still open. We mention some of the known results here and refer the reader to [1], Ch. 2, for the detailed history of the problem. It is clear that for any $f \in H$ such that $\|f\|_{A_1(\mathcal{D})} < \infty$, we have:
$$\|f - G_m(f, \mathcal{D})\| \le \gamma_m(H) \, \|f\|_{A_1(\mathcal{D})}.$$
In this paper, we discuss the following extension of the asymptotic characteristic $\gamma_m(H)$: for $\alpha \in (0,1]$, define:
$$\gamma_m(\alpha, H) := \sup_{\{G_m(f,\mathcal{D})\},\, f,\, \mathcal{D}} \frac{\|f - G_m(f, \mathcal{D})\|}{\|f\|^{1-\alpha} \|f\|_{A_1(\mathcal{D})}^{\alpha}}.$$
Clearly,
$$\gamma_m(1, H) = \gamma_m(H), \qquad \gamma_m(\alpha, H) \ge \gamma_m(\beta, H) \quad \text{if} \quad \alpha \le \beta. \qquad (3)$$
The first upper bound on $\gamma_m(H)$ was obtained in [16]:
$$\gamma_m(H) \le m^{-1/6}.$$
Actually, the proof in [16] (see also [1], pp. 92–93) gives:
$$\gamma_m(1/3, H) \le m^{-1/6}.$$
We establish here the following bounds:
$$\tfrac{1}{2}\, m^{-\alpha/2} \le \gamma_m(\alpha, H) \le m^{-\alpha/2}, \qquad \alpha \le 1/3. \qquad (4)$$
Additionally, in Section 2, we find the right behavior of an asymptotic characteristic similar to $\gamma_m(\alpha, H)$ for a more general algorithm than the PGA, namely, the Weak Greedy Algorithm with parameter $b$.
It is interesting to compare the rates of convergence of the PGA and the Orthogonal Greedy Algorithm (OGA). We now give a brief definition of the OGA. We define $f_0^o := f$, $G_0^o(f, \mathcal{D}) := 0$, and for $m \ge 1$, we inductively define $G_m^o(f, \mathcal{D})$ to be the orthogonal projection of $f$ onto the span of $g(f_0^o), \dots, g(f_{m-1}^o)$ and set $f_m^o := f - G_m^o(f, \mathcal{D})$. The analogs of the characteristics $\gamma_m(H)$ and $\gamma_m(\alpha, H)$ for the OGA are denoted by $\gamma_m^o(H)$ and $\gamma_m^o(\alpha, H)$. The following bound is proved in [16] (see also [1], p. 93):
$$\gamma_m^o(H) \le m^{-1/2}. \qquad (5)$$
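The OGA differs from the PGA only in how the approximant is updated. Here is a minimal sketch under the same finite-dictionary assumptions as the PGA sketch above; the least-squares projection is one convenient way to realize the orthogonal projection.

```python
import numpy as np

def oga(f, D, m):
    """Orthogonal Greedy Algorithm: greedy selection as in the PGA, but the
    approximant is recomputed as the orthogonal projection of f onto the
    span of all selected dictionary elements."""
    residual = f.astype(float).copy()
    selected = []
    for _ in range(m):
        j = int(np.argmax(np.abs(D.T @ residual)))   # same selection as the PGA
        if j not in selected:
            selected.append(j)
        # Projection onto span{g_j : j selected} via least squares.
        coef, *_ = np.linalg.lstsq(D[:, selected], f, rcond=None)
        residual = f - D[:, selected] @ coef
    return residual, selected
```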
It is known (see [17]) that $\gamma_m(H)$ decays slower than $m^{-0.1898}$. Therefore, from the point of view of the characteristics $\gamma_m(H)$ and $\gamma_m^o(H)$, the OGA is much better than the PGA. We establish here the following bounds:
$$\tfrac{1}{2}\, m^{-\alpha/2} \le \gamma_m^o(\alpha, H) \le m^{-\alpha/2}, \qquad \alpha \le 1. \qquad (6)$$
This means that from the point of view of the characteristics $\gamma_m(\alpha, H)$ and $\gamma_m^o(\alpha, H)$, the OGA is the same (in the sense of order) as the PGA for $\alpha \le 1/3$. This is a very surprising fact.
We do not know if the upper bound in (4) holds for $\alpha > 1/3$. However, the inequality in (3) and the lower bound for $\gamma_m(H)$ show that:
$$\gamma_m(\alpha, H) \ge \gamma_m(H) \ge c\, m^{-0.1898}.$$
Therefore, the upper bound in (4) cannot be extended beyond $\alpha_0 := 0.3796$.
Section 3 deals with the case of a Banach space $X$. The results for the Banach space case are similar to those for Hilbert spaces but are not as sharp as their Hilbert space counterparts.
Novelty. In this paper, we suggest a new criterion for the evaluation of the theoretical efficiency of a greedy algorithm. The classical criterion uses the worst error (for instance, $\gamma_m(H)$) of approximation of elements from the class $A_1(\mathcal{D})$ by our algorithm (for instance, the PGA). In other words, this criterion uses the norm $\|f\|_{A_1(\mathcal{D})}$ for estimating the error of approximation of $f$. Our new criterion uses two norms, $\|f\|_{A_1(\mathcal{D})}$ and $\|f\|$; more precisely, the weighted product of these norms $\|f\|^{1-\alpha} \|f\|_{A_1(\mathcal{D})}^{\alpha}$, $\alpha \in [0,1]$. The most important qualitative discovery of this paper is that the PGA and its natural modifications have, for some $\alpha$, the same theoretical efficiency as the OGA and its modifications. It is known that in accordance with the old criterion, the PGA is much worse than the OGA.
Method. The standard way of analyzing the accuracy of a greedy algorithm is based on estimating from below the difference $\|f_{m-1}\| - \|f_m\|$ and then solving (estimating) the corresponding recurrent inequalities. For instance, this method works very well for the OGA. In the paper [16], this standard method was modified in such a way that it allowed us to obtain some new nontrivial upper bounds for $\gamma_m(H)$. The method from [16] simultaneously analyzes two sequences: $\{\|f_m\|\}_{m=1}^{\infty}$ and the sequence of sums of absolute values of the coefficients of $G_m(f, \mathcal{D})$. In this paper, we further develop the method from [16].
The results of this paper (see Theorem 2 and the lower bounds in Section 2) show that our method, which is a development of the method from [16], is optimal for proving the upper bounds for $\gamma_m^{t,b}(\alpha, H)$ in the case $\alpha \le \frac{(2-b)t}{(2-b)t+2}$. It is known that the bound $\gamma_m(H) \le m^{-1/6}$, which was obtained in [16], is not optimal. The bound $\gamma_m(H) \le 4\, m^{-11/62}$ was proved in [18] by a method distinct from the one in [16]. The method from [18] was further developed in [19,20]. It would be interesting to understand if the method from [18] and its further developments allow us to prove an analog of Theorem 2 for $\alpha > \frac{(2-b)t}{(2-b)t+2}$.
Conclusion. The PGA at the $m$th iteration searches for an element $g(f_{m-1})$ which maximizes the inner product $\langle f_{m-1}, g\rangle$ over all $g \in \mathcal{D}$. Then, the update is very easy: $f_m = f_{m-1} - \langle f_{m-1}, g(f_{m-1})\rangle g(f_{m-1})$. The OGA, like the PGA, at the $m$th iteration searches for an element $g(f_{m-1}^o)$ which maximizes the inner product $\langle f_{m-1}^o, g\rangle$ over all $g \in \mathcal{D}$. However, the second steps of the PGA and the OGA at the $m$th iteration are different: the OGA makes an orthogonal projection onto the span of $g(f_0^o), \dots, g(f_{m-1}^o)$. Clearly, this step of the OGA is more difficult than the corresponding step of the PGA. Moreover, it is clear from the definition of the PGA that it provides an expansion of $f$ into a series with respect to $\mathcal{D}$. The OGA does not provide an expansion. Thus, the advantage of the PGA over the OGA is that it is simpler and provides an expansion. The advantage of the OGA over the PGA is that in accordance with the old criterion, we can guarantee better accuracy for $f \in A_1(\mathcal{D})$. The results of this paper show that in accordance with the new criterion, the OGA does not have an advantage in the sense of accuracy for some parameters $\alpha$. Similar results are obtained for the modifications of the PGA: the Weak Greedy Algorithm with parameter $b$ in Hilbert spaces (see Section 2) and the Dual Greedy Algorithm with parameters $(t, b, \mu)$ in Banach spaces (see Section 3).

2. Hilbert Space: The Weak Greedy Algorithm with Parameter b

Let a sequence $\tau = \{t_k\}_{k=1}^{\infty}$, $0 \le t_k \le 1$, and a parameter $b \in (0,1]$ be given. We define the Weak Greedy Algorithm with parameter $b$.
Weak Greedy Algorithm with parameter $b$ (WGA($\tau, b$)). We define $f_0 := f_0^{\tau,b} := f$. Then, for each $m \ge 1$, we inductively define:
(1)
$\varphi_m := \varphi_m^{\tau,b} \in \mathcal{D}$ is any element satisfying:
$$\langle f_{m-1}, \varphi_m\rangle \ge t_m \sup_{g \in \mathcal{D}} \langle f_{m-1}, g\rangle;$$
(2)
$$f_m := f_m^{\tau,b} := f_{m-1} - b\, \langle f_{m-1}, \varphi_m\rangle \varphi_m;$$
(3)
$$G_m(f, \mathcal{D}) := G_m^{\tau,b}(f, \mathcal{D}) := b \sum_{j=1}^{m} \langle f_{j-1}, \varphi_j\rangle \varphi_j.$$
In the case $t_k = t$, $k = 1, 2, \dots$, we write $t$ in the notation instead of $\tau$.
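In code, the WGA($t, b$) differs from the PGA sketch above only in the selection tolerance $t$ and the damping factor $b$; as before, the finite dictionary is an assumption of this illustration.

```python
import numpy as np

def wga(f, D, m, t=1.0, b=1.0):
    """Weak Greedy Algorithm WGA(t, b) with constant weakness t_k = t.

    Steps (1)-(3) above: pick any column whose inner product with the
    residual is within the factor t of the best one (here: the first such
    column), then subtract only the fraction b of the greedy step."""
    residual = f.astype(float).copy()
    approximant = np.zeros_like(residual)
    for _ in range(m):
        inner = D.T @ residual                  # <f_{m-1}, g> for every column
        sup = np.abs(inner).max()               # sup over the symmetric dictionary
        if sup == 0:
            break
        j = int(np.argmax(np.abs(inner) >= t * sup))  # any index satisfying (1)
        step = b * inner[j] * D[:, j]           # step (2): damped update
        residual -= step
        approximant += step                     # step (3): running expansion
    return residual, approximant
```

With $t = 1$ and $b = 1$, this sketch reduces to the PGA.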
We proceed to the rate of convergence. The following Theorem 1 was proved in [21].
Theorem 1.
Let $\mathcal{D}$ be an arbitrary dictionary in $H$. Assume $\tau := \{t_k\}_{k=1}^{\infty}$ is a nonincreasing sequence and $b \in (0,1]$. Then, for $f \in A_1(\mathcal{D})$, we have:
$$\|f - G_m^{\tau,b}(f, \mathcal{D})\| \le e_m(\tau, b),$$
where:
$$e_m(\tau, b) := \left(1 + b(2-b) \sum_{k=1}^{m} t_k^2\right)^{-\frac{(2-b)t_m}{2(2+(2-b)t_m)}}.$$
Theorem 1 implies the following inequality for any $f$ and any $\mathcal{D}$:
$$\|f - G_m^{\tau,b}(f, \mathcal{D})\| \le \|f\|_{A_1(\mathcal{D})}\, e_m(\tau, b).$$
We now extend Theorem 1 to provide a bound for:
$$\gamma_m^{t,b}(\alpha, H) := \sup_{\mathcal{D}} \; \sup_{f \in A_1(\mathcal{D}),\, f \ne 0} \; \sup_{G_m^{t,b}(f,\mathcal{D})} \frac{\|f - G_m^{t,b}(f, \mathcal{D})\|}{\|f\|^{1-\alpha} \|f\|_{A_1(\mathcal{D})}^{\alpha}}.$$
We prove the following Theorem 2 in the case $t_k = t$, $k = 1, 2, \dots$.
Theorem 2.
For any Hilbert space $H$, we have:
$$\gamma_m^{t,b}(\alpha, H) \le \left(1 + m\, b(2-b)\, t^2\right)^{-\alpha/2}, \qquad (11)$$
provided $\alpha \le \frac{(2-b)t}{(2-b)t+2}$.
Proof. 
The proof of this theorem goes along the lines of the proof of Theorem 1 in [21]. Let $\mathcal{D}$ be a dictionary in $H$. We introduce some notation:
$$f_k := f_k^{t,b}, \qquad \varphi_k := \varphi_k^{t,b}, \qquad k = 0, 1, \dots,$$
$$a_m := \|f_m\|^2, \qquad y_m := \langle f_{m-1}, \varphi_m\rangle, \qquad m = 1, 2, \dots,$$
and consider the sequence $\{B_n\}$ defined as follows:
$$B_0 := \|f\|_{A_1(\mathcal{D})}, \qquad B_m := B_{m-1} + b\, y_m, \qquad m = 1, 2, \dots.$$
It is clear that $\|f_n\|_{A_1(\mathcal{D})} \le B_n$, $n = 0, 1, \dots$. By Lemma 3.5 from [16] (see also [1], p. 91, Lemma 2.17), we get:
$$\sup_{g \in \mathcal{D}} \langle f_{m-1}, g\rangle \ge \|f_{m-1}\|^2 / B_{m-1}.$$
From here and from the equality:
$$\|f_m\|^2 = \|f_{m-1}\|^2 - b(2-b) \langle f_{m-1}, \varphi_m\rangle^2,$$
we obtain the following relations:
$$a_m = a_{m-1} - b(2-b) y_m^2, \qquad (13)$$
$$B_m = B_{m-1} + b\, y_m, \qquad (14)$$
$$y_m \ge t\, a_{m-1} / B_{m-1}. \qquad (15)$$
From (13) and (15), we obtain:
$$a_m \le a_{m-1} \left(1 - b(2-b) t^2 \frac{a_{m-1}}{B_{m-1}^2}\right).$$
Using that $B_{m-1} \le B_m$, we derive from here:
$$\frac{a_m}{B_m^2} \le \frac{a_{m-1}}{B_{m-1}^2} \left(1 - b(2-b) t^2 \frac{a_{m-1}}{B_{m-1}^2}\right). \qquad (16)$$
We shall need the following simple known lemma (see, for example, [1], p. 91, in the case $C_1 = C_2$).
Lemma 1.
Let $\{x_m\}_{m=0}^{\infty}$ be a sequence of non-negative numbers satisfying the inequalities:
$$x_0 \le C_1, \qquad x_{m+1} \le x_m (1 - x_m C_2), \qquad m = 0, 1, 2, \dots, \qquad C_1, C_2 > 0.$$
Then, we have for each $m$:
$$x_m \le (C_1^{-1} + C_2 m)^{-1}.$$
Proof. 
The proof is by induction on $m$. For $m = 0$, the statement is true by assumption. We assume $x_m \le (C_1^{-1} + C_2 m)^{-1}$ and prove that $x_{m+1} \le (C_1^{-1} + C_2(m+1))^{-1}$. If $x_{m+1} = 0$, this statement is obvious. Assume, therefore, that $x_{m+1} > 0$. Then, we have:
$$x_{m+1}^{-1} \ge x_m^{-1} (1 - x_m C_2)^{-1} \ge x_m^{-1} (1 + x_m C_2) = x_m^{-1} + C_2 \ge C_1^{-1} + (m+1) C_2,$$
which implies $x_{m+1} \le (C_1^{-1} + C_2(m+1))^{-1}$. □
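A quick numerical sanity check of Lemma 1; the constants below are arbitrary choices for this illustration.

```python
# Iterate the extremal recurrence x_{m+1} = x_m (1 - x_m * C2) starting from
# x_0 = C1 and compare with the bound (1/C1 + C2*m)^{-1} from Lemma 1.
C1, C2 = 1.0, 0.3
x = C1
for m in range(1, 50):
    x = x * (1.0 - x * C2)
    bound = 1.0 / (1.0 / C1 + C2 * m)
    assert x <= bound, (m, x, bound)
print("Lemma 1 bound holds along this trajectory")
```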
We apply Lemma 1 with $x_m := a_m B_m^{-2}$. Then, the inequality $\|f\| \le \|f\|_{A_1(\mathcal{D})}$ implies that we can take $C_1 = 1$. We set $C_2 = b(2-b)t^2$ and obtain from (16) and Lemma 1:
$$\frac{a_m}{B_m^2} \le \left(1 + m\, b(2-b) t^2\right)^{-1}. \qquad (17)$$
Relations (13) and (15) imply:
$$a_m \le a_{m-1} - b(2-b)\, y_m\, t\, a_{m-1}/B_{m-1} = a_{m-1} \left(1 - b(2-b)\, t\, y_m / B_{m-1}\right). \qquad (18)$$
We now need the following simple inequality: for any $x < 1$ and any $a > 0$, we have:
$$(1 - x)(1 + x/a)^a \le 1. \qquad (19)$$
Rewriting (14) in the form:
$$B_m = B_{m-1}(1 + b\, y_m / B_{m-1}) \qquad (20)$$
and using inequality (19) with $x = b(2-b)t\, y_m / B_{m-1}$ and $a = (2-b)t$, we get from (18) and (20) that:
$$a_m B_m^{(2-b)t} \le a_{m-1} B_{m-1}^{(2-b)t} \le \|f\|^2 \|f\|_{A_1(\mathcal{D})}^{(2-b)t}. \qquad (21)$$
Combining (17) and (21), we obtain:
$$a_m^{(2-b)t+2} \le \|f\|^4 \|f\|_{A_1(\mathcal{D})}^{2(2-b)t} \left(1 + m\, b(2-b)t^2\right)^{-(2-b)t},$$
which completes the proof of Theorem 2 with $\alpha_0 := \frac{(2-b)t}{(2-b)t+2}$. The case $\alpha < \alpha_0$ follows from Lemma 2 below. □
Lower bounds. Let $H$ be an infinite dimensional Hilbert space, and let $\{e_k\}_{k=1}^{\infty}$ be an orthonormal system in $H$. Suppose that our symmetric dictionary $\mathcal{D}$ consists of $\pm e_k$, $k = 1, 2, \dots$, and that the other elements $g \in \mathcal{D}$ have the property $\langle g, e_k\rangle = 0$, $k = 1, 2, \dots$. We present an example in the case $t = 1$. Let $b \in (0,1]$ and $m$ be given. We consider two cases: (I) $b \le 1/4$ and (II) $b \in (1/4, 1]$.
(I). Set $m' := [2bm] + 1$ and:
$$f = \sum_{k=1}^{m'} e_k.$$
Then, at each iteration, the WGA($1, b$) will pick one of the $e_k$, $k \in [1, m']$, with the largest coefficient. After $m$ iterations, we will get:
$$f_m = \sum_{k=1}^{m'} c_k e_k, \qquad c_k \ge 1 - (m/m' + 1)b \ge 1/4.$$
Therefore, we obtain:
$$\|f_m\| \ge (m')^{1/2}/4, \qquad \|f\| = (m')^{1/2}, \qquad \|f\|_{A_1(\mathcal{D})} \le m'.$$
Thus, for any $\alpha \in [0,1]$, we find:
$$\frac{\|f_m\|}{\|f\|^{1-\alpha} \|f\|_{A_1(\mathcal{D})}^{\alpha}} \ge (m')^{-\alpha/2}/4. \qquad (22)$$
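This example is easy to reproduce with the WGA sketch above; the specific values of $b$ and $m$ below are arbitrary choices for the check.

```python
import numpy as np

b, m = 0.1, 200
mp = int(2 * b * m) + 1            # m' = [2bm] + 1
D = np.eye(mp)                     # the orthonormal elements e_1, ..., e_{m'}
f = np.ones(mp)                    # f = e_1 + ... + e_{m'}
fm, _ = wga(f, D, m, t=1.0, b=b)
assert fm.min() >= 1 / 4                       # every coefficient c_k >= 1/4
assert np.linalg.norm(fm) >= mp ** 0.5 / 4     # ||f_m|| >= (m')^{1/2} / 4
```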
(II). Set:
$$f = \sum_{k=1}^{2m} e_k.$$
Then,
$$f_m = \sum_{k=1}^{2m} c_k e_k, \qquad c_k = 1, \; k \notin G, \qquad c_k = 1 - b, \; k \in G, \qquad |G| = m.$$
Therefore, we obtain:
$$\|f_m\| \ge m^{1/2}, \qquad \|f\| = (2m)^{1/2}, \qquad \|f\|_{A_1(\mathcal{D})} \le 2m.$$
Thus, for any $\alpha \in [0,1]$, we find:
$$\frac{\|f_m\|}{\|f\|^{1-\alpha} \|f\|_{A_1(\mathcal{D})}^{\alpha}} \ge 2^{-1/2} (2m)^{-\alpha/2}. \qquad (23)$$
Bounds (22) and (23) show that in the case $t = 1$, inequality (11) is sharp in the sense of its dependence on $m$ and $b$, when $m$ goes to $\infty$ and $b$ goes to $0$.
Proof of (6). The lower bound in (6) follows from (23), because in the case of an orthonormal system, the PGA and the OGA coincide. We now prove the upper bound.
Lemma 2.
Let $g_m(\alpha, H)$ denote either $\gamma_m(\alpha, H)$ or $\gamma_m^o(\alpha, H)$. Suppose that for some $\beta \in (0,1]$, we have:
$$g_m(\beta, H) \le C \varphi(m)^{-\beta/2}.$$
Then, for any $\alpha \in (0, \beta)$, we have:
$$g_m(\alpha, H) \le C^{\alpha/\beta} \varphi(m)^{-\alpha/2}. \qquad (24)$$
Proof. 
By the definitions of the PGA and the OGA, we have $\|f_m\| \le \|f\|$ and $\|f_m^o\| \le \|f\|$. The proof is identical for both algorithms; we carry it out for the PGA. Our assumption gives, for any $f$, any dictionary $\mathcal{D}$, and any realization of $G_m(f, \mathcal{D})$:
$$\|f_m\| = \|f - G_m(f, \mathcal{D})\| \le \|f\|^{1-\beta} \|f\|_{A_1(\mathcal{D})}^{\beta}\, C \varphi(m)^{-\beta/2}.$$
Therefore, for any $a \in [0,1]$, we have:
$$\|f_m\| \le \|f\|^{1-a} \left(\|f\|^{1-\beta} \|f\|_{A_1(\mathcal{D})}^{\beta}\, C \varphi(m)^{-\beta/2}\right)^a. \qquad (25)$$
Choosing $a = \alpha/\beta$, we obtain (24) from (25). □
The upper bound in (6) follows from (5) and Lemma 2. □

3. Greedy Expansions in Banach Spaces

In this section, we extend the results of Section 2 to the case of a Banach space instead of a Hilbert space. We begin with some definitions. Let $X$ be a real Banach space with norm $\|\cdot\|$. As above, we say that a set of elements (functions) $\mathcal{D}$ from $X$ is a dictionary (symmetric dictionary) if each $g \in \mathcal{D}$ has norm one ($\|g\| = 1$) and $\overline{\operatorname{span}}\,\mathcal{D} = X$. In addition, we assume for convenience that the dictionary is symmetric:
$$g \in \mathcal{D} \quad \text{implies} \quad -g \in \mathcal{D}.$$
In this paper, we study greedy algorithms with regard to $\mathcal{D}$ that provide greedy expansions. For a nonzero element $f \in X$, we denote by $F_f$ a norming (peak) functional for $f$:
$$\|F_f\| = 1, \qquad F_f(f) = \|f\|.$$
The existence of such a functional is guaranteed by the Hahn–Banach theorem. Denote:
$$r_{\mathcal{D}}(f) := \sup_{F_f} \sup_{g \in \mathcal{D}} F_f(g).$$
We note that, in general, a norming functional $F_f$ is not unique. This is why we take $\sup_{F_f}$ over all norming functionals of $f$ in the definition of $r_{\mathcal{D}}(f)$. It is known that in the case of uniformly smooth Banach spaces (our primary object here), the norming functional $F_f$ is unique. In such a case, we do not need $\sup_{F_f}$ in the definition of $r_{\mathcal{D}}(f)$.
We consider here approximation in uniformly smooth Banach spaces. For a Banach space $X$, we define the modulus of smoothness:
$$\rho(u) := \rho(u, X) := \sup_{\|x\| = \|y\| = 1} \left(\tfrac{1}{2}\left(\|x + uy\| + \|x - uy\|\right) - 1\right).$$
A uniformly smooth Banach space is one with the property:
$$\lim_{u \to 0} \rho(u)/u = 0.$$
It is well known (see, for instance, [22], Lemma B.1) that in the case $X = L_p$, $1 \le p < \infty$, we have:
$$\rho(u, L_p) \le \begin{cases} u^p/p & \text{if } 1 \le p \le 2, \\ (p-1)u^2/2 & \text{if } 2 \le p < \infty. \end{cases} \qquad (26)$$
We now give a definition of the DGA($\tau, b, \mu$), $\tau = \{t_k\}_{k=1}^{\infty}$, $t_k \in (0,1]$, introduced in [21] (see also [1], Ch. 6).
Dual Greedy Algorithm with parameters ($\tau, b, \mu$) (DGA($\tau, b, \mu$)). Let $X$ be a uniformly smooth Banach space with modulus of smoothness $\rho(u)$, and let $\mu(u)$ be a majorant of $\rho(u)$: $\rho(u) \le \mu(u)$, $u \in [0, \infty)$. For a sequence $\tau = \{t_k\}_{k=1}^{\infty}$, $t_k \in (0,1]$, and a parameter $b \in (0,1]$, we define sequences $\{f_m\}_{m=0}^{\infty}$, $\{\varphi_m\}_{m=1}^{\infty}$, $\{c_m\}_{m=1}^{\infty}$, and $\{G_m\}_{m=0}^{\infty}$ inductively. Let $f_0 := f$ and $G_0 := 0$. If $f_{m-1} = 0$ for some $m \ge 1$, then we set $f_j = 0$ for $j \ge m$ and stop. If $f_{m-1} \ne 0$, then we conduct the following three steps:
(1)
take any $\varphi_m \in \mathcal{D}$ such that:
$$F_{f_{m-1}}(\varphi_m) \ge t_m\, r_{\mathcal{D}}(f_{m-1}); \qquad (27)$$
(2)
choose $c_m > 0$ from the equation:
$$\|f_{m-1}\|\, \mu(c_m/\|f_{m-1}\|) = \frac{t_m b}{2}\, c_m\, r_{\mathcal{D}}(f_{m-1}); \qquad (28)$$
(3)
define:
$$f_m := f_{m-1} - c_m \varphi_m, \qquad G_m := G_m^{\tau,b,\mu} := G_{m-1} + c_m \varphi_m.$$
Along with the algorithm DGA($\tau, b, \mu$), we consider a slight modification of it, in which at step (2) we find $c_m$ from the equation (see [21], Remark 3.1):
$$\|f_{m-1}\|\, \mu(c_m/\|f_{m-1}\|) = \frac{b}{2}\, c_m\, F_{f_{m-1}}(\varphi_m).$$
We denote this modification by DGA($\tau, b, \mu$)*.
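As an illustration, here is a sketch of the DGA($t, b, \mu$)* in $X = \ell_q$, $1 < q \le 2$, with $\mu(u) = \gamma u^q$, for a finite dictionary; all names and the finite-dimensional setting are assumptions of this sketch. In $\ell_q$, the norming functional is $F_f(g) = \|f\|^{1-q} \sum_i \operatorname{sign}(f_i)|f_i|^{q-1} g_i$, and with a power-type $\mu$, step (2) gives $c_m$ in closed form.

```python
import numpy as np

def dga_star(f, D, m, q=1.5, gamma=None, t=1.0, b=0.5):
    """Sketch of DGA(t, b, mu)* in l_q (1 < q <= 2) with mu(u) = gamma*u^q.

    Columns of D are assumed to have unit l_q norm; their symmetric
    partners are handled through the sign of F_{f_{m-1}}(g)."""
    if gamma is None:
        gamma = 1.0 / q                 # rho(u, l_q) <= u^q / q for 1 < q <= 2
    residual = f.astype(float).copy()
    for _ in range(m):
        norm = np.linalg.norm(residual, q)
        if norm == 0:
            break
        # Norming functional of the residual, evaluated on every column.
        Ff = np.sign(residual) * np.abs(residual) ** (q - 1) / norm ** (q - 1)
        vals = Ff @ D
        j = int(np.argmax(np.abs(vals) >= t * np.abs(vals).max()))  # step (1)
        # Step (2) for DGA*: gamma*norm*(c/norm)^q = (b/2)*c*F(phi), solved for c.
        c = (b * abs(vals[j]) / (2 * gamma)) ** (1.0 / (q - 1)) * norm
        residual -= c * np.sign(vals[j]) * D[:, j]                  # step (3)
    return residual
```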
We proceed to studying the rate of convergence of the DGA($\tau, b, \mu$) in uniformly smooth Banach spaces with a power-type majorant of the modulus of smoothness: $\rho(u) \le \mu(u) = \gamma u^q$, $1 < q \le 2$. The following Theorem 3 is from [21] (see also [1], p. 372).
Theorem 3.
Let $\tau := \{t_k\}_{k=1}^{\infty}$ be a nonincreasing sequence, $1 \ge t_1 \ge t_2 \ge \cdots > 0$, and let $b \in (0,1)$. Assume that $X$ has a modulus of smoothness $\rho(u) \le \gamma u^q$, $q \in (1,2]$. Denote $\mu(u) := \gamma u^q$. Then, for any dictionary $\mathcal{D}$ and any $f \in A_1(\mathcal{D})$, the rate of convergence of the DGA($\tau, b, \mu$) is given by:
$$\|f_m\| \le C(b, \gamma, q) \left(1 + \sum_{k=1}^{m} t_k^p\right)^{-\frac{t_m(1-b)}{p(1 + t_m(1-b))}}, \qquad p := \frac{q}{q-1}.$$
Remark 1.
It is pointed out in [21], Remark 3.2, that Theorem 3 holds for the algorithm DGA($\tau, b, \mu$)* as well.
Theorem 3 is an analog of Theorem 1. We now prove an analog of Theorem 2: we extend Theorem 3 to provide a bound for:
$$\gamma_m^{t,b,\mu}(\alpha, X) := \sup_{\mathcal{D}} \; \sup_{f \in A_1(\mathcal{D}),\, f \ne 0} \; \sup_{G_m^{t,b,\mu}(f,\mathcal{D})} \frac{\|f - G_m^{t,b,\mu}(f, \mathcal{D})\|}{\|f\|^{1-\alpha} \|f\|_{A_1(\mathcal{D})}^{\alpha}}.$$
The corresponding characteristic for the algorithm DGA($\tau, b, \mu$)* is denoted by $\gamma_m^{t,b,\mu}(\alpha, X)^*$. We prove the following Theorem 4 in the case $t_k = t$, $k = 1, 2, \dots$.
Theorem 4.
For any Banach space $X$ with modulus of smoothness $\rho(u, X) \le \gamma u^q$, $1 < q \le 2$, $p := \frac{q}{q-1}$, we have:
$$\gamma_m^{t,b,\mu}(\alpha, X) \le (1 + m\, c\, t^p)^{-\alpha/p}, \qquad c := (1-b)\left(\frac{b}{2\gamma}\right)^{\frac{1}{q-1}},$$
provided $\alpha \le \frac{t(1-b)}{1 + t(1-b)}$. The same inequality holds for $\gamma_m^{t,b,\mu}(\alpha, X)^*$.
Proof. 
The proof is identical for both characteristics $\gamma_m^{t,b,\mu}(\alpha, X)$ and $\gamma_m^{t,b,\mu}(\alpha, X)^*$. We carry it out for $\gamma_m^{t,b,\mu}(\alpha, X)$. From the definition of the modulus of smoothness, we have:
$$\|f_{n-1} - c_n\varphi_n\| + \|f_{n-1} + c_n\varphi_n\| \le 2\|f_{n-1}\|\left(1 + \rho(c_n/\|f_{n-1}\|)\right). \qquad (33)$$
Using the definition of $\varphi_n$:
$$F_{f_{n-1}}(\varphi_n) \ge t\, r_{\mathcal{D}}(f_{n-1}),$$
we get:
$$\|f_{n-1} + c_n\varphi_n\| \ge F_{f_{n-1}}(f_{n-1} + c_n\varphi_n) = \|f_{n-1}\| + c_n F_{f_{n-1}}(\varphi_n) \ge \|f_{n-1}\| + c_n t\, r_{\mathcal{D}}(f_{n-1}). \qquad (35)$$
Combining (33) and (35), we get:
$$\|f_n\| = \|f_{n-1} - c_n\varphi_n\| \le \|f_{n-1}\|\left(1 + 2\rho(c_n/\|f_{n-1}\|)\right) - c_n t\, r_{\mathcal{D}}(f_{n-1}).$$
Using the choice of $c_m$, we get from here:
$$\|f_m\| \le \|f_{m-1}\| - t(1-b)\, c_m\, r_{\mathcal{D}}(f_{m-1}). \qquad (37)$$
Thus, we need to estimate $c_m r_{\mathcal{D}}(f_{m-1})$ from below. It is clear that:
$$\|f_{m-1}\|_{A_1(\mathcal{D})} = \left\|f - \sum_{j=1}^{m-1} c_j\varphi_j\right\|_{A_1(\mathcal{D})} \le \|f\|_{A_1(\mathcal{D})} + \sum_{j=1}^{m-1} c_j. \qquad (38)$$
Denote $B_n := \|f\|_{A_1(\mathcal{D})} + \sum_{j=1}^{n} c_j$. Then, by (38), we have:
$$\|f_{m-1}\|_{A_1(\mathcal{D})} \le B_{m-1}.$$
Next, by Lemma 6.10 from [1], p. 343, we obtain:
$$r_{\mathcal{D}}(f_{m-1}) = \sup_{g \in \mathcal{D}} F_{f_{m-1}}(g) = \sup_{\varphi \in A_1(\mathcal{D})} F_{f_{m-1}}(\varphi) \ge \|f_{m-1}\|_{A_1(\mathcal{D})}^{-1} F_{f_{m-1}}(f_{m-1}) \ge \|f_{m-1}\| / B_{m-1}. \qquad (39)$$
Substituting (39) into (37), we get:
$$\|f_m\| \le \|f_{m-1}\|\left(1 - t(1-b)\, c_m / B_{m-1}\right). \qquad (40)$$
From the definition of $B_m$, we find:
$$B_m = B_{m-1} + c_m = B_{m-1}(1 + c_m / B_{m-1}).$$
Using the inequality:
$$(1 + x)^{\alpha} \le 1 + \alpha x, \qquad 0 \le \alpha \le 1, \quad x \ge 0,$$
we obtain:
$$B_m^{t(1-b)} \le B_{m-1}^{t(1-b)}\left(1 + t(1-b)\, c_m / B_{m-1}\right). \qquad (41)$$
Multiplying (40) and (41), we get:
$$\|f_m\|\, B_m^{t(1-b)} \le \|f_{m-1}\|\, B_{m-1}^{t(1-b)} \le \|f\|\, \|f\|_{A_1(\mathcal{D})}^{t(1-b)}. \qquad (42)$$
The function $\mu(u)/u = \gamma u^{q-1}$ is increasing on $[0, \infty)$. Therefore, the $c_m$ from (28) is greater than or equal to the $c_m$ from the following Equation (43) (see (39)):
$$\gamma \|f_{m-1}\| \left(c_m/\|f_{m-1}\|\right)^q = \frac{t b}{2}\, c_m\, \|f_{m-1}\| / B_{m-1}, \qquad (43)$$
$$c_m = \left(\frac{t b}{2\gamma}\right)^{\frac{1}{q-1}} \|f_{m-1}\|^{\frac{q}{q-1}} B_{m-1}^{-\frac{1}{q-1}}. \qquad (44)$$
Using the notation:
$$p := \frac{q}{q-1}, \qquad c := (1-b)\left(\frac{b}{2\gamma}\right)^{\frac{1}{q-1}},$$
we get from (37), (39), and (44):
$$\|f_m\| \le \|f_{m-1}\|\left(1 - c\, t^p\, \frac{\|f_{m-1}\|^p}{B_{m-1}^p}\right). \qquad (45)$$
Noting that $B_m \ge B_{m-1}$, we derive from (45) that:
$$\left(\|f_m\|/B_m\right)^p \le \left(\|f_{m-1}\|/B_{m-1}\right)^p \left(1 - c\, t^p \left(\|f_{m-1}\|/B_{m-1}\right)^p\right). \qquad (46)$$
Taking into account that $\|f\| \le \|f\|_{A_1(\mathcal{D})}$, we obtain from (46), by Lemma 1 with $C_1 = 1$, $C_2 = c\, t^p$:
$$\left(\|f_m\|/B_m\right)^p \le (1 + m\, c\, t^p)^{-1}. \qquad (47)$$
Combining (42) and (47), we get:
$$\|f_m\| \le \|f\|^{1-\alpha_0} \|f\|_{A_1(\mathcal{D})}^{\alpha_0} (1 + m\, c\, t^p)^{-\alpha_0/p}, \qquad p := \frac{q}{q-1}, \qquad \alpha_0 := \frac{t(1-b)}{1 + t(1-b)}.$$
This completes the proof of Theorem 4 for $\alpha = \alpha_0$. The case $\alpha < \alpha_0$ follows from the case $\alpha = \alpha_0$ and the corresponding analog of Lemma 2. □
Let us discuss an application of Theorem 4 in the case of a Hilbert space. It is well known and easy to check that for a Hilbert space $H$, one has:
$$\rho(u) = (1 + u^2)^{1/2} - 1 \le u^2/2.$$
Let us figure out how the DGA($t, b, u^2/2$) works in a Hilbert space. Consider the $m$th step of it. Let $\varphi_m \in \mathcal{D}$ be from (27) with $t_m = t$ (we assume existence in the case $t = 1$). Then, it is clear that for $\varphi_m$, we have:
$$\langle f_{m-1}, \varphi_m\rangle \ge t\, \|f_{m-1}\|\, r_{\mathcal{D}}(f_{m-1}) = t \sup_{g \in \mathcal{D}} \langle f_{m-1}, g\rangle.$$
The WGA($t, 1$) would use $\varphi_m$ with the coefficient $\langle f_{m-1}, \varphi_m\rangle$ at this step. The DGA($t, b, u^2/2$)*, like the WGA($t, b$), uses the same $\varphi_m$ and only a fraction of $\langle f_{m-1}, \varphi_m\rangle$:
$$c_m = b\, \|f_{m-1}\|\, F_{f_{m-1}}(\varphi_m) = b\, \langle f_{m-1}, \varphi_m\rangle. \qquad (48)$$
Thus, the choice $b = 1$ in (48) corresponds to the WGA. However, it is clear from the above considerations that our technique, designed for general Banach spaces, does not work in the case $b = 1$. By Theorem 4 with $\mu(u) = u^2/2$, the DGA($t, b, \mu$) and DGA($t, b, \mu$)* provide the following error estimate:
$$\|f_m\| \le \|f\|^{1-\alpha} \|f\|_{A_1(\mathcal{D})}^{\alpha} (1 + m\, c\, t^2)^{-\alpha/2}, \qquad \alpha \le \alpha_0 := \frac{t(1-b)}{1 + t(1-b)}. \qquad (49)$$
Note that inequality (49) is similar to the corresponding inequality that follows from Theorem 2 for $\alpha \le \alpha_1 := \frac{t(2-b)}{2 + t(2-b)}$. It is easy to check that $\alpha_0 < \alpha_1$, which means that Theorem 2 gives a stronger result than the corresponding corollary of Theorem 4.
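The coincidence of the DGA($t, b, u^2/2$)* update with the WGA($t, b$) update in a Hilbert space can be checked numerically with the two sketches above; the random test data are an assumption of this check. With $q = 2$ and $\gamma = 1/2$, the closed-form $c_m$ in dga_star reduces exactly to $b\langle f_{m-1}, \varphi_m\rangle$.

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)          # unit-norm columns
f = rng.standard_normal(20)

r1, _ = wga(f, D, 30, t=1.0, b=0.5)
r2 = dga_star(f, D, 30, q=2.0, gamma=0.5, t=1.0, b=0.5)
print(np.allclose(r1, r2))              # True: the two updates coincide
```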
A remark on lower bounds. In Section 2, we obtained lower bounds which are sharp in both parameters $m$ and $b$. Clearly, the most important parameter is $m$. Here, we obtain lower bounds in $m$ which apply to any algorithm providing an $m$-term approximation after $m$ iterations. Recall the definition of the concept of $m$-term approximation with respect to a given dictionary $\mathcal{D}$. Given an integer $m \in \mathbb{N}$, we denote by $\Sigma_m(\mathcal{D})$ the set of all $m$-term approximants with respect to $\mathcal{D}$:
$$\Sigma_m(\mathcal{D}) := \left\{h \in X : h = \sum_{i=1}^{m} c_i g_i, \; g_i \in \mathcal{D}, \; c_i \in \mathbb{R}, \; i = 1, \dots, m\right\}.$$
Define, for a Banach space $X$:
$$\sigma_m(f, \mathcal{D})_X := \inf_{h \in \Sigma_m(\mathcal{D})} \|f - h\|_X$$
to be the best $m$-term approximation of $f \in X$ in the $X$-norm with respect to $\mathcal{D}$.
Let $1 < q \le 2$. Consider $X = \ell_q$. It is known ([23], p. 67) that $\ell_q$, $1 < q \le 2$, is a uniformly smooth Banach space with a modulus of smoothness $\rho(u)$ of power type $q$: $\rho(u) \le \gamma u^q$. Choose $\mathcal{D} := \mathcal{E}$ to be the symmetrized standard basis $\{\pm e_j\}_{j=1}^{\infty}$, $e_j := (0, \dots, 0, 1, 0, \dots)$, of $\ell_q$. For a given $m \in \mathbb{N}$, set:
$$f := \sum_{i=1}^{2m} e_i.$$
Then, the following relations are obvious:
$$\|f\|_{\ell_q} = (2m)^{1/q}, \qquad \|f\|_{A_1(\mathcal{E})} = 2m, \qquad \sigma_m(f, \mathcal{E})_{\ell_q} = m^{1/q}.$$
Therefore, for any $\alpha \in [0,1]$:
$$\frac{\sigma_m(f, \mathcal{E})_{\ell_q}}{\|f\|_{\ell_q}^{1-\alpha} \|f\|_{A_1(\mathcal{E})}^{\alpha}} \ge \frac{1}{2}\, m^{-\alpha/p}, \qquad p := \frac{q}{q-1}. \qquad (50)$$
This means that the upper bounds provided by Theorem 4 are sharp in $m$ for any fixed parameters $t$ and $b$. More precisely, for every $q \in (1,2]$, there exists a Banach space $X$ with $\rho(u, X) \le \gamma u^q$, a dictionary $\mathcal{D} \subset X$, and $f \in X$ such that the following inequality holds:
$$\frac{\sigma_m(f, \mathcal{D})_X}{\|f\|_X^{1-\alpha} \|f\|_{A_1(\mathcal{D})}^{\alpha}} \ge \frac{1}{2}\, m^{-\alpha/p}, \qquad p := \frac{q}{q-1}. \qquad (51)$$
Inequality (51) follows directly from (50) with $X = \ell_q$ and $\mathcal{D} = \mathcal{E}$.
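The relations behind (50) are easy to confirm numerically; the values of $q$ and $m$ below are arbitrary choices for the check.

```python
import numpy as np

q, m = 1.5, 25
f = np.ones(2 * m)                    # coefficients of f = e_1 + ... + e_{2m}
# Keeping any m coordinates is optimal here, so the best m-term error is the
# l_q norm of the m dropped coordinates:
sigma = np.linalg.norm(np.ones(m), q)
assert np.isclose(sigma, m ** (1 / q))                       # sigma_m = m^{1/q}
assert np.isclose(np.linalg.norm(f, q), (2 * m) ** (1 / q))  # ||f||_{l_q}
assert np.isclose(f.sum(), 2 * m)                            # ||f||_{A_1(E)} = 2m
```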
Remark 2.
Inequality (51) gives a lower bound for the best $m$-term approximation. It is known (see [1], Ch. 6) that there are greedy-type algorithms, for instance, the Weak Chebyshev Greedy Algorithm and the Weak Greedy Algorithm with Free Relaxation with weakness parameter $t \in (0,1]$, which provide the following rate of convergence for $f \in X$ with $\rho(u, X) \le \gamma u^q$, $1 < q \le 2$:
$$\|f_m\|_X \le C(q, \gamma)(1 + m t^p)^{-1/p} \|f\|_{A_1(\mathcal{D})}, \qquad p := \frac{q}{q-1}.$$
This means that the corresponding upper bound in (51) (in the sense of order) can be realized by a greedy-type algorithm.
Note that for specific $X$ and $\mathcal{D}$, inequality (51) may be improved. We illustrate this with the example of $X = \ell_q$, $q \in (2, \infty)$, and $\mathcal{D} = \mathcal{E}$. Without loss of generality, we can assume that:
$$f = \sum_{i=1}^{\infty} c_i e_i, \qquad c_1 \ge c_2 \ge \cdots \ge 0.$$
Then,
$$\|f\|_{\ell_q} = \left(\sum_{i=1}^{\infty} c_i^q\right)^{1/q}, \qquad \|f\|_{A_1(\mathcal{E})} = \|f\|_{\ell_1} = \sum_{i=1}^{\infty} c_i.$$
We now estimate from above:
$$\sigma_m(f, \mathcal{E})_{\ell_q} = \left(\sum_{i=m+1}^{\infty} c_i^q\right)^{1/q}.$$
Our monotonicity assumption on $\{c_i\}$ implies:
$$c_m \le m^{-1/q} \|f\|_{\ell_q}, \qquad c_m \le m^{-1} \|f\|_{\ell_1}$$
and, therefore, for any $\beta \in [0,1]$:
$$c_m \le m^{-(1/q)(1-\beta) - \beta}\, \|f\|_{\ell_q}^{1-\beta} \|f\|_{\ell_1}^{\beta}.$$
Setting $\alpha := \beta(1 - 1/q) + 1/q$, we obtain from here, with $p := \frac{q}{q-1}$:
$$\sigma_m(f, \mathcal{E})_{\ell_q} \le c_m^{1 - 1/q}\, \|f\|_{\ell_1}^{1/q} \le m^{-\alpha/p}\, \|f\|_{\ell_q}^{1-\alpha} \|f\|_{\ell_1}^{\alpha}.$$
Therefore, for any $\alpha \in [0,1]$, we have for all $f \in \ell_q$:
$$\frac{\sigma_m(f, \mathcal{E})_{\ell_q}}{\|f\|_{\ell_q}^{1-\alpha} \|f\|_{A_1(\mathcal{E})}^{\alpha}} \le m^{-\alpha/p}, \qquad p := \frac{q}{q-1}. \qquad (55)$$
Note that it is known (see (26) above) that the space $\ell_q$ with $q \in [2, \infty)$ has a modulus of smoothness of power type 2. Thus, for such spaces, Theorem 4 gives an analog of (55) with the weaker rate of decay $m^{-\alpha/2}$ instead of $m^{-\alpha/p}$ in (55).
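A quick numerical check of (55) on random monotone coefficient sequences; the values of $q$, $m$, and the sequence length are arbitrary choices for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
q, m = 3.0, 40
p = q / (q - 1)
c = np.sort(rng.random(500))[::-1]     # c_1 >= c_2 >= ... >= 0
lq, l1 = np.linalg.norm(c, q), c.sum()
sigma = np.linalg.norm(c[m:], q)       # best m-term error: keep the m largest
for alpha in np.linspace(0.0, 1.0, 11):
    assert sigma <= m ** (-alpha / p) * lq ** (1 - alpha) * l1 ** alpha
```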
We briefly discuss another example, of $L_q$-type spaces with $q \in (2, \infty)$. Consider the space $L_q(0, 2\pi)$ of real $2\pi$-periodic functions, and take $\mathcal{D} = \mathcal{T}$ to be the real trigonometric system. For a given $m \in \mathbb{N}$, set:
$$f := \sum_{k=1}^{2m} \cos(2^k x).$$
Then, it is well known that:
$$\|f\|_{L_q} \le C(q)(2m)^{1/2}, \qquad \|f\|_{A_1(\mathcal{T})} \le 2m.$$
Moreover,
$$\sigma_m(f, \mathcal{T})_{L_q} \ge C\, \sigma_m(f, \mathcal{T})_{L_2} \ge C\, m^{1/2}.$$
Therefore, for any $\alpha \in [0,1]$, we obtain:
$$\frac{\sigma_m(f, \mathcal{T})_{L_q}}{\|f\|_{L_q}^{1-\alpha} \|f\|_{A_1(\mathcal{T})}^{\alpha}} \ge C(q)\, m^{-\alpha/2}.$$

4. Discussion

In this paper, we propose and study a new criterion for the evaluation of the efficiency of a greedy algorithm. This criterion takes into account two characteristics of the element $f$ which we approximate: its norm $\|f\|$ (either in a Hilbert or in a Banach space) and the norm $\|f\|_{A_1(\mathcal{D})}$. We test this new criterion on two standard classes of algorithms. The first class consists of algorithms which provide, for each element $f$, an expansion with respect to a given dictionary $\mathcal{D}$. In this paper, we discuss three such algorithms: the PGA and the WGA($t, b$) for Hilbert spaces, and the DGA($t, b, \mu$) for Banach spaces. Using our new criterion, we compare the efficiency of these algorithms with algorithms from the second class, which includes the OGA (Hilbert spaces) and its generalization for Banach spaces, the Weak Chebyshev Greedy Algorithm. Algorithms from the second class are known to provide the optimal (in the sense of order) rate of $m$-term approximation for the class $A_1(\mathcal{D})$. These algorithms do not provide expansions. In this paper, we show that from the point of view of our new criterion, the algorithms PGA, WGA($t, b$), and DGA($t, b, \mu$) are optimal in the sense of order. We illustrate this fact with the example of the WGA($1, b$). Theorem 2 with $t = 1$ gives the following upper bound for $\alpha \le \frac{2-b}{4-b}$:
$$\gamma_m^{1,b}(\alpha, H) \le (1 + m\, b)^{-\alpha/2}. \qquad (57)$$
Inequality (22) shows that this bound cannot be improved in the sense of order. Moreover, inequality (51) with $p = 2$, which corresponds to the case of a Hilbert space, shows that (57) cannot be improved in the sense of order with respect to $m$, even if instead of the algorithm WGA($1, b$) we use best $m$-term approximations.
We point out that our new results provide better bounds for the accuracy than known results in the case of small $\|f\|$. For simplicity, we illustrate this with the example of the PGA. Suppose that $f \in A_1(\mathcal{D})$. Then, Ref. [16] gives the bound $\|f_m\| \le m^{-1/6}$, which does not involve the norm $\|f\|$. The upper bound in (4) with $\alpha = 1/3$ gives:
$$\|f_m\| \le \|f\|^{2/3}\, m^{-1/6},$$
which is better than $\|f_m\| \le m^{-1/6}$ when $\|f\|$ is small.
In this paper, we applied our new criterion only to a few important greedy algorithms and established a new qualitative effect: the optimality (in the sense of order) of some of the algorithms which provide expansions. It would be interesting to apply this criterion to other greedy algorithms.

Funding

This research was funded by the Russian Science Foundation, grant number 23-71-30001.

Data Availability Statement

Not applicable.

Acknowledgments

The author is grateful to the referees for their useful comments and suggestions. This research was supported by the Russian Science Foundation (project No. 23-71-30001) at the Lomonosov Moscow State University.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Temlyakov, V. Greedy Approximation; Cambridge University Press: New York, NY, USA, 2011.
  2. Needell, D.; Vershynin, R. Uniform uncertainty principle and signal recovery via orthogonal matching pursuit. Found. Comput. Math. 2009, 9, 317–334.
  3. Needell, D.; Tropp, J.A. CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. Appl. Comput. Harmon. Anal. 2009, 26, 301–321.
  4. Dai, W.; Milenkovic, O. Subspace pursuit for compressive sensing signal reconstruction. IEEE Trans. Inf. Theory 2009, 55, 2230–2249.
  5. Petrova, G. Rescaled Pure Greedy Algorithm for Hilbert and Banach spaces. Appl. Comput. Harmon. Anal. 2016, 41, 852–866.
  6. Dereventsov, A.V.; Temlyakov, V.N. A unified way of analyzing some greedy algorithms. J. Funct. Anal. 2019, 277, 108286.
  7. DeVore, R.A.; Temlyakov, V.N. Convex optimization on Banach spaces. Found. Comput. Math. 2016, 16, 369–394.
  8. Chandrasekaran, V.; Recht, B.; Parrilo, P.; Willsky, A. The convex geometry of linear inverse problems. Found. Comput. Math. 2012, 12, 805–849.
  9. Dereventsov, A.V.; Temlyakov, V.N. Biorthogonal greedy algorithms in convex optimization. Appl. Comput. Harmon. Anal. 2022, 60, 489–511.
  10. Figueiredo, M.A.; Nowak, R.D.; Wright, S.J. Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems. IEEE J. Sel. Top. Signal Process. 2007, 1, 586–597.
  11. Jaggi, M. Revisiting Frank–Wolfe: Projection-free sparse convex optimization. ICML 2013, 1, 427–435.
  12. Shalev-Shwartz, S.; Srebro, N.; Zhang, T. Trading accuracy for sparsity in optimization problems with sparsity constraints. SIAM J. Optim. 2010, 20, 2807–2832.
  13. Tewari, A.; Ravikumar, P.; Dhillon, I.S. Greedy algorithms for structurally constrained high dimensional problems. Adv. Neural Inf. Process. Syst. 2011, 24, 882–890.
  14. Zhang, T. Sequential greedy approximation for certain convex optimization problems. IEEE Trans. Inf. Theory 2003, 49, 682–691.
  15. Temlyakov, V. Multivariate Approximation; Cambridge University Press: New York, NY, USA, 2018.
  16. DeVore, R.A.; Temlyakov, V.N. Some remarks on greedy algorithms. Adv. Comput. Math. 1996, 5, 173–187.
  17. Livshitz, E.D. On lower estimates of rate of convergence of greedy algorithms. Izv. RAN Ser. Matem. 2009, 73, 125–144.
  18. Konyagin, S.V.; Temlyakov, V.N. Rate of convergence of Pure Greedy Algorithm. East J. Approx. 1999, 5, 493–499.
  19. Sil'nichenko, A.V. Rate of convergence of greedy algorithms. Matem. Zametki 2004, 76, 628–632.
  20. Nelson, J.L.; Temlyakov, V.N. Greedy expansions in Hilbert spaces. Proc. Steklov Inst. Math. 2013, 280, 227–239.
  21. Temlyakov, V.N. Greedy expansions in Banach spaces. Adv. Comput. Math. 2007, 26, 431–449.
  22. Donahue, M.; Gurvits, L.; Darken, C.; Sontag, E. Rate of convex approximation in non-Hilbert spaces. Constr. Approx. 1997, 13, 187–220.
  23. Lindenstrauss, J.; Tzafriri, L. Classical Banach Spaces I; Springer: Berlin/Heidelberg, Germany, 1977.