Article

On the Rate of Convergence of Greedy Algorithms

Vladimir Temlyakov 1,2,3,4
1 Steklov Mathematical Institute of Russian Academy of Sciences, 117312 Moscow, Russia
2 Department of Mechanics and Mathematics, Lomonosov Moscow State University, 119991 Moscow, Russia
3 Moscow Center of Fundamental and Applied Mathematics, 119333 Moscow, Russia
4 Department of Mathematics, University of South Carolina, Columbia, SC 29208, USA
Mathematics 2023, 11(11), 2559; https://doi.org/10.3390/math11112559
Submission received: 8 May 2023 / Revised: 27 May 2023 / Accepted: 31 May 2023 / Published: 2 June 2023
(This article belongs to the Special Issue Fourier Analysis, Approximation Theory and Applications)

Abstract
In this paper, a new criterion for evaluating the theoretical efficiency of a greedy algorithm is suggested. Using this criterion, we prove some results on the rate of convergence of greedy algorithms which provide expansions. We consider both the case of Hilbert spaces and the more general case of Banach spaces. The new component of this paper is that we bound the error of approximation by the product of two norms: the norm of $f$ and the $A_1$-norm of $f$. Typically, only the $A_1$-norm of $f$ is used. In particular, we establish that some greedy algorithms (the Pure Greedy Algorithm (PGA) and its modifications) are as good as the Orthogonal Greedy Algorithm (OGA) in this new sense of the rate of convergence, while it is known that the PGA is much worse than the OGA in the standard sense. Our new results provide better bounds for the accuracy than known results in the case of small $\|f\|$.

1. Introduction

This paper is devoted to the theoretical study of the efficiency of some greedy algorithms. Greedy algorithms are very useful in applications: adaptive methods are used in PDE solvers, and sparse approximation is used in image/signal/data processing, in the design of neural networks, and in convex optimization. This fact has motivated a deep theoretical study of a variety of greedy algorithms. In this paper, we study the two most popular greedy algorithms, the Pure Greedy Algorithm (PGA) and the Orthogonal Greedy Algorithm (OGA), and their natural modifications. The reader can find other important greedy algorithms in the book [1] and in the papers [2] (Regularized Orthogonal Matching Pursuit), [3] (Compressive Sampling Matching Pursuit), [4] (Subspace Pursuit), [5] (Rescaled Pure Greedy Algorithm), and [6] (Biorthogonal Greedy Algorithm). The reader can find some results on the application of greedy algorithms in convex optimization in the papers [7,8,9,10,11,12,13,14].
There are different criteria for the theoretical efficiency of greedy algorithms. All of them are based on the accuracy of the algorithm (the error after the $m$th iteration) and take different forms. One of these criteria uses the worst error over a given class of inputs (the elements which we approximate by the algorithm). We discuss a variant of this criterion in this paper. There is another, more delicate criterion, which is based on Lebesgue-type inequalities for individual elements. We do not discuss this criterion here; the reader can find a survey of the corresponding results in [15], Ch. 8. We now proceed to a detailed discussion of our results.
Let us begin with a general description of the problem. Let $X$ be a Banach space with norm $\|\cdot\|_X$ and let $Y \subset X$ be a subspace of $X$ equipped with a stronger norm: $\|f\|_Y \ge \|f\|_X$, $f \in Y$. Consider a homogeneous approximation operator (linear or nonlinear) $G: Y \to X$, $G(af) = aG(f)$, $f \in Y$, $a \in \mathbb{R}$, and the error of approximation:
$$e(B_Y, G)_X := \sup_{f \in B_Y} \|f - G(f)\|_X, \qquad B_Y := \{f : \|f\|_Y \le 1\}.$$
Then, for any $f \in Y$, we have:
$$\|f - G(f)\|_X \le e(B_Y, G)_X \, \|f\|_Y. \qquad (1)$$
The characteristic $e(B_Y, G)_X$ plays an important role in approximation theory, with many classical examples of spaces $X$ and $Y$; for instance, $X = L_p$ and $Y$ is one of the smoothness spaces, such as a Sobolev, Nikol'skii, or Besov space.
In this paper, we focus on the following version of inequality (1): find the best $\gamma(\alpha, G, X, Y)$ such that the inequality:
$$\|f - G(f)\|_X \le \gamma(\alpha, G, X, Y) \, \|f\|_X^{1-\alpha} \|f\|_Y^{\alpha}, \qquad \alpha \in [0,1], \qquad (2)$$
holds for all $f \in Y$. Clearly, $\gamma(1, G, X, Y) = e(B_Y, G)_X$. Additionally, it is clear that under the assumption $\|f - G(f)\|_X \le \|f\|_X$, $f \in Y$, we obtain the trivial bound:
$$\gamma(\alpha, G, X, Y) \le e(B_Y, G)_X^{\alpha}.$$
In this paper, we discuss greedy approximation with respect to a given dictionary and prove some nontrivial inequalities for $\gamma(\alpha, G, X, Y)$, both in the case of $X$ being a Hilbert space and $X$ being a Banach space. In particular, we establish that some greedy algorithms (the Pure Greedy Algorithm (PGA) and its generalizations) are as good as the Orthogonal Greedy Algorithm (OGA) in the sense of inequality (2), while it is known that the PGA is much worse than the OGA in the sense of inequality (1) (for definitions and precise formulations, see below).
Let $H$ be a real Hilbert space with inner product $\langle\cdot,\cdot\rangle$ and norm $\|\cdot\|$. We say that a set of elements (functions) $\mathcal{D}$ from $H$ is a dictionary (symmetric dictionary) if each $g \in \mathcal{D}$ has norm one ($\|g\| = 1$) and $\overline{\operatorname{span}}\,\mathcal{D} = H$. In addition, we assume for convenience the property of symmetry:
$$g \in \mathcal{D} \quad \text{implies} \quad -g \in \mathcal{D}.$$
We define the Pure Greedy Algorithm (PGA). We describe this algorithm for a general dictionary $\mathcal{D}$. If $f \in H$, we let $g(f) \in \mathcal{D}$ be an element of $\mathcal{D}$ which maximizes $\langle f, g\rangle$. We assume for simplicity that such a maximizer exists; if not, suitable modifications of the algorithm that follows are necessary (see the Weak Greedy Algorithm below). We define:
$$G(f, \mathcal{D}) := \langle f, g(f)\rangle \, g(f) \quad \text{and} \quad R(f, \mathcal{D}) := f - G(f, \mathcal{D}).$$
Pure Greedy Algorithm (PGA). We define $f_0 := f$ and $G_0(f, \mathcal{D}) := 0$. Then, for each $m \ge 1$, we inductively define:
$$G_m(f, \mathcal{D}) := G_{m-1}(f, \mathcal{D}) + G(f_{m-1}, \mathcal{D}),$$
$$f_m := f - G_m(f, \mathcal{D}) = R(f_{m-1}, \mathcal{D}).$$
Note that for a given element $f$, the sequence $\{G_m(f, \mathcal{D})\}$ may not be unique.
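To make the definition concrete, here is a minimal numerical sketch of the PGA for a finite dictionary in $\mathbb{R}^n$; the matrix representation of the dictionary and the function name are assumptions of this illustration, not part of the paper's setting.

```python
import numpy as np

def pga(f, D, m):
    """Pure Greedy Algorithm: m iterations against a finite dictionary.

    f : (n,) target vector; D : (n, N) matrix whose columns are unit-norm
    dictionary elements. The symmetric partner -g of each column is handled
    implicitly through the sign of the inner product.
    Returns the residual f_m and the approximant G_m(f, D).
    """
    residual = f.astype(float).copy()
    approximant = np.zeros_like(residual)
    for _ in range(m):
        inner = D.T @ residual              # <f_{k-1}, g> for every column g
        j = int(np.argmax(np.abs(inner)))   # greedy step: maximize <f_{k-1}, g>
        step = inner[j] * D[:, j]           # <f_{k-1}, g(f_{k-1})> g(f_{k-1})
        residual -= step
        approximant += step
    return residual, approximant
```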
This algorithm is well studied from the point of view of convergence and rate of convergence. The reader can find the corresponding results and historical comments in [1], Ch. 2. In this paper, we focus on the rate of convergence. Typically, in approximation theory, we define the rate of convergence for specific classes. In classical approximation theory, these are smoothness classes. Clearly, in the general setting with arbitrary $H$ and $\mathcal{D}$, we do not have a concept of smoothness similar to the classical smoothness of functions. It turns out that the geometrically defined class, namely, the closure of the convex hull of $\mathcal{D}$, which we denote by $A_1(\mathcal{D})$, is a very natural class. With each $f \in H$, we associate the following norm:
$$\|f\|_{A_1(\mathcal{D})} := \inf\{M > 0 : f/M \in A_1(\mathcal{D})\}.$$
Clearly, $\|f\| \le \|f\|_{A_1(\mathcal{D})}$. Then, the problem of the rate of convergence of the PGA can be formulated as follows (see [1], p. 95). Find the order of decay of the sequence:
$$\gamma_m(H) := \sup_{\{G_m(f,\mathcal{D})\},\, f,\, \mathcal{D}} \frac{\|f - G_m(f, \mathcal{D})\|}{\|f\|_{A_1(\mathcal{D})}},$$
where the supremum is taken over all possible choices of $\{G_m(f,\mathcal{D})\}$, over all elements $f \in H$, $f \ne 0$, with $\|f\|_{A_1(\mathcal{D})} < \infty$, and over all dictionaries $\mathcal{D}$. This problem is a central theoretical problem in greedy approximation in Hilbert spaces, and it is still open. We mention some of the known results here and refer the reader to [1], Ch. 2, for the detailed history of the problem. It is clear that for any $f \in H$ such that $\|f\|_{A_1(\mathcal{D})} < \infty$, we have:
$$\|f - G_m(f, \mathcal{D})\| \le \gamma_m(H) \, \|f\|_{A_1(\mathcal{D})}.$$
In this paper, we discuss the following extension of the asymptotic characteristic $\gamma_m(H)$: for $\alpha \in (0,1]$, define:
$$\gamma_m(\alpha, H) := \sup_{\{G_m(f,\mathcal{D})\},\, f,\, \mathcal{D}} \frac{\|f - G_m(f, \mathcal{D})\|}{\|f\|^{1-\alpha} \|f\|_{A_1(\mathcal{D})}^{\alpha}}.$$
Clearly,
$$\gamma_m(1, H) = \gamma_m(H), \qquad \gamma_m(\alpha, H) \ge \gamma_m(\beta, H) \quad \text{if} \quad \alpha \le \beta. \qquad (3)$$
The first upper bound on $\gamma_m(H)$ was obtained in [16]:
$$\gamma_m(H) \le m^{-1/6}.$$
Actually, the proof in [16] (see also [1], pp. 92–93) gives:
$$\gamma_m(1/3, H) \le m^{-1/6}.$$
We establish here the following bounds:
$$\tfrac{1}{2}\, m^{-\alpha/2} \le \gamma_m(\alpha, H) \le m^{-\alpha/2}, \qquad \alpha \le 1/3. \qquad (4)$$
Additionally, in Section 2, we find the right behavior of an asymptotic characteristic similar to $\gamma_m(\alpha, H)$ for a more general algorithm than the PGA, namely, the Weak Greedy Algorithm with parameter $b$.
It is interesting to compare the rates of convergence of the PGA and the Orthogonal Greedy Algorithm (OGA). We now give a brief definition of the OGA. We define $f_0^o := f$, $G_0^o(f, \mathcal{D}) := 0$, and for $m \ge 1$, we inductively define $G_m^o(f, \mathcal{D})$ to be the orthogonal projection of $f$ onto the span of $g(f_0^o), \dots, g(f_{m-1}^o)$ and set $f_m^o := f - G_m^o(f, \mathcal{D})$. The analogs of the characteristics $\gamma_m(H)$ and $\gamma_m(\alpha, H)$ for the OGA are denoted by $\gamma_m^o(H)$ and $\gamma_m^o(\alpha, H)$. The following bound is proved in [16] (see also [1], p. 93):
$$\gamma_m^o(H) \le m^{-1/2}. \qquad (5)$$
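The OGA differs from the PGA only in how the approximant is updated. Here is a minimal sketch under the same finite-dictionary assumptions as the PGA sketch above; the least-squares projection is one convenient way to realize the orthogonal projection.

```python
import numpy as np

def oga(f, D, m):
    """Orthogonal Greedy Algorithm: greedy selection as in the PGA, but the
    approximant is recomputed as the orthogonal projection of f onto the
    span of all selected dictionary elements."""
    residual = f.astype(float).copy()
    selected = []
    for _ in range(m):
        j = int(np.argmax(np.abs(D.T @ residual)))   # same selection as the PGA
        if j not in selected:
            selected.append(j)
        # Projection onto span{g_j : j selected} via least squares.
        coef, *_ = np.linalg.lstsq(D[:, selected], f, rcond=None)
        residual = f - D[:, selected] @ coef
    return residual, selected
```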
It is known (see [17]) that $\gamma_m(H)$ decays slower than $m^{-0.1898}$. Therefore, from the point of view of the characteristics $\gamma_m(H)$ and $\gamma_m^o(H)$, the OGA is much better than the PGA. We establish here the following bounds:
$$\tfrac{1}{2}\, m^{-\alpha/2} \le \gamma_m^o(\alpha, H) \le m^{-\alpha/2}, \qquad \alpha \le 1. \qquad (6)$$
This means that from the point of view of the characteristics $\gamma_m(\alpha, H)$ and $\gamma_m^o(\alpha, H)$, the OGA is the same (in the sense of order) as the PGA for $\alpha \le 1/3$. This is a very surprising fact.
We do not know if the upper bound in (4) holds for $\alpha > 1/3$. However, the inequality in (3) and the lower bound for $\gamma_m(H)$ show that:
$$\gamma_m(\alpha, H) \ge \gamma_m(H) \ge c\, m^{-0.1898}.$$
Therefore, the upper bound in (4) cannot be extended beyond $\alpha_0 := 0.3796$.
Section 3 deals with the case of a Banach space $X$. The results for the Banach space case are similar to those for Hilbert spaces but are not as sharp as their Hilbert space counterparts.
Novelty. In this paper, we suggest a new criterion for the evaluation of the theoretical efficiency of a greedy algorithm. The classical criterion uses the worst error (for instance, $\gamma_m(H)$) of approximation of elements from the class $A_1(\mathcal{D})$ by our algorithm (for instance, the PGA). In other words, this criterion uses the norm $\|f\|_{A_1(\mathcal{D})}$ for estimating the error of approximation of $f$. Our new criterion uses two norms, $\|f\|_{A_1(\mathcal{D})}$ and $\|f\|$; more precisely, the weighted product of these norms $\|f\|^{1-\alpha} \|f\|_{A_1(\mathcal{D})}^{\alpha}$, $\alpha \in [0,1]$. The most important qualitative discovery of this paper is that the PGA and its natural modifications have, for some $\alpha$, the same theoretical efficiency as the OGA and its modifications. It is known that in accordance with the old criterion, the PGA is much worse than the OGA.
Method. The standard way of analyzing the accuracy of a greedy algorithm is based on estimating from below the difference $\|f_{m-1}\| - \|f_m\|$ and then solving (estimating) the corresponding recurrent inequalities. For instance, this method works very well for the OGA. In the paper [16], this standard method was modified in such a way that it allowed us to obtain some new nontrivial upper bounds for $\gamma_m(H)$. The method from [16] simultaneously analyzes two sequences: $\{\|f_m\|\}_{m=1}^{\infty}$ and the sequence of sums of absolute values of the coefficients of $G_m(f, \mathcal{D})$. In this paper, we further develop the method from [16].
The results of this paper (see Theorem 2 and the lower bounds in Section 2) show that our method, which is a development of the method from [16], is optimal for proving the upper bounds for $\gamma_m^{t,b}(\alpha, H)$ in the case $\alpha \le \frac{(2-b)t}{(2-b)t+2}$. It is known that the bound $\gamma_m(H) \le m^{-1/6}$, which was obtained in [16], is not optimal. The bound $\gamma_m(H) \le 4\, m^{-11/62}$ was proved in [18] by a method distinct from the one in [16]. The method from [18] was further developed in [19,20]. It would be interesting to understand if the method from [18] and its further developments allow us to prove an analog of Theorem 2 for $\alpha > \frac{(2-b)t}{(2-b)t+2}$.
Conclusion. The PGA at the $m$th iteration searches for an element $g(f_{m-1})$ which maximizes the inner product $\langle f_{m-1}, g\rangle$ over all $g \in \mathcal{D}$. Then, the update is very easy: $f_m = f_{m-1} - \langle f_{m-1}, g(f_{m-1})\rangle g(f_{m-1})$. The OGA, like the PGA, at the $m$th iteration searches for an element $g(f_{m-1}^o)$ which maximizes the inner product $\langle f_{m-1}^o, g\rangle$ over all $g \in \mathcal{D}$. However, the second steps of the PGA and the OGA at the $m$th iteration are different: the OGA makes an orthogonal projection onto the span of $g(f_0^o), \dots, g(f_{m-1}^o)$. Clearly, this step of the OGA is more difficult than the corresponding step of the PGA. Moreover, it is clear from the definition of the PGA that it provides an expansion of $f$ into a series with respect to $\mathcal{D}$. The OGA does not provide an expansion. Thus, the advantage of the PGA over the OGA is that it is simpler and provides an expansion. The advantage of the OGA over the PGA is that in accordance with the old criterion, we can guarantee better accuracy for $f \in A_1(\mathcal{D})$. The results of this paper show that in accordance with the new criterion, the OGA does not have an advantage in the sense of accuracy for some parameters $\alpha$. Similar results are obtained for the modifications of the PGA: the Weak Greedy Algorithm with parameter $b$ in Hilbert spaces (see Section 2) and the Dual Greedy Algorithm with parameters $(t, b, \mu)$ in Banach spaces (see Section 3).

2. Hilbert Space: The Weak Greedy Algorithm with Parameter b

Let a sequence $\tau = \{t_k\}_{k=1}^{\infty}$, $0 \le t_k \le 1$, and a parameter $b \in (0,1]$ be given. We define the Weak Greedy Algorithm with parameter $b$.
Weak Greedy Algorithm with parameter $b$ (WGA($\tau, b$)). We define $f_0 := f_0^{\tau,b} := f$. Then, for each $m \ge 1$, we inductively define:
(1)
$\varphi_m := \varphi_m^{\tau,b} \in \mathcal{D}$ is any element satisfying:
$$\langle f_{m-1}, \varphi_m\rangle \ge t_m \sup_{g \in \mathcal{D}} \langle f_{m-1}, g\rangle;$$
(2)
$$f_m := f_m^{\tau,b} := f_{m-1} - b\, \langle f_{m-1}, \varphi_m\rangle \varphi_m;$$
(3)
$$G_m(f, \mathcal{D}) := G_m^{\tau,b}(f, \mathcal{D}) := b \sum_{j=1}^{m} \langle f_{j-1}, \varphi_j\rangle \varphi_j.$$
In the case $t_k = t$, $k = 1, 2, \dots$, we write $t$ in the notation instead of $\tau$.
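In code, the WGA($t, b$) differs from the PGA sketch above only in the selection tolerance $t$ and the damping factor $b$; as before, the finite dictionary is an assumption of this illustration.

```python
import numpy as np

def wga(f, D, m, t=1.0, b=1.0):
    """Weak Greedy Algorithm WGA(t, b) with constant weakness t_k = t.

    Steps (1)-(3) above: pick any column whose inner product with the
    residual is within the factor t of the best one (here: the first such
    column), then subtract only the fraction b of the greedy step."""
    residual = f.astype(float).copy()
    approximant = np.zeros_like(residual)
    for _ in range(m):
        inner = D.T @ residual                  # <f_{m-1}, g> for every column
        sup = np.abs(inner).max()               # sup over the symmetric dictionary
        if sup == 0:
            break
        j = int(np.argmax(np.abs(inner) >= t * sup))  # any index satisfying (1)
        step = b * inner[j] * D[:, j]           # step (2): damped update
        residual -= step
        approximant += step                     # step (3): running expansion
    return residual, approximant
```

With $t = 1$ and $b = 1$, this sketch reduces to the PGA.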
We proceed to the rate of convergence. The following Theorem 1 was proved in [21].
Theorem 1.
Let $\mathcal{D}$ be an arbitrary dictionary in $H$. Assume $\tau := \{t_k\}_{k=1}^{\infty}$ is a nonincreasing sequence and $b \in (0,1]$. Then, for $f \in A_1(\mathcal{D})$, we have:
$$\|f - G_m^{\tau,b}(f, \mathcal{D})\| \le e_m(\tau, b),$$
where:
$$e_m(\tau, b) := \left(1 + b(2-b) \sum_{k=1}^{m} t_k^2\right)^{-\frac{(2-b)t_m}{2(2+(2-b)t_m)}}.$$
Theorem 1 implies the following inequality for any $f$ and any $\mathcal{D}$:
$$\|f - G_m^{\tau,b}(f, \mathcal{D})\| \le \|f\|_{A_1(\mathcal{D})}\, e_m(\tau, b).$$
We now extend Theorem 1 to provide a bound for:
$$\gamma_m^{t,b}(\alpha, H) := \sup_{\mathcal{D}} \; \sup_{f \in A_1(\mathcal{D}),\, f \ne 0} \; \sup_{G_m^{t,b}(f,\mathcal{D})} \frac{\|f - G_m^{t,b}(f, \mathcal{D})\|}{\|f\|^{1-\alpha} \|f\|_{A_1(\mathcal{D})}^{\alpha}}.$$
We prove the following Theorem 2 in the case $t_k = t$, $k = 1, 2, \dots$.
Theorem 2.
For any Hilbert space $H$, we have:
$$\gamma_m^{t,b}(\alpha, H) \le \left(1 + m\, b(2-b)\, t^2\right)^{-\alpha/2}, \qquad (11)$$
provided $\alpha \le \frac{(2-b)t}{(2-b)t+2}$.
Proof. 
The proof of this theorem goes along the lines of the proof of Theorem 1 in [21]. Let $\mathcal{D}$ be a dictionary in $H$. We introduce some notation:
$$f_k := f_k^{t,b}, \qquad \varphi_k := \varphi_k^{t,b}, \qquad k = 0, 1, \dots,$$
$$a_m := \|f_m\|^2, \qquad y_m := \langle f_{m-1}, \varphi_m\rangle, \qquad m = 1, 2, \dots,$$
and consider the sequence $\{B_n\}$ defined as follows:
$$B_0 := \|f\|_{A_1(\mathcal{D})}, \qquad B_m := B_{m-1} + b\, y_m, \qquad m = 1, 2, \dots.$$
It is clear that $\|f_n\|_{A_1(\mathcal{D})} \le B_n$, $n = 0, 1, \dots$. By Lemma 3.5 from [16] (see also [1], p. 91, Lemma 2.17), we get:
$$\sup_{g \in \mathcal{D}} \langle f_{m-1}, g\rangle \ge \|f_{m-1}\|^2 / B_{m-1}.$$
From here and from the equality:
$$\|f_m\|^2 = \|f_{m-1}\|^2 - b(2-b) \langle f_{m-1}, \varphi_m\rangle^2,$$
we obtain the following relations:
$$a_m = a_{m-1} - b(2-b) y_m^2, \qquad (13)$$
$$B_m = B_{m-1} + b\, y_m, \qquad (14)$$
$$y_m \ge t\, a_{m-1} / B_{m-1}. \qquad (15)$$
From (13) and (15), we obtain:
$$a_m \le a_{m-1} \left(1 - b(2-b) t^2 \frac{a_{m-1}}{B_{m-1}^2}\right).$$
Using that $B_{m-1} \le B_m$, we derive from here:
$$\frac{a_m}{B_m^2} \le \frac{a_{m-1}}{B_{m-1}^2} \left(1 - b(2-b) t^2 \frac{a_{m-1}}{B_{m-1}^2}\right). \qquad (16)$$
We shall need the following simple known lemma (see, for example, [1], p. 91, in the case $C_1 = C_2$).
Lemma 1.
Let $\{x_m\}_{m=0}^{\infty}$ be a sequence of non-negative numbers satisfying the inequalities:
$$x_0 \le C_1, \qquad x_{m+1} \le x_m (1 - x_m C_2), \qquad m = 0, 1, 2, \dots, \qquad C_1, C_2 > 0.$$
Then, we have for each $m$:
$$x_m \le (C_1^{-1} + C_2 m)^{-1}.$$
Proof. 
The proof is by induction on $m$. For $m = 0$, the statement is true by assumption. We assume $x_m \le (C_1^{-1} + C_2 m)^{-1}$ and prove that $x_{m+1} \le (C_1^{-1} + C_2(m+1))^{-1}$. If $x_{m+1} = 0$, this statement is obvious. Assume, therefore, that $x_{m+1} > 0$. Then, we have:
$$x_{m+1}^{-1} \ge x_m^{-1} (1 - x_m C_2)^{-1} \ge x_m^{-1} (1 + x_m C_2) = x_m^{-1} + C_2 \ge C_1^{-1} + (m+1) C_2,$$
which implies $x_{m+1} \le (C_1^{-1} + C_2(m+1))^{-1}$. □
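A quick numerical sanity check of Lemma 1; the constants below are arbitrary choices for this illustration.

```python
# Iterate the extremal recurrence x_{m+1} = x_m (1 - x_m * C2) starting from
# x_0 = C1 and compare with the bound (1/C1 + C2*m)^{-1} from Lemma 1.
C1, C2 = 1.0, 0.3
x = C1
for m in range(1, 50):
    x = x * (1.0 - x * C2)
    bound = 1.0 / (1.0 / C1 + C2 * m)
    assert x <= bound, (m, x, bound)
print("Lemma 1 bound holds along this trajectory")
```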
We apply Lemma 1 with $x_m := a_m B_m^{-2}$. Then, the inequality $\|f\| \le \|f\|_{A_1(\mathcal{D})}$ implies that we can take $C_1 = 1$. We set $C_2 = b(2-b)t^2$ and obtain from (16) and Lemma 1:
$$\frac{a_m}{B_m^2} \le \left(1 + m\, b(2-b) t^2\right)^{-1}. \qquad (17)$$
Relations (13) and (15) imply:
$$a_m \le a_{m-1} - b(2-b)\, y_m\, t\, a_{m-1}/B_{m-1} = a_{m-1} \left(1 - b(2-b)\, t\, y_m / B_{m-1}\right). \qquad (18)$$
We now need the following simple inequality: for any $x < 1$ and any $a > 0$, we have:
$$(1 - x)(1 + x/a)^a \le 1. \qquad (19)$$
Rewriting (14) in the form:
$$B_m = B_{m-1}(1 + b\, y_m / B_{m-1}) \qquad (20)$$
and using inequality (19) with $x = b(2-b)t\, y_m / B_{m-1}$ and $a = (2-b)t$, we get from (18) and (20) that:
$$a_m B_m^{(2-b)t} \le a_{m-1} B_{m-1}^{(2-b)t} \le \|f\|^2 \|f\|_{A_1(\mathcal{D})}^{(2-b)t}. \qquad (21)$$
Combining (17) and (21), we obtain:
$$a_m^{(2-b)t+2} \le \|f\|^4 \|f\|_{A_1(\mathcal{D})}^{2(2-b)t} \left(1 + m\, b(2-b)t^2\right)^{-(2-b)t},$$
which completes the proof of Theorem 2 with $\alpha_0 := \frac{(2-b)t}{(2-b)t+2}$. The case $\alpha < \alpha_0$ follows from Lemma 2 below. □
Lower bounds. Let $H$ be an infinite dimensional Hilbert space, and let $\{e_k\}_{k=1}^{\infty}$ be an orthonormal system in $H$. Suppose that our symmetric dictionary $\mathcal{D}$ consists of $\pm e_k$, $k = 1, 2, \dots$, and that the other elements $g \in \mathcal{D}$ have the property $\langle g, e_k\rangle = 0$, $k = 1, 2, \dots$. We present an example in the case $t = 1$. Let $b \in (0,1]$ and $m$ be given. We consider two cases: (I) $b \le 1/4$ and (II) $b \in (1/4, 1]$.
(I). Set $m' := [2bm] + 1$ and:
$$f = \sum_{k=1}^{m'} e_k.$$
Then, at each iteration, the WGA($1, b$) will pick one of the $e_k$, $k \in [1, m']$, with the largest coefficient. After $m$ iterations, we will get:
$$f_m = \sum_{k=1}^{m'} c_k e_k, \qquad c_k \ge 1 - (m/m' + 1)b \ge 1/4.$$
Therefore, we obtain:
$$\|f_m\| \ge (m')^{1/2}/4, \qquad \|f\| = (m')^{1/2}, \qquad \|f\|_{A_1(\mathcal{D})} \le m'.$$
Thus, for any $\alpha \in [0,1]$, we find:
$$\frac{\|f_m\|}{\|f\|^{1-\alpha} \|f\|_{A_1(\mathcal{D})}^{\alpha}} \ge (m')^{-\alpha/2}/4. \qquad (22)$$
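This example is easy to reproduce with the WGA sketch above; the specific values of $b$ and $m$ below are arbitrary choices for the check.

```python
import numpy as np

b, m = 0.1, 200
mp = int(2 * b * m) + 1            # m' = [2bm] + 1
D = np.eye(mp)                     # the orthonormal elements e_1, ..., e_{m'}
f = np.ones(mp)                    # f = e_1 + ... + e_{m'}
fm, _ = wga(f, D, m, t=1.0, b=b)
assert fm.min() >= 1 / 4                       # every coefficient c_k >= 1/4
assert np.linalg.norm(fm) >= mp ** 0.5 / 4     # ||f_m|| >= (m')^{1/2} / 4
```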
(II). Set:
$$f = \sum_{k=1}^{2m} e_k.$$
Then,
$$f_m = \sum_{k=1}^{2m} c_k e_k, \qquad c_k = 1, \; k \notin G, \qquad c_k = 1 - b, \; k \in G, \qquad |G| = m.$$
Therefore, we obtain:
$$\|f_m\| \ge m^{1/2}, \qquad \|f\| = (2m)^{1/2}, \qquad \|f\|_{A_1(\mathcal{D})} \le 2m.$$
Thus, for any $\alpha \in [0,1]$, we find:
$$\frac{\|f_m\|}{\|f\|^{1-\alpha} \|f\|_{A_1(\mathcal{D})}^{\alpha}} \ge 2^{-1/2} (2m)^{-\alpha/2}. \qquad (23)$$
Bounds (22) and (23) show that in the case $t = 1$, inequality (11) is sharp in the sense of its dependence on $m$ and $b$, when $m$ goes to $\infty$ and $b$ goes to $0$.
Proof of (6). The lower bound in (6) follows from (23), because in the case of an orthonormal system, the PGA and the OGA coincide. We now prove the upper bound.
Lemma 2.
Let $g_m(\alpha, H)$ denote either $\gamma_m(\alpha, H)$ or $\gamma_m^o(\alpha, H)$. Suppose that for some $\beta \in (0,1]$, we have:
$$g_m(\beta, H) \le C \varphi(m)^{-\beta/2}.$$
Then, for any $\alpha \in (0, \beta)$, we have:
$$g_m(\alpha, H) \le C^{\alpha/\beta} \varphi(m)^{-\alpha/2}. \qquad (24)$$
Proof. 
By the definitions of the PGA and the OGA, we have $\|f_m\| \le \|f\|$ and $\|f_m^o\| \le \|f\|$. The proof is identical for both algorithms; we carry it out for the PGA. Our assumption gives, for any $f$, any dictionary $\mathcal{D}$, and any realization of $G_m(f, \mathcal{D})$:
$$\|f_m\| = \|f - G_m(f, \mathcal{D})\| \le \|f\|^{1-\beta} \|f\|_{A_1(\mathcal{D})}^{\beta}\, C \varphi(m)^{-\beta/2}.$$
Therefore, for any $a \in [0,1]$, we have:
$$\|f_m\| \le \|f\|^{1-a} \left(\|f\|^{1-\beta} \|f\|_{A_1(\mathcal{D})}^{\beta}\, C \varphi(m)^{-\beta/2}\right)^a. \qquad (25)$$
Choosing $a = \alpha/\beta$, we obtain (24) from (25). □
The upper bound in (6) follows from (5) and Lemma 2. □

3. Greedy Expansions in Banach Spaces

In this section, we extend the results of Section 2 to the case of a Banach space instead of a Hilbert space. We begin with some definitions. Let $X$ be a real Banach space with norm $\|\cdot\|$. As above, we say that a set of elements (functions) $\mathcal{D}$ from $X$ is a dictionary (symmetric dictionary) if each $g \in \mathcal{D}$ has norm one ($\|g\| = 1$) and $\overline{\operatorname{span}}\,\mathcal{D} = X$. In addition, we assume for convenience that the dictionary is symmetric:
$$g \in \mathcal{D} \quad \text{implies} \quad -g \in \mathcal{D}.$$
In this paper, we study greedy algorithms with regard to $\mathcal{D}$ that provide greedy expansions. For a nonzero element $f \in X$, we denote by $F_f$ a norming (peak) functional for $f$:
$$\|F_f\| = 1, \qquad F_f(f) = \|f\|.$$
The existence of such a functional is guaranteed by the Hahn–Banach theorem. Denote:
$$r_{\mathcal{D}}(f) := \sup_{F_f} \sup_{g \in \mathcal{D}} F_f(g).$$
We note that, in general, a norming functional $F_f$ is not unique. This is why we take $\sup_{F_f}$ over all norming functionals of $f$ in the definition of $r_{\mathcal{D}}(f)$. It is known that in the case of uniformly smooth Banach spaces (our primary object here), the norming functional $F_f$ is unique. In such a case, we do not need $\sup_{F_f}$ in the definition of $r_{\mathcal{D}}(f)$.
We consider here approximation in uniformly smooth Banach spaces. For a Banach space $X$, we define the modulus of smoothness:
$$\rho(u) := \rho(u, X) := \sup_{\|x\| = \|y\| = 1} \left(\tfrac{1}{2}\left(\|x + uy\| + \|x - uy\|\right) - 1\right).$$
A uniformly smooth Banach space is one with the property:
$$\lim_{u \to 0} \rho(u)/u = 0.$$
It is well known (see, for instance, [22], Lemma B.1) that in the case $X = L_p$, $1 \le p < \infty$, we have:
$$\rho(u, L_p) \le \begin{cases} u^p/p & \text{if } 1 \le p \le 2, \\ (p-1)u^2/2 & \text{if } 2 \le p < \infty. \end{cases} \qquad (26)$$
We now give a definition of the DGA($\tau, b, \mu$), $\tau = \{t_k\}_{k=1}^{\infty}$, $t_k \in (0,1]$, introduced in [21] (see also [1], Ch. 6).
Dual Greedy Algorithm with parameters ($\tau, b, \mu$) (DGA($\tau, b, \mu$)). Let $X$ be a uniformly smooth Banach space with modulus of smoothness $\rho(u)$, and let $\mu(u)$ be a majorant of $\rho(u)$: $\rho(u) \le \mu(u)$, $u \in [0, \infty)$. For a sequence $\tau = \{t_k\}_{k=1}^{\infty}$, $t_k \in (0,1]$, and a parameter $b \in (0,1]$, we define sequences $\{f_m\}_{m=0}^{\infty}$, $\{\varphi_m\}_{m=1}^{\infty}$, $\{c_m\}_{m=1}^{\infty}$, and $\{G_m\}_{m=0}^{\infty}$ inductively. Let $f_0 := f$ and $G_0 := 0$. If $f_{m-1} = 0$ for some $m \ge 1$, then we set $f_j = 0$ for $j \ge m$ and stop. If $f_{m-1} \ne 0$, then we conduct the following three steps:
(1)
take any $\varphi_m \in \mathcal{D}$ such that:
$$F_{f_{m-1}}(\varphi_m) \ge t_m\, r_{\mathcal{D}}(f_{m-1}); \qquad (27)$$
(2)
choose $c_m > 0$ from the equation:
$$\|f_{m-1}\|\, \mu(c_m/\|f_{m-1}\|) = \frac{t_m b}{2}\, c_m\, r_{\mathcal{D}}(f_{m-1}); \qquad (28)$$
(3)
define:
$$f_m := f_{m-1} - c_m \varphi_m, \qquad G_m := G_m^{\tau,b,\mu} := G_{m-1} + c_m \varphi_m.$$
Along with the algorithm DGA($\tau, b, \mu$), we consider a slight modification of it, in which at step (2) we find $c_m$ from the equation (see [21], Remark 3.1):
$$\|f_{m-1}\|\, \mu(c_m/\|f_{m-1}\|) = \frac{b}{2}\, c_m\, F_{f_{m-1}}(\varphi_m).$$
We denote this modification by DGA($\tau, b, \mu$)*.
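As an illustration, here is a sketch of the DGA($t, b, \mu$)* in $X = \ell_q$, $1 < q \le 2$, with $\mu(u) = \gamma u^q$, for a finite dictionary; all names and the finite-dimensional setting are assumptions of this sketch. In $\ell_q$, the norming functional is $F_f(g) = \|f\|^{1-q} \sum_i \operatorname{sign}(f_i)|f_i|^{q-1} g_i$, and with a power-type $\mu$, step (2) gives $c_m$ in closed form.

```python
import numpy as np

def dga_star(f, D, m, q=1.5, gamma=None, t=1.0, b=0.5):
    """Sketch of DGA(t, b, mu)* in l_q (1 < q <= 2) with mu(u) = gamma*u^q.

    Columns of D are assumed to have unit l_q norm; their symmetric
    partners are handled through the sign of F_{f_{m-1}}(g)."""
    if gamma is None:
        gamma = 1.0 / q                 # rho(u, l_q) <= u^q / q for 1 < q <= 2
    residual = f.astype(float).copy()
    for _ in range(m):
        norm = np.linalg.norm(residual, q)
        if norm == 0:
            break
        # Norming functional of the residual, evaluated on every column.
        Ff = np.sign(residual) * np.abs(residual) ** (q - 1) / norm ** (q - 1)
        vals = Ff @ D
        j = int(np.argmax(np.abs(vals) >= t * np.abs(vals).max()))  # step (1)
        # Step (2) for DGA*: gamma*norm*(c/norm)^q = (b/2)*c*F(phi), solved for c.
        c = (b * abs(vals[j]) / (2 * gamma)) ** (1.0 / (q - 1)) * norm
        residual -= c * np.sign(vals[j]) * D[:, j]                  # step (3)
    return residual
```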
We proceed to studying the rate of convergence of the DGA($\tau, b, \mu$) in uniformly smooth Banach spaces with a power-type majorant of the modulus of smoothness: $\rho(u) \le \mu(u) = \gamma u^q$, $1 < q \le 2$. The following Theorem 3 is from [21] (see also [1], p. 372).
Theorem 3.
Let $\tau := \{t_k\}_{k=1}^{\infty}$ be a nonincreasing sequence, $1 \ge t_1 \ge t_2 \ge \cdots > 0$, and let $b \in (0,1)$. Assume that $X$ has a modulus of smoothness $\rho(u) \le \gamma u^q$, $q \in (1,2]$. Denote $\mu(u) := \gamma u^q$. Then, for any dictionary $\mathcal{D}$ and any $f \in A_1(\mathcal{D})$, the rate of convergence of the DGA($\tau, b, \mu$) is given by:
$$\|f_m\| \le C(b, \gamma, q) \left(1 + \sum_{k=1}^{m} t_k^p\right)^{-\frac{t_m(1-b)}{p(1 + t_m(1-b))}}, \qquad p := \frac{q}{q-1}.$$
Remark 1.
It is pointed out in [21], Remark 3.2, that Theorem 3 holds for the algorithm DGA($\tau, b, \mu$)* as well.
Theorem 3 is an analog of Theorem 1. We now prove an analog of Theorem 2: we extend Theorem 3 to provide a bound for:
$$\gamma_m^{t,b,\mu}(\alpha, X) := \sup_{\mathcal{D}} \; \sup_{f \in A_1(\mathcal{D}),\, f \ne 0} \; \sup_{G_m^{t,b,\mu}(f,\mathcal{D})} \frac{\|f - G_m^{t,b,\mu}(f, \mathcal{D})\|}{\|f\|^{1-\alpha} \|f\|_{A_1(\mathcal{D})}^{\alpha}}.$$
The corresponding characteristic for the algorithm DGA($\tau, b, \mu$)* is denoted by $\gamma_m^{t,b,\mu}(\alpha, X)^*$. We prove the following Theorem 4 in the case $t_k = t$, $k = 1, 2, \dots$.
Theorem 4.
For any Banach space $X$ with modulus of smoothness $\rho(u, X) \le \gamma u^q$, $1 < q \le 2$, $p := \frac{q}{q-1}$, we have:
$$\gamma_m^{t,b,\mu}(\alpha, X) \le (1 + m\, c\, t^p)^{-\alpha/p}, \qquad c := (1-b)\left(\frac{b}{2\gamma}\right)^{\frac{1}{q-1}},$$
provided $\alpha \le \frac{t(1-b)}{1 + t(1-b)}$. The same inequality holds for $\gamma_m^{t,b,\mu}(\alpha, X)^*$.
Proof. 
The proof is identical for both characteristics $\gamma_m^{t,b,\mu}(\alpha, X)$ and $\gamma_m^{t,b,\mu}(\alpha, X)^*$. We carry it out for $\gamma_m^{t,b,\mu}(\alpha, X)$. From the definition of the modulus of smoothness, we have:
$$\|f_{n-1} - c_n\varphi_n\| + \|f_{n-1} + c_n\varphi_n\| \le 2\|f_{n-1}\|\left(1 + \rho(c_n/\|f_{n-1}\|)\right). \qquad (33)$$
Using the definition of $\varphi_n$:
$$F_{f_{n-1}}(\varphi_n) \ge t\, r_{\mathcal{D}}(f_{n-1}),$$
we get:
$$\|f_{n-1} + c_n\varphi_n\| \ge F_{f_{n-1}}(f_{n-1} + c_n\varphi_n) = \|f_{n-1}\| + c_n F_{f_{n-1}}(\varphi_n) \ge \|f_{n-1}\| + c_n t\, r_{\mathcal{D}}(f_{n-1}). \qquad (35)$$
Combining (33) and (35), we get:
$$\|f_n\| = \|f_{n-1} - c_n\varphi_n\| \le \|f_{n-1}\|\left(1 + 2\rho(c_n/\|f_{n-1}\|)\right) - c_n t\, r_{\mathcal{D}}(f_{n-1}).$$
Using the choice of $c_m$, we get from here:
$$\|f_m\| \le \|f_{m-1}\| - t(1-b)\, c_m\, r_{\mathcal{D}}(f_{m-1}). \qquad (37)$$
Thus, we need to estimate $c_m r_{\mathcal{D}}(f_{m-1})$ from below. It is clear that:
$$\|f_{m-1}\|_{A_1(\mathcal{D})} = \left\|f - \sum_{j=1}^{m-1} c_j\varphi_j\right\|_{A_1(\mathcal{D})} \le \|f\|_{A_1(\mathcal{D})} + \sum_{j=1}^{m-1} c_j. \qquad (38)$$
Denote $B_n := \|f\|_{A_1(\mathcal{D})} + \sum_{j=1}^{n} c_j$. Then, by (38), we have:
$$\|f_{m-1}\|_{A_1(\mathcal{D})} \le B_{m-1}.$$
Next, by Lemma 6.10 from [1], p. 343, we obtain:
$$r_{\mathcal{D}}(f_{m-1}) = \sup_{g \in \mathcal{D}} F_{f_{m-1}}(g) = \sup_{\varphi \in A_1(\mathcal{D})} F_{f_{m-1}}(\varphi) \ge \|f_{m-1}\|_{A_1(\mathcal{D})}^{-1} F_{f_{m-1}}(f_{m-1}) \ge \|f_{m-1}\| / B_{m-1}. \qquad (39)$$
Substituting (39) into (37), we get:
$$\|f_m\| \le \|f_{m-1}\|\left(1 - t(1-b)\, c_m / B_{m-1}\right). \qquad (40)$$
From the definition of $B_m$, we find:
$$B_m = B_{m-1} + c_m = B_{m-1}(1 + c_m / B_{m-1}).$$
Using the inequality:
$$(1 + x)^{\alpha} \le 1 + \alpha x, \qquad 0 \le \alpha \le 1, \quad x \ge 0,$$
we obtain:
$$B_m^{t(1-b)} \le B_{m-1}^{t(1-b)}\left(1 + t(1-b)\, c_m / B_{m-1}\right). \qquad (41)$$
Multiplying (40) and (41), we get:
$$\|f_m\|\, B_m^{t(1-b)} \le \|f_{m-1}\|\, B_{m-1}^{t(1-b)} \le \|f\|\, \|f\|_{A_1(\mathcal{D})}^{t(1-b)}. \qquad (42)$$
The function $\mu(u)/u = \gamma u^{q-1}$ is increasing on $[0, \infty)$. Therefore, the $c_m$ from (28) is greater than or equal to the $c_m$ from the following Equation (43) (see (39)):
$$\gamma \|f_{m-1}\| \left(c_m/\|f_{m-1}\|\right)^q = \frac{t b}{2}\, c_m\, \|f_{m-1}\| / B_{m-1}, \qquad (43)$$
$$c_m = \left(\frac{t b}{2\gamma}\right)^{\frac{1}{q-1}} \|f_{m-1}\|^{\frac{q}{q-1}} B_{m-1}^{-\frac{1}{q-1}}. \qquad (44)$$
Using the notation:
$$p := \frac{q}{q-1}, \qquad c := (1-b)\left(\frac{b}{2\gamma}\right)^{\frac{1}{q-1}},$$
we get from (37), (39), and (44):
$$\|f_m\| \le \|f_{m-1}\|\left(1 - c\, t^p\, \frac{\|f_{m-1}\|^p}{B_{m-1}^p}\right). \qquad (45)$$
Noting that $B_m \ge B_{m-1}$, we derive from (45) that:
$$\left(\|f_m\|/B_m\right)^p \le \left(\|f_{m-1}\|/B_{m-1}\right)^p \left(1 - c\, t^p \left(\|f_{m-1}\|/B_{m-1}\right)^p\right). \qquad (46)$$
Taking into account that $\|f\| \le \|f\|_{A_1(\mathcal{D})}$, we obtain from (46), by Lemma 1 with $C_1 = 1$, $C_2 = c\, t^p$:
$$\left(\|f_m\|/B_m\right)^p \le (1 + m\, c\, t^p)^{-1}. \qquad (47)$$
Combining (42) and (47), we get:
$$\|f_m\| \le \|f\|^{1-\alpha_0} \|f\|_{A_1(\mathcal{D})}^{\alpha_0} (1 + m\, c\, t^p)^{-\alpha_0/p}, \qquad p := \frac{q}{q-1}, \qquad \alpha_0 := \frac{t(1-b)}{1 + t(1-b)}.$$
This completes the proof of Theorem 4 for $\alpha = \alpha_0$. The case $\alpha < \alpha_0$ follows from the case $\alpha = \alpha_0$ and the corresponding analog of Lemma 2. □
Let us discuss an application of Theorem 4 in the case of a Hilbert space. It is well known and easy to check that for a Hilbert space $H$, one has:
$$\rho(u) = (1 + u^2)^{1/2} - 1 \le u^2/2.$$
Let us figure out how the DGA($t, b, u^2/2$) works in a Hilbert space. Consider the $m$th step of it. Let $\varphi_m \in \mathcal{D}$ be from (27) with $t_m = t$ (we assume existence in the case $t = 1$). Then, it is clear that for $\varphi_m$, we have:
$$\langle f_{m-1}, \varphi_m\rangle \ge t\, \|f_{m-1}\|\, r_{\mathcal{D}}(f_{m-1}) = t \sup_{g \in \mathcal{D}} \langle f_{m-1}, g\rangle.$$
The WGA($t, 1$) would use $\varphi_m$ with the coefficient $\langle f_{m-1}, \varphi_m\rangle$ at this step. The DGA($t, b, u^2/2$)*, like the WGA($t, b$), uses the same $\varphi_m$ and only a fraction of $\langle f_{m-1}, \varphi_m\rangle$:
$$c_m = b\, \|f_{m-1}\|\, F_{f_{m-1}}(\varphi_m) = b\, \langle f_{m-1}, \varphi_m\rangle. \qquad (48)$$
Thus, the choice $b = 1$ in (48) corresponds to the WGA. However, it is clear from the above considerations that our technique, designed for general Banach spaces, does not work in the case $b = 1$. By Theorem 4 with $\mu(u) = u^2/2$, the DGA($t, b, \mu$) and DGA($t, b, \mu$)* provide the following error estimate:
$$\|f_m\| \le \|f\|^{1-\alpha} \|f\|_{A_1(\mathcal{D})}^{\alpha} (1 + m\, c\, t^2)^{-\alpha/2}, \qquad \alpha \le \alpha_0 := \frac{t(1-b)}{1 + t(1-b)}. \qquad (49)$$
Note that inequality (49) is similar to the corresponding inequality that follows from Theorem 2 for $\alpha \le \alpha_1 := \frac{t(2-b)}{2 + t(2-b)}$. It is easy to check that $\alpha_0 < \alpha_1$, which means that Theorem 2 gives a stronger result than the corresponding corollary of Theorem 4.
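The coincidence of the DGA($t, b, u^2/2$)* update with the WGA($t, b$) update in a Hilbert space can be checked numerically with the two sketches above; the random test data are an assumption of this check. With $q = 2$ and $\gamma = 1/2$, the closed-form $c_m$ in dga_star reduces exactly to $b\langle f_{m-1}, \varphi_m\rangle$.

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)          # unit-norm columns
f = rng.standard_normal(20)

r1, _ = wga(f, D, 30, t=1.0, b=0.5)
r2 = dga_star(f, D, 30, q=2.0, gamma=0.5, t=1.0, b=0.5)
print(np.allclose(r1, r2))              # True: the two updates coincide
```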
A remark on lower bounds. In Section 2, we obtained lower bounds which are sharp in both parameters $m$ and $b$. Clearly, the most important parameter is $m$. Here, we obtain lower bounds in $m$ which apply to any algorithm providing an $m$-term approximation after $m$ iterations. Recall the definition of the concept of $m$-term approximation with respect to a given dictionary $\mathcal{D}$. Given an integer $m \in \mathbb{N}$, we denote by $\Sigma_m(\mathcal{D})$ the set of all $m$-term approximants with respect to $\mathcal{D}$:
$$\Sigma_m(\mathcal{D}) := \left\{h \in X : h = \sum_{i=1}^{m} c_i g_i, \; g_i \in \mathcal{D}, \; c_i \in \mathbb{R}, \; i = 1, \dots, m\right\}.$$
Define, for a Banach space $X$:
$$\sigma_m(f, \mathcal{D})_X := \inf_{h \in \Sigma_m(\mathcal{D})} \|f - h\|_X$$
to be the best $m$-term approximation of $f \in X$ in the $X$-norm with respect to $\mathcal{D}$.
Let $1 < q \le 2$. Consider $X = \ell_q$. It is known ([23], p. 67) that $\ell_q$, $1 < q \le 2$, is a uniformly smooth Banach space with a modulus of smoothness $\rho(u)$ of power type $q$: $\rho(u) \le \gamma u^q$. Choose $\mathcal{D} := \mathcal{E}$ to be the symmetrized standard basis $\{\pm e_j\}_{j=1}^{\infty}$, $e_j := (0, \dots, 0, 1, 0, \dots)$, of $\ell_q$. For a given $m \in \mathbb{N}$, set:
$$f := \sum_{i=1}^{2m} e_i.$$
Then, the following relations are obvious:
$$\|f\|_{\ell_q} = (2m)^{1/q}, \qquad \|f\|_{A_1(\mathcal{E})} = 2m, \qquad \sigma_m(f, \mathcal{E})_{\ell_q} = m^{1/q}.$$
Therefore, for any $\alpha \in [0,1]$:
$$\frac{\sigma_m(f, \mathcal{E})_{\ell_q}}{\|f\|_{\ell_q}^{1-\alpha} \|f\|_{A_1(\mathcal{E})}^{\alpha}} \ge \frac{1}{2}\, m^{-\alpha/p}, \qquad p := \frac{q}{q-1}. \qquad (50)$$
This means that the upper bounds provided by Theorem 4 are sharp in $m$ for any fixed parameters $t$ and $b$. More precisely, for every $q \in (1,2]$, there exists a Banach space $X$ with $\rho(u, X) \le \gamma u^q$, a dictionary $\mathcal{D} \subset X$, and $f \in X$ such that the following inequality holds:
$$\frac{\sigma_m(f, \mathcal{D})_X}{\|f\|_X^{1-\alpha} \|f\|_{A_1(\mathcal{D})}^{\alpha}} \ge \frac{1}{2}\, m^{-\alpha/p}, \qquad p := \frac{q}{q-1}. \qquad (51)$$
Inequality (51) follows directly from (50) with $X = \ell_q$ and $\mathcal{D} = \mathcal{E}$.
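The relations behind (50) are easy to confirm numerically; the values of $q$ and $m$ below are arbitrary choices for the check.

```python
import numpy as np

q, m = 1.5, 25
f = np.ones(2 * m)                    # coefficients of f = e_1 + ... + e_{2m}
# Keeping any m coordinates is optimal here, so the best m-term error is the
# l_q norm of the m dropped coordinates:
sigma = np.linalg.norm(np.ones(m), q)
assert np.isclose(sigma, m ** (1 / q))                       # sigma_m = m^{1/q}
assert np.isclose(np.linalg.norm(f, q), (2 * m) ** (1 / q))  # ||f||_{l_q}
assert np.isclose(f.sum(), 2 * m)                            # ||f||_{A_1(E)} = 2m
```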
Remark 2.
Inequality (51) gives a lower bound for the best $m$-term approximation. It is known (see [1], Ch. 6) that there are greedy-type algorithms, for instance, the Weak Chebyshev Greedy Algorithm and the Weak Greedy Algorithm with Free Relaxation with weakness parameter $t \in (0,1]$, which provide the following rate of convergence for $f \in X$ with $\rho(u, X) \le \gamma u^q$, $1 < q \le 2$:
$$\|f_m\|_X \le C(q, \gamma)(1 + m t^p)^{-1/p} \|f\|_{A_1(\mathcal{D})}, \qquad p := \frac{q}{q-1}.$$
This means that the corresponding upper bound in (51) (in the sense of order) can be realized by a greedy-type algorithm.
Note that for specific $X$ and $\mathcal{D}$, inequality (51) may be improved. We illustrate this with the example of $X = \ell_q$, $q \in (2, \infty)$, and $\mathcal{D} = \mathcal{E}$. Without loss of generality, we can assume that:
$$f = \sum_{i=1}^{\infty} c_i e_i, \qquad c_1 \ge c_2 \ge \cdots \ge 0.$$
Then,
$$\|f\|_{\ell_q} = \left(\sum_{i=1}^{\infty} c_i^q\right)^{1/q}, \qquad \|f\|_{A_1(\mathcal{E})} = \|f\|_{\ell_1} = \sum_{i=1}^{\infty} c_i.$$
We now estimate from above:
$$\sigma_m(f, \mathcal{E})_{\ell_q} = \left(\sum_{i=m+1}^{\infty} c_i^q\right)^{1/q}.$$
Our monotonicity assumption on $\{c_i\}$ implies:
$$c_m \le m^{-1/q} \|f\|_{\ell_q}, \qquad c_m \le m^{-1} \|f\|_{\ell_1}$$
and, therefore, for any $\beta \in [0,1]$:
$$c_m \le m^{-(1/q)(1-\beta) - \beta}\, \|f\|_{\ell_q}^{1-\beta} \|f\|_{\ell_1}^{\beta}.$$
Setting $\alpha := \beta(1 - 1/q) + 1/q$, we obtain from here, with $p := \frac{q}{q-1}$:
$$\sigma_m(f, \mathcal{E})_{\ell_q} \le c_m^{1 - 1/q}\, \|f\|_{\ell_1}^{1/q} \le m^{-\alpha/p}\, \|f\|_{\ell_q}^{1-\alpha} \|f\|_{\ell_1}^{\alpha}.$$
Therefore, for any $\alpha \in [0,1]$, we have for all $f \in \ell_q$:
$$\frac{\sigma_m(f, \mathcal{E})_{\ell_q}}{\|f\|_{\ell_q}^{1-\alpha} \|f\|_{A_1(\mathcal{E})}^{\alpha}} \le m^{-\alpha/p}, \qquad p := \frac{q}{q-1}. \qquad (55)$$
Note that it is known (see (26) above) that the space $\ell_q$ with $q \in [2, \infty)$ has a modulus of smoothness of power type 2. Thus, for such spaces, Theorem 4 gives an analog of (55) with the weaker rate of decay $m^{-\alpha/2}$ instead of $m^{-\alpha/p}$ in (55).
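A quick numerical check of (55) on random monotone coefficient sequences; the values of $q$, $m$, and the sequence length are arbitrary choices for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
q, m = 3.0, 40
p = q / (q - 1)
c = np.sort(rng.random(500))[::-1]     # c_1 >= c_2 >= ... >= 0
lq, l1 = np.linalg.norm(c, q), c.sum()
sigma = np.linalg.norm(c[m:], q)       # best m-term error: keep the m largest
for alpha in np.linspace(0.0, 1.0, 11):
    assert sigma <= m ** (-alpha / p) * lq ** (1 - alpha) * l1 ** alpha
```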
We briefly discuss another example, of $L_q$-type spaces with $q \in (2, \infty)$. Consider the space $L_q(0, 2\pi)$ of real $2\pi$-periodic functions, and take $\mathcal{D} = \mathcal{T}$ to be the real trigonometric system. For a given $m \in \mathbb{N}$, set:
$$f := \sum_{k=1}^{2m} \cos(2^k x).$$
Then, it is well known that:
$$\|f\|_{L_q} \le C(q)(2m)^{1/2}, \qquad \|f\|_{A_1(\mathcal{T})} \le 2m.$$
Moreover,
$$\sigma_m(f, \mathcal{T})_{L_q} \ge C\, \sigma_m(f, \mathcal{T})_{L_2} \ge C\, m^{1/2}.$$
Therefore, for any $\alpha \in [0,1]$, we obtain:
$$\frac{\sigma_m(f, \mathcal{T})_{L_q}}{\|f\|_{L_q}^{1-\alpha} \|f\|_{A_1(\mathcal{T})}^{\alpha}} \ge C(q)\, m^{-\alpha/2}.$$

4. Discussion

In this paper, we propose and study a new criterion for the evaluation of the efficiency of a greedy algorithm. This criterion takes into account two characteristics of the element $f$ which we approximate: its norm $\|f\|$ (either in a Hilbert or in a Banach space) and the norm $\|f\|_{A_1(\mathcal{D})}$. We test this new criterion on two standard classes of algorithms. The first class consists of algorithms which provide, for each element $f$, an expansion with respect to a given dictionary $\mathcal{D}$. In this paper, we discuss three such algorithms: the PGA and the WGA($t, b$) for Hilbert spaces, and the DGA($t, b, \mu$) for Banach spaces. Using our new criterion, we compare the efficiency of these algorithms with algorithms from the second class, which includes the OGA (Hilbert spaces) and its generalization for Banach spaces, the Weak Chebyshev Greedy Algorithm. Algorithms from the second class are known to provide the optimal (in the sense of order) rate of $m$-term approximation for the class $A_1(\mathcal{D})$. These algorithms do not provide expansions. In this paper, we show that from the point of view of our new criterion, the algorithms PGA, WGA($t, b$), and DGA($t, b, \mu$) are optimal in the sense of order. We illustrate this fact with the example of the WGA($1, b$). Theorem 2 with $t = 1$ gives the following upper bound for $\alpha \le \frac{2-b}{4-b}$:
$$\gamma_m^{1,b}(\alpha, H) \le (1 + m\, b)^{-\alpha/2}. \qquad (57)$$
Inequality (22) shows that this bound cannot be improved in the sense of order. Moreover, inequality (51) with $p = 2$, which corresponds to the case of a Hilbert space, shows that (57) cannot be improved in the sense of order with respect to $m$, even if instead of the algorithm WGA($1, b$) we use best $m$-term approximations.
We point out that our new results provide better bounds for the accuracy than known results in the case of small $\|f\|$. For simplicity, we illustrate this with the example of the PGA. Suppose that $f \in A_1(\mathcal{D})$. Then, Ref. [16] gives the bound $\|f_m\| \le m^{-1/6}$, which does not involve the norm $\|f\|$. The upper bound in (4) with $\alpha = 1/3$ gives:
$$\|f_m\| \le \|f\|^{2/3}\, m^{-1/6},$$
which is better than $\|f_m\| \le m^{-1/6}$ when $\|f\|$ is small.
In this paper, we applied our new criterion only to a few important greedy algorithms and established a new qualitative effect: the optimality (in the sense of order) of some of the algorithms which provide expansions. It would be interesting to apply this criterion to other greedy algorithms.

Funding

This research was funded by the Russian Science Foundation, grant number 23-71-30001.

Data Availability Statement

Not applicable.

Acknowledgments

The author is grateful to the referees for their useful comments and suggestions. This research was supported by the Russian Science Foundation (project No. 23-71-30001) at the Lomonosov Moscow State University.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Temlyakov, V. Greedy Approximation; Cambridge University Press: New York, NY, USA, 2011.
  2. Needell, D.; Vershynin, R. Uniform uncertainty principle and signal recovery via orthogonal matching pursuit. Found. Comput. Math. 2009, 9, 317–334.
  3. Needell, D.; Tropp, J.A. CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. Appl. Comput. Harmon. Anal. 2009, 26, 301–321.
  4. Dai, W.; Milenkovic, O. Subspace pursuit for compressive sensing signal reconstruction. IEEE Trans. Inf. Theory 2009, 55, 2230–2249.
  5. Petrova, G. Rescaled Pure Greedy Algorithm for Hilbert and Banach spaces. Appl. Comput. Harmon. Anal. 2016, 41, 852–866.
  6. Dereventsov, A.V.; Temlyakov, V.N. A unified way of analyzing some greedy algorithms. J. Funct. Anal. 2019, 277, 108286.
  7. DeVore, R.A.; Temlyakov, V.N. Convex optimization on Banach spaces. Found. Comput. Math. 2016, 16, 369–394.
  8. Chandrasekaran, V.; Recht, B.; Parrilo, P.; Willsky, A. The convex geometry of linear inverse problems. Found. Comput. Math. 2012, 12, 805–849.
  9. Dereventsov, A.V.; Temlyakov, V.N. Biorthogonal greedy algorithms in convex optimization. Appl. Comput. Harmon. Anal. 2022, 60, 489–511.
  10. Figueiredo, M.A.; Nowak, R.D.; Wright, S.J. Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems. IEEE J. Sel. Top. Signal Process. 2007, 1, 586–597.
  11. Jaggi, M. Revisiting Frank–Wolfe: Projection-free sparse convex optimization. ICML 2013, 1, 427–435.
  12. Shalev-Shwartz, S.; Srebro, N.; Zhang, T. Trading accuracy for sparsity in optimization problems with sparsity constraints. SIAM J. Optim. 2010, 20, 2807–2832.
  13. Tewari, A.; Ravikumar, P.; Dhillon, I.S. Greedy algorithms for structurally constrained high dimensional problems. Adv. Neural Inf. Process. Syst. 2011, 24, 882–890.
  14. Zhang, T. Sequential greedy approximation for certain convex optimization problems. IEEE Trans. Inf. Theory 2003, 49, 682–691.
  15. Temlyakov, V. Multivariate Approximation; Cambridge University Press: New York, NY, USA, 2018.
  16. DeVore, R.A.; Temlyakov, V.N. Some remarks on greedy algorithms. Adv. Comput. Math. 1996, 5, 173–187.
  17. Livshitz, E.D. On lower estimates of rate of convergence of greedy algorithms. Izv. RAN Ser. Matem. 2009, 73, 125–144.
  18. Konyagin, S.V.; Temlyakov, V.N. Rate of convergence of Pure Greedy Algorithm. East J. Approx. 1999, 5, 493–499.
  19. Sil'nichenko, A.V. Rate of convergence of greedy algorithms. Matem. Zametki 2004, 76, 628–632.
  20. Nelson, J.L.; Temlyakov, V.N. Greedy expansions in Hilbert spaces. Proc. Steklov Inst. Math. 2013, 280, 227–239.
  21. Temlyakov, V.N. Greedy expansions in Banach spaces. Adv. Comput. Math. 2007, 26, 431–449.
  22. Donahue, M.; Gurvits, L.; Darken, C.; Sontag, E. Rate of convex approximation in non-Hilbert spaces. Constr. Approx. 1997, 13, 187–220.
  23. Lindenstrauss, J.; Tzafriri, L. Classical Banach Spaces I; Springer: Berlin/Heidelberg, Germany, 1977.