1. Introduction
This paper is devoted to the theoretical study of the efficiency of some greedy algorithms. Greedy algorithms are very useful in applications; in particular, adaptive methods are used in PDE solvers, and sparse approximation is used in image/signal/data processing, in the design of neural networks, and in convex optimization. This fact motivated a deep theoretical study of a variety of greedy algorithms. In this paper, we study the two most popular greedy algorithms, the Pure Greedy Algorithm (PGA) and the Orthogonal Greedy Algorithm (OGA), and their natural modifications. The reader can find other important greedy algorithms in the book [1] and in the papers on Regularized Orthogonal Matching Pursuit [2], Compressive Sampling Matching Pursuit [3], Subspace Pursuit [4], the Rescaled Pure Greedy Algorithm [5], and the Biorthogonal Greedy Algorithm [6]. The reader can find some results on the application of greedy algorithms in convex optimization in the papers [7,8,9,10,11,12,13,14].
There are different criteria for the theoretical efficiency of greedy algorithms. All of them are based on the accuracy of the algorithm (the error after the mth iteration of the algorithm) and take different forms. One of those criteria uses the worst error over a given class of inputs (elements, which we approximate by the algorithm). We discuss a variant of this criterion in this paper. There is another, more delicate criterion, which is based on Lebesgue-type inequalities for individual elements. We do not discuss this criterion here. The reader can find a survey of the corresponding results in [15], Ch. 8. We now proceed to a detailed discussion of our results.
Let us begin with a general description of the problem. Let
X be a Banach space with the norm
and
be a subspace of
X with a stronger norm
,
. Consider a homogeneous approximation operator (linear or nonlinear)
,
,
,
, and the error of approximation:
Then, for any
, we have:
The characteristic plays an important role in approximation theory, with many classical examples of spaces X and Y; for instance, and Y is one of the smoothness spaces such as a Sobolev, Nikol’skii, or Besov space.
In this paper, we focus on the following version of the inequality (
1): find the best
such that the inequality:
holds for all
. Clearly,
. Additionally, it is clear that under assumption:
,
, we obtain the trivial bound:
In this paper, we discuss greedy approximation with respect to a given dictionary and prove some nontrivial inequalities for
both in the case of
X being a Hilbert space and
X being a Banach space. In particular, we establish that some greedy algorithms (Pure Greedy Algorithm (PGA) and its generalizations) are as good as the Orthogonal Greedy Algorithm (OGA) in the sense of inequality (
2), while it is known that the PGA is much worse than the OGA in the sense of the inequality (
1) (for definitions and precise formulations, see below).
Let
H be a real Hilbert space with the inner product
and norm
. We say that a set of elements (functions)
from
H is a dictionary (symmetric dictionary) if each
has norm one (
) and
. In addition, we assume for convenience the property of symmetry:
We define the Pure Greedy Algorithm (PGA). We describe this algorithm for a general dictionary
. If
, we let
be an element from
which maximizes
. We assume for simplicity that such a maximizer exists; if not, suitable modifications of the algorithm that follows are necessary (see the Weak Greedy Algorithm below). We define:
Pure Greedy Algorithm (PGA). We define
and
. Then, for each
, we inductively define
Note that for a given element f, the sequence may not be unique.
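As an illustration of the two steps of the PGA (greedy selection and the simple residual update), here is a minimal sketch for a finite dictionary in a finite-dimensional Euclidean space; the finite-dimensional setting and the NumPy implementation are our own illustrative choices, not part of the paper.

```python
import numpy as np

def pure_greedy(f, dictionary, iterations):
    """Pure Greedy Algorithm (PGA) for a finite dictionary of unit-norm
    vectors.  Symmetry (g and -g both in the dictionary) is handled
    implicitly by taking absolute values of the inner products.
    Returns the residual f_m and the terms of the greedy expansion."""
    residual = f.astype(float).copy()
    terms = []
    for _ in range(iterations):
        # Selection step: element with the largest |<f_{m-1}, g>|.
        products = [residual @ g for g in dictionary]
        i = int(np.argmax(np.abs(products)))
        coeff = products[i]
        # Update step: subtract the projection onto the chosen element.
        residual = residual - coeff * dictionary[i]
        terms.append((coeff, i))
    return residual, terms
```

For an orthonormal dictionary, the PGA simply peels off the coefficients of f in decreasing order of absolute value, and the residual vanishes after finitely many steps.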
This algorithm is well studied from the point of view of convergence and rate of convergence. The reader can find the corresponding results and historical comments in [
1], Ch. 2. In this paper, we focus on the rate of convergence. Typically, in approximation theory, we define the rate of convergence for specific classes. In classical approximation theory, these are smoothness classes. Clearly, in the general setting with arbitrary
H and
, we do not have a concept of smoothness similar to the classical smoothness of functions. It turns out that the geometrically defined class, namely, the closure of the convex hull of
, which we denote by
, is a very natural class. For each
, we associate the following norm:
Clearly,
. Then, the problem of the rate of convergence of the PGA can be formulated as follows (see [
1], p. 95). Find the order of decay of the sequence:
where the supremum is taken over all possible choices of
, over all elements
,
,
, and over all dictionaries
. This problem is a central theoretical problem in greedy approximation in Hilbert spaces and it is still open. We mention some of the known results here and refer the reader, for the detailed history of the problem, to [
1], Ch. 2. It is clear that for any
, such that
we have:
In this paper, we discuss the following extension of the asymptotic characteristic
: for
define:
The first upper bound on
was obtained in [
16]:
Actually, the proof in [
16] (see also [
1], pp. 92–93) gives:
We establish here the following bounds:
Additionally, in
Section 2, we find the right behavior of the asymptotic characteristic similar to
for a more general algorithm than the PGA, namely, for the Weak Greedy Algorithm with parameter
b.
It is interesting to compare the rates of convergence of the PGA and the Orthogonal Greedy Algorithm (OGA). We now give a brief definition of the OGA. We define
,
and for
, we inductively define
to be the orthogonal projection of
f onto the span of
,…,
and set
. The analogs of the characteristics
and
for the OGA are denoted by
and
. The following bound is proved in [
16] (see also [
1], p. 93):
It is known (see [
17]) that
decays slower than
. Therefore, from the point of view of the characteristics
and
, the OGA is much better than the PGA. We establish here the following bounds:
This means that from the point of view of the characteristics
and
, the OGA is the same (in the sense of order) as the PGA for
. This is a very surprising fact.
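To make the comparison concrete, here is a minimal NumPy sketch of the OGA over a finite dictionary (the finite-dimensional setting is our own illustrative choice): the selection step is the same greedy step as in the PGA, but the update replaces the rank-one subtraction with a full orthogonal projection.

```python
import numpy as np

def orthogonal_greedy(f, dictionary, iterations):
    """Orthogonal Greedy Algorithm (OGA): same greedy selection as the
    PGA, but the new residual is f minus its orthogonal projection onto
    the span of ALL elements selected so far."""
    residual = f.astype(float).copy()
    selected = []
    for _ in range(iterations):
        # Selection step (identical to the PGA).
        i = int(np.argmax([abs(residual @ g) for g in dictionary]))
        selected.append(dictionary[i])
        # Projection step: least squares gives the orthogonal
        # projection of f onto span(selected).
        A = np.column_stack(selected)
        coeffs, *_ = np.linalg.lstsq(A, f, rcond=None)
        residual = f - A @ coeffs
    return residual
```

The projection step is what makes an OGA iteration more expensive than a PGA iteration.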
We do not know if the upper bound in (
4) holds for
. However, the inequality in (
3) and the lower bound for the
show that:
Therefore, the upper bound in (
4) cannot be extended beyond
.
Section 3 deals with the case of a Banach space
X. The results for the Banach space case are similar to, but not as sharp as, their Hilbert space counterparts.
Novelty. In this paper, we suggest a new criterion for the evaluation of the theoretical efficiency of a greedy algorithm. The classical criterion uses the worst error (for instance, ) of approximation of elements from the class by our algorithm (for instance, the PGA). In other words, this criterion uses the norm for estimating the error of approximation of f. Our new criterion uses two norms, and ; more precisely, the weighted product of these norms , . The most important qualitative discovery of this paper is that the PGA and its natural modifications have the same theoretical efficiency for some as the OGA and its modifications. It is known that in accordance with the old criterion, the PGA is much worse than the OGA.
Method. The standard way of analyzing the accuracy of a greedy algorithm is based on estimating from below the difference
and then solving (estimating) the corresponding recurrent inequalities. For instance, this method works very well for the OGA. In the paper [
16], this standard method was modified in such a way that it allowed us to obtain some new nontrivial upper bounds for
. The method from [
16] simultaneously analyzes two sequences, the
and the sequence of sums of absolute values of the coefficients of the
. In this paper, we further develop the method from [
16].
The results of this paper (see Theorem 2 and lower bounds in
Section 2) show that our method, which is a development of the method from [
16], is optimal for proving the upper bounds for
in the case
. It is known that the bound
, which was obtained in [
16], is not optimal. The bound
was proved in [
18] by a method distinct from the one in [
16]. The method from [
18] was further developed in [
19,
20]. It would be interesting to understand if the method from [
18] and its further developments allow us to prove an analog of Theorem 2 for
.
Conclusion. The PGA at the
mth iteration searches for an element
, which maximizes the inner product
over all
. Then, the update is very easy:
. The OGA, like the PGA at the
mth iteration, searches for an element
, which maximizes the inner product
over all
. However, the second steps of the PGA and OGA at the
mth iteration are different—the OGA makes an orthogonal projection onto the span of
, …,
. Clearly, this step of the OGA is more difficult than the corresponding step of the PGA. Moreover, it is clear from the definition of the PGA that it provides an expansion of
f into a series with respect to
. The OGA does not provide an expansion. Thus, the advantage of the PGA over the OGA is that it is simpler and provides an expansion. The advantage of the OGA over the PGA is that in accordance with the old criterion, we can guarantee better accuracy for an
. The results of this paper show that in accordance with the new criterion, the OGA does not have an advantage in the sense of accuracy for some parameters
. Similar results are obtained for the modifications of the PGA—the Weak Greedy Algorithm with parameter
b in Hilbert spaces (see
Section 2) and the Dual Greedy Algorithm with parameters
in Banach spaces (see
Section 3).
2. Hilbert Space: The Weak Greedy Algorithm with Parameter
Let a sequence , , and a parameter be given. We define the Weak Greedy Algorithm with parameter b.
Weak Greedy Algorithm with parameter b (WGA()). We define . Then, for each , we inductively define:
- (1)
is any satisfying:
- (2)
- (3)
In the case , , we write t in the notation instead of .
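For the constant-weakness case WGA(t), the selection step only needs to realize a fraction t of the supremum of the inner products. A minimal NumPy sketch follows (the finite dictionary is our own illustrative choice, and taking the first admissible element is our own choice of realization, since the algorithm does not prescribe one):

```python
import numpy as np

def weak_greedy(f, dictionary, t, iterations):
    """Weak Greedy Algorithm WGA(t), 0 < t <= 1: the chosen element must
    satisfy |<f_{m-1}, g>| >= t * max_g |<f_{m-1}, g>|.  Any admissible
    element may be chosen; here we take the first one, so the
    realization is generally not unique."""
    residual = f.astype(float).copy()
    for _ in range(iterations):
        products = np.array([residual @ g for g in dictionary])
        threshold = t * np.max(np.abs(products))
        # First index meeting the weakness threshold.
        i = int(np.argmax(np.abs(products) >= threshold))
        # Update step, same as in the PGA.
        residual = residual - products[i] * dictionary[i]
    return residual
```

With t = 1, every admissible element is a maximizer, and the sketch reduces to the PGA.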
We proceed to the rate of convergence. The following Theorem 1 was proved in [
21].
Theorem 1. Let be an arbitrary dictionary in H. Assume is a nonincreasing sequence and . Then, for , we have:
where:
Theorem 1 implies the following inequality for any
f and any
:
We now extend Theorem 1 to provide a bound for:
We prove the following Theorem 2 in the case , .
Theorem 2. For any Hilbert space H, we have:
provided .
Proof. The proof of this theorem goes along the lines of the proof of Theorem 1 in [
21]. Let
be a dictionary in
H. We introduce some notations:
and consider the sequence
defined as follows:
It is clear that
,
. By Lemma 3.5 from [
16] (see also [
1], p. 91, Lemma 2.17), we get:
From here and from the equality:
we obtain the following relations:
From (
13) and (
15), we obtain:
Using that
, we derive from here:
We shall need the following simple known Lemma (see, for example, [
1], p. 91, in case
).
Lemma 1. Let be a sequence of non-negative numbers satisfying the inequalities:
Then, we have for each m:
Proof. The proof is by induction on
m. For
, the statement is true by assumption. We assume
and prove that
. If
, this statement is obvious. Assume, therefore, that
. Then, we have:
which implies
. □
We apply Lemma 1 with
. Then, the inequality
implies that we can take
. We set
and obtain from (
16) and Lemma 1:
Relations (
13) and (
15) imply:
We now need the following simple inequality: for any
and any
, we have:
Rewriting (
14) in the form:
and using the inequality (
19) with
and
, we get from (
18) and (
20) that:
Combining (
17) and (
21), we obtain:
which completes the proof of Theorem 2 with
. The case
follows from Lemma 2 below. □
Lower bounds. Let H be an infinite-dimensional Hilbert space and let be an orthonormal system in H. Suppose that our symmetric dictionary consists of , , and that the other elements have the property , . We present an example in the case . Let and m be given. We consider two cases: (I) and (II) .
(I). Set
and:
Then, at each iteration, the WGA(
) will pick one of the
,
with the largest coefficient. After
m iterations, we will get:
Thus, for any
, we find:
Thus, for any
, we find:
Bounds (
22) and (
23) show that in the case
, inequality (
11) is sharp in the sense of dependence on
m and
b, when
m goes to
∞ and
b goes to 0.
Proof of (6). The lower bound in (
6) follows from (
23) because in the case of an orthonormal system, the PGA and OGA algorithms coincide. We now prove the upper bound. □
Lemma 2. Let denote either or . Suppose that for some , we have:
Then, for any , we have:
Proof. By the definitions of the PGA and the OGA, we have
and
. The proof is identical for both the PGA and the OGA; we carry it out for the PGA. Our assumption gives for any
f, any dictionary
, and any realization of
:
Therefore, for any
, we have:
Choosing
, we obtain (
24) from (
25). □
The upper bound in (
6) follows from (
5) and Lemma 2.
3. Greedy Expansions in Banach Spaces
In this section, we extend the results from
Section 2 to the case of a Banach space instead of a Hilbert space. We begin with some definitions. Let
X be a real Banach space with norm
. As above, we say that a set of elements (functions)
from
X is a dictionary (symmetric dictionary) if each
has norm one (
) and
. In addition, we assume for convenience that the dictionary is symmetric:
In this paper, we study greedy algorithms with regard to
that provide greedy expansions. For a nonzero element
, we denote by
a norming (peak) functional for
f:
The existence of such a functional is guaranteed by Hahn–Banach theorem. Denote
We note that, in general, a norming functional is not unique. This is why we take the supremum over all norming functionals of f in the definition of . It is known that in the case of uniformly smooth Banach spaces (our primary object here), the norming functional is unique. In such a case, we do not need the supremum in the definition of .
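As a concrete example (our own illustration, not from the paper), in the sequence space with 1 < p < infinity the norming functional of a nonzero f is unique and is given by the standard formula F = sign(f) |f|^{p-1} / ||f||_p^{p-1}, viewed as an element of the dual space with exponent q, 1/p + 1/q = 1. A quick numerical check of its two defining properties:

```python
import numpy as np

def norming_functional(f, p):
    """Norming (peak) functional of a nonzero f in ell_p, 1 < p < inf,
    represented as a vector F in ell_q (1/p + 1/q = 1) acting by
    x -> F @ x.  Standard formula: F = sign(f)*|f|**(p-1)/||f||_p**(p-1)."""
    norm = np.linalg.norm(f, ord=p)
    return np.sign(f) * np.abs(f) ** (p - 1) / norm ** (p - 1)

f = np.array([3.0, -4.0, 1.0])
p = 3.0
q = p / (p - 1)
F = norming_functional(f, p)
# Defining properties of a norming functional: ||F|| = 1 and F(f) = ||f||.
assert np.isclose(np.linalg.norm(F, ord=q), 1.0)
assert np.isclose(F @ f, np.linalg.norm(f, ord=p))
```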
We consider here approximation in uniformly smooth Banach spaces. For a Banach space
X, we define the modulus of smoothness:
A uniformly smooth Banach space is one with the property:
It is well known (see, for instance, [
22], Lemma B.1) that in the case
,
, we have:
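For the reader's convenience, we recall the commonly cited form of these estimates for the spaces $L_p$ (this display is our addition, included as a reminder of the standard result):

```latex
\rho_{L_p}(u) \le
\begin{cases}
u^{p}/p, & 1 \le p \le 2,\\
(p-1)\,u^{2}/2, & 2 \le p < \infty .
\end{cases}
```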
We now give a definition of the DGA
,
,
introduced in [
21] (see also [
1], Ch. 6).
Dual Greedy Algorithm with parameters (DGA). Let X be a uniformly smooth Banach space with the modulus of smoothness and let be a majorant of : , . For a sequence , and a parameter , we define sequences , , , and inductively. Let and . If for , then we set for and stop. If , then we conduct the following three steps:
- (1)
take any
such that:
- (2)
choose
from the equation:
- (3)
Along with the algorithm DGA
, we consider a slight modification of it; when at step (2), we find
from the equation (see [
21], Remark 3.1):
We denote this modification by DGA
.
We proceed to studying the rate of convergence of the DGA
in the uniformly smooth Banach spaces with the power type majorant of the modulus of smoothness:
,
. The following Theorem 3 is from [
21] (see also [
1], p. 372).
Theorem 3. Let be a nonincreasing sequence and . Assume that X has a modulus of smoothness , . Denote . Then, for any dictionary and any , the rate of convergence of the DGA is given by:
Remark 1. It is pointed out in [21], Remark 3.2, that Theorem 3 holds for the algorithm DGA as well.
Theorem 3 is an analog of Theorem 1. We now prove an analog of Theorem 2. We extend Theorem 3 to provide a bound for:
The corresponding characteristic for the algorithm DGA
is denoted by
. We prove the following Theorem 4 in the case
,
.
Theorem 4. For any Banach space X with modulus of smoothness , , , we have:
provided . The same inequality holds for the .
Proof. The proof is identical for both characteristics
and
. We carry it out for the
. From the definition of the modulus of smoothness, we have:
Using the definition of
:
we get:
Combining (
33) and (
35), we get:
Using the choice of
, we get from here:
Thus, we need to estimate from below
. It is clear that:
Denote
. Then, by (
38), we have:
Next, by Lemma 6.10 from [
1], p. 343, we obtain:
Substituting (
39) into (
37), we get:
From the definition of
, we find:
Using the inequality:
we obtain:
Multiplying (
40) and (
41), we get:
The function
is increasing on
. Therefore, the
from (
28) is greater than or equal to
from the following Equation (
43) (see (
39)):
Using notations:
we get from (
37), (
39), (
44):
Noting that
, we derive from (
45) that:
Taking into account that
, we obtain from (
46) by Lemma 1 with
,
:
Combining (
42) and (
47), we get:
This completes the proof of Theorem 4 for . The case follows from the case and the corresponding analog of Lemma 2. □
Let us discuss an application of Theorem 4 in the case of a Hilbert space. It is well known and easy to check that for a Hilbert space
H, one has:
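Recall the standard computation (our addition, included for convenience): for a Hilbert space, the parallelogram law gives

```latex
\rho_H(u) = \sqrt{1+u^{2}} - 1 \le u^{2}/2 ,
```

so a Hilbert space is uniformly smooth with modulus of smoothness of power type 2.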
Let us figure out how the DGA
works in a Hilbert space. Consider the
mth step of it. Let
be from (
27) with
(we assume existence in case
). Then, it is clear that for
, we have:
The WGA(
) would use
with the coefficient
at this step. The DGA
, like WGA(
), uses the same
and only a fraction of
:
Thus, the choice
in (
48) corresponds to the WGA. However, it is clear from the above considerations that our technique, designed for general Banach spaces, does not work in the case
. By Theorem 4 with
, the DGA
and DGA
provide the following error estimate:
Note that the inequality (
49) is similar to the corresponding inequality, which follows from Theorem 2, for
. It is easy to check that
, which means that Theorem 2 gives a stronger result than the corresponding corollary of Theorem 4.
A remark on lower bounds. In
Section 2, we obtained the lower bounds, which are sharp in both parameters
m and
b. Clearly, the most important parameter is
m. Here, we obtain the lower bounds in
m, which apply to any algorithm providing
m-term approximation after
m iterations. Recall the definition of the concept of
m-term approximation with respect to a given dictionary
. Given an integer
, we denote by
the set of all
m-term approximants with respect to
:
Define for a Banach space
X:
to be the best
m-term approximation of
in the
X-norm with respect to
.
Let
. Consider
. It is known ([
23], p. 67) that
,
, is a uniformly smooth Banach space with a modulus of smoothness
of power type
q:
. Choose
as a symmetrized standard basis
,
, for
. For a given
, set:
Then, the following relations are obvious:
Therefore, for any
:
This means that the upper bounds provided by Theorem 4 are sharp in
m for any fixed parameters
t and
b. More precisely, for every
, there exists a Banach space
X with
, a dictionary
, and
such that the following inequality holds:
Inequality (
51) follows directly from (
50) with
and
.
Remark 2. Inequality (51) gives the lower bound for the best m-term approximation. It is known (see [1], Ch. 6) that there are greedy-type algorithms, for instance, the Weak Chebyshev Greedy Algorithm and the Weak Greedy Algorithm with Free Relaxation with the weakness parameter , which provide the following rate of convergence for with , :
This means that the corresponding upper bound in (51) (in the sense of order) can be realized by a greedy-type algorithm.
Note that for specific
X and
, the inequality (
51) may be improved. We illustrate it on the example of
with
and
. Without loss of generality, we can assume that:
We now estimate from above:
Our monotonicity assumption on
implies:
and, therefore, for any
:
Setting
, we obtain from here with
:
Therefore, for any
, we have for all
:
Note that it is known (see (
26) above) that the space
with
has the modulus of smoothness of the power type 2. Thus, Theorem 4 gives an analog of (
55) with a weaker rate of decay
than
in (
55).
We briefly discuss another example of the
-type spaces with
. Consider the space
of real
-periodic functions and take
to be the real trigonometric system. For a given
, set:
Then, it is well known that:
Therefore, for any
, we obtain:
4. Discussion
In this paper, we propose and study a new criterion for the evaluation of efficiency of a greedy algorithm. This criterion takes into account two characteristics of an element
f, which we approximate: its norm
(either in a Hilbert or in a Banach space) and the
. We test this new criterion on two standard classes of algorithms. The first class consists of algorithms which provide for each element
f an expansion with respect to a given dictionary
. In this paper, we discuss two such algorithms: the PGA and the WGA(
) for Hilbert spaces, and the DGA(
) for Banach spaces. Using our new criterion, we compare the efficiency of these algorithms with algorithms from the second class, which include the OGA (Hilbert spaces) and its generalization for Banach spaces, the Weak Chebyshev Greedy Algorithm. Algorithms from the second class are known to provide the optimal (in the sense of order) rate of
m-term approximation for the class
. These algorithms do not provide expansions. In this paper, we show that from the point of view of our new criterion, the algorithms PGA, WGA(
), and DGA(
) are optimal in the sense of order. We illustrate this fact on the example of WGA(
). Theorem 2 with
gives the following upper bound for
:
Inequality (
22) shows that this bound cannot be improved in the sense of order. Moreover, inequality (
51) with
, which corresponds to a Hilbert space, shows that (
57) cannot be improved in the sense of order with respect to
m even if instead of the algorithm WGA(
), we use the best
m-term approximations.
We point out that our new results provide better bounds for the accuracy than known results in the case of small
. For simplicity, we illustrate that on the example of the PGA. Suppose that
. Then, Ref. [16] gives the bound , which does not include the norm
, which does not include the norm
. The upper bound in (
4) with
gives:
which is better than
, when
is small.
In this paper, we only applied our new criterion to a few important greedy algorithms and established a new qualitative effect on optimality (in the sense of order) of some of the algorithms, which provide expansions. It will be interesting to apply this criterion to other greedy algorithms.