1. Introduction
Approximation by sparse linear combinations of elements from a fixed redundant system continues to develop actively, driven not only by theoretical interest but also by frequent applications in areas such as signal processing and machine learning, cf. [1,2,3,4,5,6,7]. This type of approximation is called highly nonlinear approximation. Greedy-type algorithms have been used as a tool for generating such approximations. Among others, the orthogonal greedy algorithm (OGA) has been widely used in practice. In fact, the OGA is regarded as the most powerful algorithm for solving the problem of approximation with respect to redundant systems, cf. [8,9,10].
We recall some notation and definitions from the theory of greedy algorithms. Let $H$ be a Hilbert space with an inner product $\langle\cdot,\cdot\rangle$ and the norm $\|f\| := \langle f,f\rangle^{1/2}$. We say that a set $\mathcal{D}$ of elements from $H$ is a dictionary if $\|g\| = 1$ for every $g \in \mathcal{D}$ and $\overline{\operatorname{span}}\,\mathcal{D} = H$. We consider redundant dictionaries, which have been utilized frequently in the field of signal processing. Here, a redundant dictionary means that the elements of the dictionary may be linearly dependent.
We now recall the definition of the OGA from [1].
ORTHOGONAL GREEDY ALGORITHM (OGA)
Set $f_0 := f$. For each $m \ge 1$, we inductively find $g_m \in \mathcal{D}$ such that
$$|\langle f_{m-1}, g_m \rangle| = \sup_{g \in \mathcal{D}} |\langle f_{m-1}, g \rangle|,$$
and define
$$f_m := f - P_m(f),$$
where $P_m$ is the operator of the orthogonal projection onto $\operatorname{span}\{g_1, \dots, g_m\}$.
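To make the procedure concrete, the following minimal sketch (in Python/NumPy, for a finite dictionary stored as the unit-norm columns of a matrix) illustrates the greedy selection step followed by the orthogonal projection; the function name and interface are our own, and it is an illustration rather than an implementation taken from the paper.

```python
import numpy as np

def oga(f, D, m):
    """Minimal sketch of the OGA for a finite dictionary.

    f : target vector in R^d, D : d x n array whose unit-norm columns
    play the role of dictionary elements, m : number of iterations.
    Returns the residual f_m and the selected column indices.
    """
    residual, selected = f.copy(), []
    for _ in range(m):
        # greedy step: pick the element with the largest |<f_{m-1}, g>|
        scores = np.abs(D.T @ residual)
        scores[selected] = -np.inf          # do not pick the same element twice
        selected.append(int(np.argmax(scores)))
        # orthogonal projection of f onto span{g_1, ..., g_m} via least squares
        coeffs, *_ = np.linalg.lstsq(D[:, selected], f, rcond=None)
        residual = f - D[:, selected] @ coeffs
    return residual, selected
```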
In [11], Liu and Temlyakov proposed the orthogonal super greedy algorithm (OSGA). The OSGA selects more than one element from a dictionary in each iteration step and hence reduces the computational burden of the conventional OGA. Therefore, the OSGA is more efficient than the OGA from the viewpoint of computational complexity.
ORTHOGONAL SUPER GREEDY ALGORITHM (OSGA(s))
Set $f_0 := f$. For a natural number $s$ and each $m \ge 1$, we inductively define:
- (1) $g_{(m-1)s+1}, \dots, g_{ms}$ are elements of the dictionary $\mathcal{D}$ satisfying the following inequality. Denote $\mathcal{D}_m := \{g_{(m-1)s+1}, \dots, g_{ms}\}$ and assume that
$$\min_{g \in \mathcal{D}_m} |\langle f_{m-1}, g \rangle| \ge \sup_{g \in \mathcal{D} \setminus \mathcal{D}_m} |\langle f_{m-1}, g \rangle|.$$
- (2) Let $H_m := \operatorname{span}\{g_1, \dots, g_{ms}\}$ and let $P_{H_m}$ denote the operator of the orthogonal projection onto $H_m$. Define $G_m := P_{H_m}(f)$.
- (3) Define the residual after the $m$-th iteration of the algorithm: $f_m := f - G_m$.
Note that, in the case $s = 1$, OSGA(s) coincides with OGA.
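Compared with the OGA sketch above, only the selection step changes: $s$ elements with the largest absolute inner products are chosen per iteration before the projection. The following is again a minimal illustrative sketch with our own naming, not code from the paper.

```python
import numpy as np

def osga(f, D, m, s):
    """Minimal sketch of OSGA(s): pick s elements per iteration."""
    residual, selected = f.copy(), []
    for _ in range(m):
        scores = np.abs(D.T @ residual)
        scores[selected] = -np.inf              # exclude already chosen elements
        # super greedy step: the s elements with the largest |<f_{m-1}, g>|
        selected.extend(np.argsort(scores)[-s:].tolist())
        # orthogonal projection onto the span of all selected elements
        coeffs, *_ = np.linalg.lstsq(D[:, selected], f, rcond=None)
        residual = f - D[:, selected] @ coeffs  # f_m = f - G_m
    return residual, selected
```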
In this paper, we study the approximation capability of the OSGA with respect to $\mu$-coherent dictionaries in Hilbert spaces. We denote by
$$\mu := \mu(\mathcal{D}) := \sup_{g \neq h,\ g,h \in \mathcal{D}} |\langle g, h \rangle|$$
the coherence of a dictionary. The coherence $\mu(\mathcal{D})$ is a blunt instrument for measuring the redundancy of dictionaries. It is clear that if $\mathcal{D}$ is an orthonormal basis, then $\mu(\mathcal{D}) = 0$. The smaller the $\mu(\mathcal{D})$, the more the dictionary $\mathcal{D}$ resembles an orthonormal basis. We study dictionaries with small values of the coherence $\mu(\mathcal{D})$ and call them $\mu$-coherent dictionaries.
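For a finite dictionary given by the unit-norm columns of a matrix, the coherence is simply the largest off-diagonal entry of the Gram matrix in absolute value; the short sketch below (our own code) makes this concrete.

```python
import numpy as np

def coherence(D):
    """Coherence of a finite dictionary given by the unit-norm columns of D."""
    G = np.abs(D.T @ D)          # absolute values of pairwise inner products
    np.fill_diagonal(G, 0.0)     # exclude the diagonal terms <g, g> = 1
    return float(G.max())

# example: an orthonormal basis has coherence 0
print(coherence(np.eye(4)))      # -> 0.0
```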
In [11], the authors found that this reduction of the computational burden in OSGA does not degrade the approximation capability if $f$ belongs to the closure of the convex hull of the symmetrized dictionary $\{\pm g : g \in \mathcal{D}\}$, which is denoted by $A_1(\mathcal{D})$.
Theorem 1. Let $\mathcal{D}$ be a dictionary with coherence parameter $\mu$. Then, for , the algorithm OSGA(s) provides an approximation of $f \in A_1(\mathcal{D})$ with the following error bound:
It seems that a dimension-independent convergence rate was deduced, but the condition that the target element belongs to $A_1(\mathcal{D})$ becomes more and more stringent as the number of elements in $\mathcal{D}$ grows, cf. [2]. Fang, Lin, and Xu [12] studied the behavior of OSGA for more general target elements; they defined suitable classes of such elements and obtained the following theorem.
Theorem 2. Let $\mathcal{D}$ be a dictionary with coherence $\mu$. Then, for all and arbitrary , the OSGA(s) provides an approximation of $f$ with the error bound:
The $\mu$-coherence of a dictionary is used in OSGA, which implies that the reduction of the computational burden does not degrade the approximation capability. Moreover, if $s = 1$, then OSGA coincides with OGA.
Let $\Sigma_m(\mathcal{D})$ denote the collection of elements in $H$ which can be expressed as a linear combination of, at most, $m$ elements of the dictionary $\mathcal{D}$, namely
$$\Sigma_m(\mathcal{D}) := \Big\{ \sum_{g \in \Lambda} c_g\, g : \Lambda \subset \mathcal{D},\ |\Lambda| \le m \Big\}.$$
For an element $f \in H$, we define its best $m$-term approximation error by
$$\sigma_m(f) := \sigma_m(f, \mathcal{D}) := \inf_{h \in \Sigma_m(\mathcal{D})} \|f - h\|.$$
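Since $\sigma_m(f)$ is an infimum over all selections of at most $m$ dictionary elements, for a very small finite dictionary it can be evaluated by exhaustive search. The sketch below (our own code, feasible only for tiny examples) is meant solely to make this benchmark quantity concrete.

```python
import itertools
import numpy as np

def best_m_term_error(f, D, m):
    """Brute-force sigma_m(f, D) for a small finite dictionary (columns of D)."""
    best = np.linalg.norm(f)                     # m = 0 approximation (zero element)
    for subset in itertools.combinations(range(D.shape[1]), m):
        cols = D[:, list(subset)]
        coeffs, *_ = np.linalg.lstsq(cols, f, rcond=None)
        best = min(best, np.linalg.norm(f - cols @ coeffs))
    return best
```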
The inequality connecting the error of greedy approximation and the error of best $m$-term approximation is called the Lebesgue-type inequality, cf. [13,14,15]. In this paper, we will establish Lebesgue-type inequalities for OSGA with respect to $\mu$-coherent dictionaries.
We first recall some results on the efficiency of OGA with respect to $\mu$-coherent dictionaries. These results relate the error of OGA's -th approximation to the error of the best $m$-term approximation with an extra multiplier:
where
Gilbert, Muthukrishnan, and Strauss [16] gave the first Lebesgue-type inequality for OGA. They proved
The constant in the above inequality was improved by Tropp in [17]:
Donoho, Elad, and Temlyakov [18] dramatically improved the factor in front of and obtained that
where the constant 24 is not the best. Many researchers have sought to improve this factor. Temlyakov and Zheltov improved the above inequality in [4]. They obtained
Livshitz [19] took the parameters in (1) and obtained the following profound result.
Theorem 3. For every $\mu$-coherent dictionary and any , the OGA applied to $f$ provides
By using the same method as in [19], Ye and Wei [20] slightly improved the constant 2.7.
Based on the above works, we give an error bound of the form (1) for OSGA with respect to dictionaries with small but non-vanishing coherence.
Theorem 4. Let $\mathcal{D}$ be a dictionary with coherence $\mu$. Then, for any and any , the OSGA(s) applied to $f$ provides
for all and an absolute constant $A$.
Remark 1.
- 1.
We remark that the values of μ and A for which (2) holds are coupled. For example, it is possible to obtain a smaller value of μ at the price of a larger value of A. Moreover, for sufficiently large A, μ can be arbitrarily close to zero.
- 2.
Our results improve Theorem 3 only in the asymptotic constant and not in the rate. Under the condition of Theorem 4, for , taking as , we can obtain
Comparing it with Theorem 3, the constant that we obtain is better.
- 3.
The specific constant 2.24 in (2) is not the best. By adjusting the parameters $A$ and $\mu$, we can obtain a more general estimate:
for , where and are interdependent. Thus, Theorem 4 shows that OSGA(s) can achieve an almost optimal approximation on the first steps for dictionaries with small but non-vanishing coherence.
The paper is organized as follows. In Section 2, we establish several preliminary lemmas. In Section 3, for some closed subspace $L$ of $H$ as defined below, we first give the estimates of in different situations based on the lemmas in Section 2. Then, we estimate the . Finally, combining the above two estimates, we provide the detailed proof of Theorem 4. In Section 4, we test the performance of the OSGA in the case of a finite-dimensional Euclidean space. In Section 5, we make some concluding remarks on our work.
3. Proof of Theorem 4
Based on the above preliminary lemmas, we will prove Theorem 4 step by step. We first introduce some notations. Define
Let
satisfy the following equations
Thus, for
,
we have
To obtain the upper bound of , it suffices to estimate and . By the definitions of the sets and in OSGA, we first give the estimate of according to whether the intersection of and is empty.
Theorem 5. Let $n$ satisfy and . Then,
Proof. By Lemma 3, for
Then, we have
so, we can obtain that
Since
we obtain
We define
By the definitions of
and the expression of (14), we have
. Then, we obtain
To obtain the final result, it suffices to estimate the upper bounds of and
For
, by (12) and (14), we have
where we have used the fact
On the one hand, for any
and
n satisfying
we obtain
Thus, by Lemma 1 and inequality (17), we obtain
On the other hand, by Lemma 2, we have, for
Thus, substituting (18) and (19) into (16), and then combining it with (13), we get the estimate
Finally, we estimate
Combining (15) and (20) with (21), we give
□
Theorem 5 gives the estimate of in the situation . The following theorem deals with the situation .
Theorem 6. Let $n$ satisfy and . Then,
Proof. Since
we set
and write
as
According to the following inequality,
we need to estimate
and
We first estimate
by
Next, we continue to estimate
It is not difficult to see that
By (18), for any
we have
Combining Lemma 2 with inequality (26), we obtain
for
and
For the last summand of the right-hand side of the inequality in (24), we have
Thus, combining (27) with (28), for
we have
We next estimate
Since
we need to give the upper bounds of
A and
By (18) and (19), we have
As for
since for
by Lemma 1, we know that
and
Combining (32) with (33), we have
Using Lemma 1 again, we obtain from (34) that
Thus, we get the upper bound of
by (30), (31) and (35), i.e.,
Combining (22), (23) and (29) with (36), we have
□
It remains to estimate
. We first recall a lemma proven by Fang, Lin and Xu in [12].
Lemma 4. Assume that a dictionary has coherence . Then, we have, for any distinct , the inequalities
Theorem 7. For any , we have
Proof. From Lemma 4, we know that
From Lemmas 1 and 2, we have, for any
Combining (37) with (38), we have
□
Next, using Theorems 5 and 6, we give the estimation of
Theorem 8. For and any positive integer , the following inequalities hold.
Proof. By using Theorems 5 and 6, we derive
which is equivalent to
Furthermore, we also have
□
Now, we can give the proof of our main result.
Proof of Theorem 4. From Theorem 7 and Theorem 8, we obtain that
Thus, we complete the proof of Theorem 4. □
4. Simulation Results
It is known from Theorem 4 that if , then , and hence . In this spirit, the OSGA can be used to recover sparse signals in compressed sensing, which is a new field of signal processing. We remark that in the field of signal processing, the orthogonal super greedy algorithm (OSGA) is also known as orthogonal multi-matching pursuit (OMMP). For the reader’s convenience, we will use the term OMMP instead of OSGA in what follows.
In this section, we test the performance of the orthogonal multi-matching pursuit with parameter $s$ (OMMP($s$)). We consider the following model. Suppose that $x \in \mathbb{R}^N$ is an unknown $N$-dimensional signal and we wish to recover it from the given data
$$y = \Phi x, \qquad (40)$$
where $\Phi$ is a known $M \times N$ measurement matrix with $M < N$. Furthermore, since $M < N$, the column vectors of $\Phi$ are linearly dependent and the collection of these columns can be viewed as a redundant dictionary.
For arbitrary $u, v \in \mathbb{R}^M$, define
$$\langle u, v \rangle := \sum_{i=1}^{M} u_i v_i \quad \text{and} \quad \|u\| := \langle u, u \rangle^{1/2},$$
where $u = (u_1, \dots, u_M)$ and $v = (v_1, \dots, v_M)$. Obviously, $\mathbb{R}^M$ is a Hilbert space with the inner product $\langle \cdot, \cdot \rangle$.
A signal $x$ is said to be $K$-sparse if it has at most $K$ nonzero entries. We will recover the support of a $K$-sparse signal via OMMP($s$) under the model (40). It is well known that OMMP takes the following form; see, for instance, [3].
ORTHOGONAL MULTI MATCHING PURSUIT (OMMP(s))
Input: Measurement matrix $\Phi$, vector $y$, the parameter $s$, and the stopping criterion.
Step 1: Set the residual $r_0 = y$, an initial approximation $x_0 = 0$, the index set $\Lambda_0 = \emptyset$, and the iteration counter $k = 1$.
Step 2: Define $\Lambda_k := \Lambda_{k-1} \cup J_k$, where $J_k$ is a set of $s$ indices such that the corresponding columns of $\Phi$ have the $s$ largest values of $|\langle r_{k-1}, \varphi_j \rangle|$ among all columns $\varphi_j$ of $\Phi$. Then, let $x_k := \arg\min_{\operatorname{supp}(z) \subset \Lambda_k} \|y - \Phi z\|$ and update the residual $r_k := y - \Phi x_k$.
End if the stopping condition is achieved. Otherwise, we set $k := k + 1$ and turn to Step 2.
Output: If the algorithm stops at the $k$-th iteration, then output $\Lambda_k$ and $x_k$.
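The steps above can be summarized in the following minimal Python/NumPy sketch; the variable names, the tolerance-based stopping rule, and the iteration cap are our own choices and are not taken from the paper.

```python
import numpy as np

def ommp(Phi, y, s, tol=1e-6, max_iter=100):
    """Minimal sketch of OMMP(s) for the model y = Phi @ x."""
    M, N = Phi.shape
    residual, support = y.copy(), []
    x_hat = np.zeros(N)
    for _ in range(max_iter):
        # Step 2: add the s columns most correlated with the current residual
        scores = np.abs(Phi.T @ residual)
        scores[support] = -np.inf
        support.extend(np.argsort(scores)[-s:].tolist())
        # least-squares fit on the enlarged support, then update the residual
        coeffs, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        x_hat = np.zeros(N)
        x_hat[support] = coeffs
        residual = y - Phi @ x_hat
        if np.linalg.norm(residual) <= tol:   # stopping condition (our choice)
            break
    return x_hat, support
```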
In the experiment, we set the measurement matrix $\Phi$ to be a Gaussian matrix where each entry is selected from the distribution and the density function of this distribution is . We execute OMMP($s$) with the data vector $y$ and stop the algorithm when . The mean square error (MSE) of $x$ is defined as follows:
Figure 1 shows the performance of OMMP($s$) with for an input signal in dimension with sparsity level and number of measurements , where the red line represents the original signal and the black squares represent the approximation. By repeating the test 1000 times, we calculate the mean square error: MSE = .
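For illustration, the following sketch (reusing the ommp function above) runs a small Monte Carlo experiment of this type; the problem sizes, the $N(0, 1/M)$ entries, and the MSE convention are our own assumptions and do not reproduce the exact setup behind the figures.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K, s, trials = 1000, 200, 20, 5, 100          # illustrative sizes only

errors = []
for _ in range(trials):
    Phi = rng.normal(0.0, 1.0 / np.sqrt(M), size=(M, N))      # assumed N(0, 1/M) entries
    x = np.zeros(N)
    x[rng.choice(N, K, replace=False)] = rng.normal(size=K)   # random K-sparse signal
    y = Phi @ x
    x_hat, _ = ommp(Phi, y, s)
    errors.append(np.mean((x - x_hat) ** 2))   # per-coordinate squared error (assumed MSE convention)
print("empirical MSE:", np.mean(errors))
```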
Figure 2 describes the case of the dimension . It displays the percentage (averaged over 100 input signals) of the elements in the support that can be found correctly, as a function of $M$, with . If the percentage equals , it means that all the elements in the support can be found, which implies that the input signal can be exactly recovered. As expected, Figure 2 shows that when the sparsity level $K$ increases, more measurements are necessary to guarantee signal recovery.