Article

Almost Optimality of the Orthogonal Super Greedy Algorithm for μ-Coherent Dictionaries †

1 College of Science, North China University of Science and Technology, Tangshan 063210, China
2 School of Mathematics and LPMC, Nankai University, Tianjin 300071, China
* Author to whom correspondence should be addressed.
† This work was supported by the National Natural Science Foundation of China under Grant 11671213.
Axioms 2022, 11(5), 186; https://doi.org/10.3390/axioms11050186
Submission received: 29 January 2022 / Revised: 7 April 2022 / Accepted: 12 April 2022 / Published: 20 April 2022

Abstract: We study the approximation capability of the orthogonal super greedy algorithm (OSGA) with respect to μ-coherent dictionaries in Hilbert spaces. We establish Lebesgue-type inequalities for the OSGA, which show that the OSGA provides an almost optimal approximation on the first $[1/(18\mu s)]$ steps. Moreover, we improve the asymptotic constant in the Lebesgue-type inequality for the OGA obtained by E. D. Livshitz.

1. Introduction

Approximation by sparse linear combinations of elements from a fixed redundant system continues to develop actively, driven not only by theoretical interest but also by frequent applications in areas such as signal processing and machine learning, cf. [1,2,3,4,5,6,7]. This type of approximation is called highly nonlinear approximation. Greedy-type algorithms have been used as a tool for generating such approximations. Among them, the orthogonal greedy algorithm (OGA) has been widely used in practice. In fact, the OGA is regarded as the most powerful algorithm for solving approximation problems with respect to redundant systems, cf. [8,9,10].
We recall some notations and definitions from the theory of greedy algorithms. Let $H$ be a Hilbert space with an inner product $\langle\cdot,\cdot\rangle$ and the norm $\|x\| := \langle x, x\rangle^{1/2}$. We say that a set $\mathcal{D}$ of elements from $H$ is a dictionary if
$$\|g\| = 1 \ \text{for every } g \in \mathcal{D}, \quad\text{and}\quad \overline{\operatorname{span}}\,\mathcal{D} = H.$$
We consider redundant dictionaries, which have been utilized frequently in the field of signal processing. Here, a redundant dictionary means that the elements of the dictionary may be linearly dependent.
We now recall the definition of the OGA from [1].
ORTHOGONAL GREEDY ALGORITHM (OGA)
Set $f_0 := f \in H$ and $G_0^{\mathrm{OGA}}(f,\mathcal{D}) := 0$. For each $m \ge 0$, we inductively find $g_{m+1} \in \mathcal{D}$ such that
$$|\langle f_m, g_{m+1}\rangle| = \sup_{g\in\mathcal{D}} |\langle f_m, g\rangle|$$
and define
$$G_{m+1}^{\mathrm{OGA}}(f,\mathcal{D}) := P_{\operatorname{span}\{g_1, g_2, \dots, g_{m+1}\}}(f),$$
$$f_{m+1} := f - G_{m+1}^{\mathrm{OGA}}(f,\mathcal{D}),$$
where $P_{\operatorname{span}\{g_1, g_2, \dots, g_{m+1}\}}$ is the operator of the orthogonal projection onto $\operatorname{span}\{g_1, g_2, \dots, g_{m+1}\}$.
In [11], Liu and Temlyakov proposed the orthogonal super greedy algorithm (OSGA). The OSGA selects more than one element from a dictionary in each iteration step and hence reduces the computational burden of the conventional OGA. Therefore, the OSGA is more efficient than the OGA from the viewpoint of the computational complexity.
ORTHOGONAL SUPER GREEDY ALGORITHM (OSGA(s))
Set $f_0 := f \in H$ and $G_0^{\mathrm{OSGA}}(f,\mathcal{D}) := 0$. For a natural number $s \ge 1$ and each $m \ge 1$, we inductively define:
(1)
Let $I_m := [(m-1)s+1,\, ms]$ and choose elements $g_{(m-1)s+1}, g_{(m-1)s+2}, \dots, g_{ms} \in \mathcal{D}$ satisfying the following inequality:
$$\min_{i \in I_m} |\langle f_{m-1}, g_i\rangle| \ \ge\ \sup_{g \in \mathcal{D},\ g \ne g_i,\ i \in I_m} |\langle f_{m-1}, g\rangle|.$$
(2)
Let $H_m := H_m(f) := \operatorname{span}\{g_1, g_2, \dots, g_{ms}\}$ and let $P_{H_m}$ denote the operator of the orthogonal projection onto $H_m$. Define
$$G_m(f) := G_m(f,\mathcal{D}) := G_{ms}(f,\mathcal{D}) := P_{H_m}(f).$$
(3)
Define the residual after the $m$-th iteration of the algorithm:
$$f_m := f_{ms} := f - G_m(f,\mathcal{D}).$$
Note that, in the case s = 1 , OSGA(s) coincides with OGA.
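For concreteness, the following is a minimal sketch of OSGA(s) in the finite-dimensional setting, assuming the dictionary is given as the unit-norm columns of a matrix; the function name osga, the least-squares projection, and the fixed iteration count are our own illustrative choices rather than part of the definition above. Setting s = 1 reproduces the OGA.

```python
import numpy as np

def osga(f, D, s=1, n_iter=10):
    """Sketch of OSGA(s): D is a (d x N) array whose unit-norm columns form the
    dictionary; each iteration selects the s columns with the largest
    |<residual, g>| and projects f onto the span of all selected columns."""
    residual = f.copy()
    approx = np.zeros_like(f)
    selected = []                                    # indices chosen so far
    for _ in range(n_iter):
        scores = np.abs(D.T @ residual)              # greedy step
        scores[selected] = -np.inf                   # never reselect an element
        selected.extend(np.argsort(scores)[-s:].tolist())
        # orthogonal projection onto span{g_1, ..., g_{ms}} via least squares
        coef, *_ = np.linalg.lstsq(D[:, selected], f, rcond=None)
        approx = D[:, selected] @ coef
        residual = f - approx                        # residual f_m
    return approx, residual, selected
```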
In this paper, we study the approximation capability of the OSGA with respect to μ-coherent dictionaries in Hilbert spaces. We denote by
$$\mu = \mu(\mathcal{D}) = \sup_{g \ne h,\ g,h \in \mathcal{D}} |\langle g, h\rangle|$$
the coherence of a dictionary. The coherence μ is a blunt instrument for measuring the redundancy of a dictionary. It is clear that if $\mathcal{D}$ is an orthonormal basis, then $\mu(\mathcal{D}) = 0$. The smaller $\mu(\mathcal{D})$ is, the more $\mathcal{D}$ resembles an orthonormal basis. We study dictionaries with small values of coherence $\mu(\mathcal{D}) > 0$ and call them μ-coherent dictionaries.
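For a finite dictionary stored as the columns of a matrix, the coherence is simply the largest off-diagonal entry of the Gram matrix in absolute value. The following short sketch (the helper name coherence is ours) makes the definition concrete:

```python
import numpy as np

def coherence(D):
    """mu(D) for a dictionary given as the unit-norm columns of D."""
    G = np.abs(D.T @ D)           # |<g, h>| for all pairs of columns
    np.fill_diagonal(G, 0.0)      # exclude <g, g> = 1
    return G.max()

# an orthonormal basis has coherence 0
assert coherence(np.eye(4)) < 1e-12
```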
In [11], the authors found that this reduction of the computational burden does not degrade the approximation capability of the OSGA if $f$ belongs to the closure of the convex hull of the symmetrized dictionary $\mathcal{D}^{\pm} := \{\pm g : g \in \mathcal{D}\}$, which is denoted by $A_1(\mathcal{D})$.
Theorem 1.
Let $\mathcal{D}$ be a dictionary with coherence parameter $\mu := \mu(\mathcal{D})$. Then, for $s \le (2\mu)^{-1}$, the algorithm OSGA(s) provides an approximation of $f \in A_1(\mathcal{D})$ with the following error bound:
$$\|f_{ms}\|^2 \le 40.5\,(sm)^{-1}, \quad m = 1, 2, \dots.$$
It seems that a dimension-independent convergence rate was deduced, but the condition that the target element belongs to $A_1(\mathcal{D})$ becomes more and more stringent as the number of elements in $\mathcal{D}$ grows, cf. [2].
Fang, Lin, and Xu [12] studied the behavior of the OSGA for $f \in H$. They defined $L_1 = \{f : f = \sum_{g\in\mathcal{D}} a_g g\}$ and $\|f\|_{L_1} := \inf\{\sum_{g\in\mathcal{D}} |a_g| : f = \sum_{g\in\mathcal{D}} a_g g\}$ for $f \in L_1$, and obtained the following theorem.
Theorem 2.
Let $\mathcal{D}$ be a dictionary with coherence $\mu$. Then, for all $f \in H$, $h \in L_1$ and arbitrary $s \le (2\mu)^{-1} + 1$, the OSGA(s) provides an approximation of $f$ with the error bound:
$$\|f_{ms}\|^2 \le \|f - h\|^2 + \frac{27}{2}\,\|h\|_{L_1}^2\,(sm)^{-1}, \quad m = 1, 2, \dots.$$
Thus, for μ-coherent dictionaries, the reduction of the computational burden does not degrade the approximation capability of the OSGA. Moreover, if $\mu > \frac{1}{2}$, then the OSGA coincides with the OGA.
Let $\Sigma_m$ denote the collection of elements in $H$ that can be expressed as a linear combination of at most $m$ elements of the dictionary $\mathcal{D}$, namely
$$\Sigma_m := \Sigma_m(\mathcal{D}) = \Big\{ g : g = \sum_{i \in \Lambda} c_i g_i,\ g_i \in \mathcal{D},\ \Lambda \subset \mathbb{N},\ \#(\Lambda) \le m \Big\}.$$
For an element $f \in H$, we define its best $m$-term approximation error by
$$\sigma_m(f) := \sigma_m(f, \mathcal{D}) := \inf_{g \in \Sigma_m} \|f - g\|.$$
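For illustration only, $\sigma_m(f,\mathcal{D})$ can be evaluated for a tiny dictionary by exhaustive search over all $m$-element subsets, solving a least-squares problem for each subset. The cost is exponential in the dictionary size, so the sketch below (with our own helper name) only serves to make the definition concrete:

```python
import numpy as np
from itertools import combinations

def best_m_term_error(f, D, m):
    """sigma_m(f, D) by brute force; feasible only for very small dictionaries."""
    N = D.shape[1]
    best = np.linalg.norm(f)                 # error of the zero approximant
    for idx in combinations(range(N), m):
        cols = list(idx)
        c, *_ = np.linalg.lstsq(D[:, cols], f, rcond=None)
        best = min(best, np.linalg.norm(f - D[:, cols] @ c))
    return best
```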
The inequality connecting the error of greedy approximation and the error of best m-term approximation is called the Lebesgue-type inequality, cf. [13,14,15]. In this paper, we will establish the Lebesgue-type inequalities for OSGA with respect to μ -coherent dictionaries.
We first recall some results on the efficiency of the OGA with respect to μ-coherent dictionaries. These results relate the error of the OGA after $A(m)$ iterations to the error of the best $m$-term approximation with an extra multiplier:
$$\|f_{A(m)}(f, \mathcal{D})\| \le B(m)\,\sigma_m(f, \mathcal{D}) \quad \text{for } m \le C(\mu), \qquad (1)$$
where $A(m) \in \mathbb{N}$ and $B(m), C(\mu) \in \mathbb{R}$. Gilbert, Muthukrishnan, and Strauss [16] gave the first Lebesgue-type inequality for the OGA. They proved
$$\|f_m\| \le 8\, m^{1/2}\, \sigma_m(f, \mathcal{D}) \quad \text{for } 1 \le m \le \frac{1}{8\sqrt{2}\,\mu}.$$
The constant in the above inequality was improved by Tropp in [17]:
$$\|f_m\| \le (1 + 6m)^{1/2}\, \sigma_m(f) \quad \text{for } 1 \le m \le \frac{1}{3\mu}.$$
Donoho, Elad, and Temlyakov [18] dramatically improved the factor in front of $\sigma_m$ and obtained
$$\|f_{[m \log m]}\| \le 24\, \sigma_m(f) \quad \text{for } 1 \le m \le \frac{1}{20}\,\mu^{-2/3},$$
where the constant 24 is not the best possible.
where the constant 24 is not the best. Many researchers have sought to improve the factor B ( m ) . Temlyakov and Zheltov improved the above inequality in [4]. They obtained
f m 2 log m ( f , D ) 3 σ m ( f , D ) for m 2 log m 1 26 μ .
Livshitz [19] took the parameters $A(m) := 2m$, $B(m) := 2.7$, $C(\mu) := \frac{1}{20\mu}$ in (1) and obtained the following profound result.
Theorem 3.
For every μ-coherent dictionary $\mathcal{D}$ and any $f \in H$, the OGA applied to $f$ provides
$$\|f_{2m}\| \le 2.7\, \sigma_m(f, \mathcal{D}) \quad \text{for } 1 \le m \le \frac{1}{20\mu}.$$
By using the same method as in [19], Ye and Wei [20] slightly improved the constant 2.7.
Based on the above works, we give the error bound of the form (1) for OSGA with respect to dictionaries with small but non-vanishing coherence.
Theorem 4.
Let $\mathcal{D}$ be a dictionary with coherence $\mu$. Then, for any $f \in H$ and any $\epsilon > 0$, the OSGA(s) applied to $f$ provides
$$\|f_{Am}\| \le 2.24\,(1 + \epsilon)\, \sigma_m(f) \qquad (2)$$
for all $1 \le m \le \frac{1}{18\mu s}$, $\frac{3}{100} \le \mu \le \frac{1}{18}$, and an absolute constant $A \ge 2$.
Remark 1.
1. 
We remark that the values of μ and A for which (2) holds are coupled. For example, it is possible to obtain a smaller value of μ at the price of a larger value of A. Moreover, for sufficiently large A, μ can be arbitrarily close to zero.
2. 
Our results improve Theorem 3 only in the asymptotic constant and not in the rate. Under the conditions of Theorem 4, for $s = 1$, taking $(A, \epsilon, \mu) = (2, 0.1, 0.03)$, we obtain $\|f_{2m}\| \le 2.5\, \sigma_m(f, \mathcal{D})$ (see the short check after this remark). Compared with Theorem 3, the constant that we obtain is better.
3. 
The specific constant 2.24 in (2) is not the best possible. By adjusting the parameters $A$ and $\mu$, we can obtain a more general estimate:
$$\|f_{Am}\| \le C(A)\,(1 + \epsilon)\, \sigma_m(f)$$
for $\alpha_A \le \mu \le \frac{1}{18}$, where $C(A)$ and $\alpha_A$ are interdependent. Thus, Theorem 4 shows that OSGA(s) can achieve an almost optimal approximation on the first $[1/(18\mu s)]$ steps for dictionaries with small but non-vanishing coherence.
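For point 2 of the remark, the constant 2.5 follows from (2) by direct arithmetic:
$$\|f_{2m}\| \le 2.24\,(1+\epsilon)\,\sigma_m(f,\mathcal{D}) = 2.24 \times 1.1\,\sigma_m(f,\mathcal{D}) = 2.464\,\sigma_m(f,\mathcal{D}) \le 2.5\,\sigma_m(f,\mathcal{D}).$$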
The paper is organized as follows. In Section 2, we establish several preliminary lemmas. In Section 3, for a closed subspace $L$ of $H$ defined below, we first estimate $\|P_{L}^{\perp}(f_n)\|$ in different situations based on the lemmas of Section 2, and then we estimate $\|P_L(f_n)\|$. Finally, combining these two estimates, we provide the detailed proof of Theorem 4. In Section 4, we test the performance of the OSGA in finite-dimensional Euclidean space. In Section 5, we make some concluding remarks on our work.

2. Preliminary Lemmas

In this section, we introduce several quantities and discuss their properties, which are important for the proof of our main result. By the condition of Theorem 4, we have
$$\mu m \le \frac{1}{18}, \qquad \mu s \le \frac{1}{18}, \qquad A m \mu s \le \frac{1}{18}.$$
We establish three preliminary lemmas.
Lemma 1.
Let $n \le Ams$, $h \in H$, and $g_i \in \mathcal{D}$, $1 \le i \le n$. Assume that
$$P_{\operatorname{span}\{g_1, g_2, \dots, g_n\}}\, h = \sum_{i=1}^{n} c_i g_i;$$
then, we have
$$\max_{1 \le i \le n} |c_i| \le k_1 \max_{1 \le i \le n} |\langle h, g_i\rangle|,$$
where $k_1 = \frac{1}{1 - \mu A m s} \le \frac{18}{17}$.
Proof. 
For any $g_l \in \mathcal{D}$, $1 \le l \le n$, we have
$$|\langle h, g_l\rangle| = \big|\big\langle P_{\operatorname{span}\{g_1, g_2, \dots, g_n\}}\, h,\ g_l\big\rangle\big| = \Big|\Big\langle \sum_{i=1}^{n} c_i g_i,\ g_l \Big\rangle\Big| \ge |c_l| - \sum_{i \ne l} |c_i|\, |\langle g_i, g_l\rangle| \ge |c_l| - \mu\, \Big(\max_{1\le i\le n} |c_i|\Big)\, (n-1).$$
This implies
$$\max_{1 \le l \le n} |c_l| \le \frac{1}{1 - \mu(n-1)} \max_{1 \le l \le n} |\langle h, g_l\rangle| \le \frac{1}{1 - \mu A m s} \max_{1 \le l \le n} |\langle h, g_l\rangle| = k_1 \max_{1 \le l \le n} |\langle h, g_l\rangle|,$$
where $k_1 = \frac{1}{1 - \mu Ams} \le \frac{1}{1 - 1/18} = \frac{18}{17}$, since $\mu Ams \le \frac{1}{18}$.
For $\epsilon > 0$, by the definition of $\sigma_m(f)$, there exist $b_j \in \mathbb{R}$ and $\psi_j \in \mathcal{D}$, $1 \le j \le m$, such that
$$\Big\| f - \sum_{j=1}^{m} b_j \psi_j \Big\| \le (1 + \epsilon)\, \sigma_m(f). \qquad (3)$$
For $1 \le n \le Am$, we set
$$d_n := \sum_{j \in I_n} |\langle f_{n-1}, g_j\rangle|.$$
Assume that the coefficients $x_{i,n}$, $n \ge 1$, $1 \le i \le ns$, satisfy the equation
$$f_n = f_{n-1} - P_{H_n}(f_{n-1}) = f_{n-1} - \sum_{i=1}^{ns} x_{i,n}\, g_i.$$
Next, we estimate $\{x_{i,n}\}_{i=1}^{ns}$ and $\{d_n\}$ for $n \ge 1$ in turn. Applying Lemma 1, we obtain the following estimates for $x_{i,n}$.
Lemma 2.
For $n \le Am$, we have
$$|x_{i,n}| \le k_1 \mu\, d_n \quad \text{for } 1 \le i \le (n-1)s;$$
$$|x_{i,n} - \langle f_{n-1}, g_i\rangle| \le k_1 \mu\, d_n \quad \text{for } (n-1)s + 1 \le i \le ns.$$
Proof. 
Define
$$h = f_{n-1} - \sum_{i \in I_n} \langle f_{n-1}, g_i\rangle\, g_i.$$
Since
$$f_n = f_{n-1} - P_{H_n}(f_{n-1}) = f_{n-1} - \sum_{i \in I_n} \langle f_{n-1}, g_i\rangle g_i - P_{H_n}\Big(f_{n-1} - \sum_{i \in I_n} \langle f_{n-1}, g_i\rangle g_i\Big) = h - P_{H_n}(h)$$
and, for any $1 \le j \le (n-1)s$, $\langle f_{n-1}, g_j\rangle = 0$, we have
$$|\langle h, g_j\rangle| = \Big| \Big\langle f_{n-1} - \sum_{i \in I_n} \langle f_{n-1}, g_i\rangle g_i,\ g_j \Big\rangle \Big| \le \sum_{i \in I_n} |\langle f_{n-1}, g_i\rangle|\, \mu = \mu\, d_n.$$
For $(n-1)s + 1 \le j \le ns$, we have
$$|\langle h, g_j\rangle| = \Big|\Big\langle f_{n-1} - \sum_{i \in I_n} \langle f_{n-1}, g_i\rangle g_i,\ g_j\Big\rangle\Big| = \Big| \langle f_{n-1}, g_j\rangle - \sum_{i \in I_n} \langle f_{n-1}, g_i\rangle \langle g_i, g_j\rangle \Big| = \Big| \langle f_{n-1}, g_j\rangle - \langle f_{n-1}, g_j\rangle - \sum_{i \in I_n,\, i \ne j} \langle f_{n-1}, g_i\rangle \langle g_i, g_j\rangle \Big| \le \sum_{i \in I_n,\, i \ne j} |\langle f_{n-1}, g_i\rangle|\, |\langle g_i, g_j\rangle| \le \mu\, d_n.$$
Let $P_{H_n}(h) = \sum_{i=1}^{ns} \tilde{x}_{i,n}\, g_i$. Combining (5) with (6), we have
$$f_{n-1} - \sum_{i \in I_n} \langle f_{n-1}, g_i\rangle g_i - \sum_{i=1}^{ns} \tilde{x}_{i,n} g_i = f_n = f_{n-1} - \sum_{i=1}^{ns} x_{i,n} g_i.$$
Thus, for $1 \le i \le (n-1)s$, $x_{i,n} = \tilde{x}_{i,n}$; for $(n-1)s+1 \le i \le ns$, $\langle f_{n-1}, g_i\rangle + \tilde{x}_{i,n} = x_{i,n}$. By Lemma 1 and inequalities (7) and (8), we obtain
$$\max_{1 \le i \le ns} |\tilde{x}_{i,n}| \le k_1 \max_{1 \le i \le ns} |\langle h, g_i\rangle| \le k_1 \mu\, d_n.$$
Thus, for $1 \le i \le (n-1)s$,
$$|x_{i,n}| = |\tilde{x}_{i,n}| \le \max_{1 \le i \le ns} |\tilde{x}_{i,n}| \le k_1 \mu\, d_n;$$
for $(n-1)s+1 \le i \le ns$,
$$|x_{i,n} - \langle f_{n-1}, g_i\rangle| = |\tilde{x}_{i,n}| \le \max_{1 \le i \le ns} |\tilde{x}_{i,n}| \le k_1 \mu\, d_n.$$
We proceed to the estimate of $\{d_n\}$.
Lemma 3.
For any $1 \le l \le n \le Am + 1$, we have
$$d_n \le k_2\, d_l,$$
where $k_2 = \exp\Big( \frac{Ams\mu\,(1 + Ams\mu)}{1 - Ams\mu} \Big) \le \exp\Big(\frac{1}{17}\Big)$.
Proof. 
For $1 \le n \le Am$, according to the definition of $d_n$, we have
$$d_{n+1} = \sum_{i \in I_{n+1}} |\langle f_n, g_i\rangle| \le s\, |\langle f_n, g_{ns+1}\rangle| = s\, |\langle f_{n-1} - P_{H_n}(f_{n-1}),\ g_{ns+1}\rangle| = s\, \Big|\Big\langle f_{n-1} - \sum_{i=1}^{ns} x_{i,n} g_i,\ g_{ns+1}\Big\rangle\Big| \le s\Big( |\langle f_{n-1}, g_{ns+1}\rangle| + \sum_{i=1}^{ns} |x_{i,n}|\, |\langle g_i, g_{ns+1}\rangle| \Big).$$
We continue to estimate the two summands of the right-hand side of the above inequality. For the first summand, the greedy step implies
$$s\, |\langle f_{n-1}, g_{ns+1}\rangle| \le s\, |\langle f_{n-1}, g_{ns}\rangle| \le \sum_{i \in I_n} |\langle f_{n-1}, g_i\rangle| = d_n.$$
For the second summand, by Lemma 2, we have
$$s \sum_{i=1}^{ns} |x_{i,n}|\, |\langle g_i, g_{ns+1}\rangle| \le s\mu \sum_{i=1}^{ns} |x_{i,n}| \le s\mu\Big( \sum_{i=1}^{(n-1)s} |x_{i,n}| + \sum_{i=(n-1)s+1}^{ns} |x_{i,n}| \Big) \le s\mu\Big( (n-1)s\, k_1\mu\, d_n + \sum_{i \in I_n} \big(k_1\mu\, d_n + |\langle f_{n-1}, g_i\rangle|\big) \Big) = s\mu\big( (n-1)s\, k_1\mu\, d_n + s k_1\mu\, d_n + d_n \big) = d_n\, \mu s\,(1 + n s k_1 \mu) \le d_n\, \mu s\, (1 + A m s \mu\, k_1).$$
Combining inequalities (9)–(11) with Lemma 2, we conclude that
$$d_{n+1} \le d_n + \frac{1 + Ams\mu}{1 - Ams\mu}\, d_n\, \mu s = \Big( 1 + \frac{1 + Ams\mu}{1 - Ams\mu}\, \mu s \Big) d_n.$$
Thus, for any $n$ and $1 \le l \le n \le Am + 1$, we have
$$d_n \le \Big(1 + \frac{1+Ams\mu}{1-Ams\mu}\,\mu s\Big)\, d_{n-1} \le \Big(1 + \frac{1+Ams\mu}{1-Ams\mu}\,\mu s\Big)^{n-l} d_l \le \Big(1 + \frac{1+Ams\mu}{1-Ams\mu}\,\mu s\Big)^{Am} d_l = \Bigg(1 + \frac{Am\,\frac{1+Ams\mu}{1-Ams\mu}\,\mu s}{Am}\Bigg)^{Am} d_l \le \exp\Big(Am\,\frac{1+Ams\mu}{1-Ams\mu}\,\mu s\Big)\, d_l = k_2\, d_l,$$
where $k_2 := \exp\Big( \frac{Ams\mu\,(1 + Ams\mu)}{1 - Ams\mu} \Big) \le \exp\Big(\frac{1}{17}\Big)$.

3. Proof of Theorem 4

Based on the above preliminary lemmas, we now prove Theorem 4 step by step. We first introduce some notation. Define
$$L := \operatorname{span}(\psi_1, \dots, \psi_m), \qquad f_0 = f, \qquad \xi_n := P_{L}^{\perp}(f_n), \quad 0 \le n \le Am,$$
$$T_1 := \big\{ i \in \{1, \dots, Ams\} : g_i \in \{\psi_j\}_{j=1}^m \big\}, \qquad T_2 := \{1, \dots, Ams\} \setminus T_1,$$
where $P_{L}^{\perp}$ denotes the orthogonal projection onto the orthogonal complement of $L$. For $d_n$, we define
$$D := \sum_{i:\ 1 \le i \le Am,\ T_2 \cap I_i \ne \emptyset} d_i^2.$$
Let $a_{j,n} \in \mathbb{R}$, $1 \le j \le m$, $0 \le n \le Am$, satisfy the equations
$$P_L(f_n) = \sum_{j=1}^{m} a_{j,n}\, \psi_j.$$
Thus, for $f_n$, $0 \le n \le Am$, we have
$$f_n = P_L(f_n) + P_{L}^{\perp}(f_n) = \sum_{j=1}^{m} a_{j,n}\, \psi_j + \xi_n.$$
To obtain an upper bound of $\|f_n\|$, it suffices to estimate $\|\xi_n\|$ and $\|P_L(f_n)\|$. By the definitions of the sets $T_1$, $T_2$ and $I_n$ in the OSGA, we first estimate $\|\xi_n\|$ according to whether the intersection of $T_2$ and $I_n$ is empty.
Theorem 5.
Let $n$ satisfy $1 \le n \le Am$ and $I_n \cap T_2 = \emptyset$. Then,
$$\|\xi_n\|^2 \le \|\xi_{n-1}\|^2 + 0.22\, D\mu.$$
Proof. 
Let
$$\Lambda_n := \bigcup_{i=1}^{n} I_i, \qquad T_2^n := T_2 \cap \Lambda_n, \qquad t_n := |T_2^n|.$$
By Lemma 3, for $1 \le l \le n \le Am$,
$$d_n \le k_2 \min_{l:\ I_l \cap T_2^n \ne \emptyset} d_l.$$
Then, we have
$$s^{-1} t_n \Big( \min_{I_l \cap T_2^n \ne \emptyset} d_l \Big)^2 \le \sum_{I_l \cap T_2^n \ne \emptyset} \Big( \sum_{i \in I_l} |\langle f_{l-1}, g_i\rangle| \Big)^2 = \sum_{I_l \cap T_2^n \ne \emptyset} d_l^2 \le D,$$
so we obtain
$$s^{-1} t_n\, d_n^2 \le k_2^2\, D.$$
Since $I_n \cap T_2 = \emptyset$, we have $t_n = t_{n-1}$. We define
$$h := \sum_{i \in T_2^n} x_{i,n}\, g_i = \sum_{i \in T_2^{n-1}} x_{i,n}\, g_i.$$
Note that
$$f_n = f - P_{H_n}(f) = f_{n-1} + P_{H_{n-1}}(f) - P_{H_n}\big(f_{n-1} + P_{H_{n-1}}(f)\big) = f_{n-1} - P_{H_n}(f_{n-1}).$$
By the definitions of $L$, $T_1$, $T_2$, $\Lambda_n$ and the expression (14), we have $P_{L}^{\perp}(P_{H_n}(f_{n-1})) = P_{L}^{\perp}(h)$. Then, we obtain
$$\|\xi_n\|^2 = \|P_{L}^{\perp}(f_n)\|^2 = \big\|P_{L}^{\perp}\big(f_{n-1} - P_{H_n}(f_{n-1})\big)\big\|^2 = \|\xi_{n-1} - P_{L}^{\perp}(h)\|^2 \le \|\xi_{n-1}\|^2 + 2\,|\langle \xi_{n-1}, P_{L}^{\perp}(h)\rangle| + \|P_{L}^{\perp}(h)\|^2.$$
To obtain the final result, it suffices to bound $|\langle \xi_{n-1}, P_{L}^{\perp}(h)\rangle|$ and $\|P_{L}^{\perp}(h)\|^2$ from above.
For $|\langle \xi_{n-1}, P_{L}^{\perp}(h)\rangle|$, by (12) and (14), we have
$$|\langle \xi_{n-1}, P_{L}^{\perp}(h)\rangle| = |\langle \xi_{n-1}, h\rangle| = |\langle f_{n-1} - P_L(f_{n-1}),\ h\rangle| = |\langle P_L(f_{n-1}), h\rangle| = \Big|\Big\langle \sum_{j=1}^{m} a_{j,n-1}\psi_j,\ \sum_{i \in T_2^{n-1}} x_{i,n} g_i \Big\rangle\Big| \le \sum_{j=1}^{m} |a_{j,n-1}| \cdot \sum_{i \in T_2^{n-1}} |x_{i,n}|\, |\langle \psi_j, g_i\rangle|,$$
where we have used the fact that $\langle f_{n-1}, h\rangle = 0$.
On the one hand, for any $1 \le l \le m$ and $n$ satisfying $T_2 \cap I_n = \emptyset$, we obtain
$$\Big|\Big\langle \sum_{j=1}^{m} a_{j,n-1}\psi_j,\ \psi_l \Big\rangle\Big| = \Big|\Big\langle \sum_{j=1}^{m} a_{j,n-1}\psi_j + \xi_{n-1},\ \psi_l \Big\rangle\Big| = |\langle f_{n-1}, \psi_l\rangle| \le \max_{i \in I_n} |\langle f_{n-1}, g_i\rangle| \le \sum_{i \in I_n} |\langle f_{n-1}, g_i\rangle| = d_n.$$
Thus, by Lemma 1 and inequality (17), we obtain
$$\sum_{j=1}^{m} |a_{j,n-1}| \le m \max_{1\le j\le m} |a_{j,n-1}| \le m k_1 \max_{1\le j\le m} |\langle f_{n-1}, \psi_j\rangle| = m k_1 \max_{1\le j\le m} |\langle P_L(f_{n-1}) + \xi_{n-1},\ \psi_j\rangle| = m k_1 \max_{1\le j\le m} \Big|\Big\langle \sum_{i=1}^{m} a_{i,n-1}\psi_i,\ \psi_j\Big\rangle\Big| \le m k_1\, d_n.$$
On the other hand, by Lemma 2, we have, for $1 \le j \le m$,
$$\sum_{i \in T_2^{n-1}} |x_{i,n}|\, |\langle \psi_j, g_i\rangle| \le \mu\, t_{n-1} \max_{i \in T_2^{n-1}} |x_{i,n}| \le \mu\, t_{n-1}\, k_1 \mu\, d_n.$$
Thus, substituting (18) and (19) into (16), and then combining it with (13), we get the estimate
$$|\langle \xi_{n-1}, P_{L}^{\perp}(h)\rangle| \le \sum_{j=1}^{m} |a_{j,n-1}| \cdot \sum_{i \in T_2^{n-1}} |x_{i,n}|\,|\langle \psi_j, g_i\rangle| \le m\, \mu^2 k_1^2\, d_n^2\, t_{n-1} \le m\, \mu^2 k_1^2\, s\, \big(s^{-1} d_n^2\, t_{n-1}\big) \le m \mu s\, k_1^2 k_2^2\, D\mu.$$
Finally, we estimate $\|P_{L}^{\perp}(h)\|^2$.
Note that
$$\|P_{L}^{\perp}(h)\|^2 \le \|h\|^2 = \Big\| \sum_{i \in T_2^{n-1}} x_{i,n} g_i \Big\|^2 \le \sum_{i \in T_2^{n-1}} x_{i,n}^2 + \sum_{i,j \in T_2^{n-1},\, i \ne j} |x_{i,n}|\, |x_{j,n}|\, |\langle g_i, g_j\rangle| \le \Big( \max_{i \in T_2^{n-1}} x_{i,n}^2 \Big) \big[ t_{n-1} + t_{n-1}^2\, \mu \big] \le k_1^2 \mu^2 d_n^2\, \big[ t_n + t_n^2 \mu \big].$$
By using (13), we have
$$\|P_{L}^{\perp}(h)\|^2 \le k_1^2 \mu^2\, s\, k_2^2\, D\, (1 + Ams\mu) = k_1^2\, \mu s\,(1 + Ams\mu)\, k_2^2\, D\mu.$$
Combining (15) and (20) with (21), we obtain
$$\|\xi_n\|^2 \le \|\xi_{n-1}\|^2 + 2\,|\langle \xi_{n-1}, P_{L}^{\perp}(h)\rangle| + \|P_{L}^{\perp}(h)\|^2 \le \|\xi_{n-1}\|^2 + \big[ 2 m\mu s + \mu s\,(1 + Ams\mu) \big] k_1^2 k_2^2\, D\mu \le \|\xi_{n-1}\|^2 + 0.22\, D\mu.$$
Theorem 5 gives the estimate of $\|\xi_n\|$ in the situation $I_n \cap T_2 = \emptyset$. The following theorem deals with the situation $I_n \cap T_2 \ne \emptyset$.
Theorem 6.
Let $n$ satisfy $1 \le n \le Am$ and $I_n \cap T_2 \ne \emptyset$. Then,
$$\|\xi_n\|^2 \le \|\xi_{n-1}\|^2 - 0.8937\, d_n^2.$$
Proof. 
Since
$$\xi_n = P_{L}^{\perp}(f_n) = P_{L}^{\perp}\Big(f_{n-1} - \sum_{i=1}^{ns} x_{i,n} g_i\Big) = P_{L}^{\perp}\Big(f_{n-1} - \sum_{i \in I_n} x_{i,n} g_i - \sum_{i \in \Lambda_{n-1}} x_{i,n} g_i\Big) = P_{L}^{\perp}\Big(f_{n-1} - \sum_{i \in I_n} x_{i,n} g_i\Big) - P_{L}^{\perp}\Big(\sum_{i \in \Lambda_{n-1}} x_{i,n} g_i\Big),$$
we set $\xi'_n := P_{L}^{\perp}\big(f_{n-1} - \sum_{i \in I_n} x_{i,n} g_i\big)$ and $h := \sum_{i \in T_2^{n-1}} x_{i,n} g_i$, and write $\xi_n$ as
$$\xi_n = \xi'_n - P_{L}^{\perp}(h).$$
According to the inequality
$$\|\xi_n\|^2 = \|\xi'_n\|^2 - 2\langle \xi'_n, P_{L}^{\perp}(h)\rangle + \|P_{L}^{\perp}(h)\|^2 \le \|\xi'_n\|^2 - 2\langle \xi'_n, h\rangle + \|h\|^2,$$
we need to estimate $\|\xi'_n\|^2$, $\langle \xi'_n, h\rangle$ and $\|h\|^2$. We first estimate $\|h\|^2$ by
$$\|h\|^2 = \Big\|\sum_{i \in T_2^{n-1}} x_{i,n} g_i\Big\|^2 \le k_1^2 \mu^2 d_n^2\,(t_n + t_n^2\mu) \le k_1^2\mu^2 d_n^2\,\big(Ams + (Ams)^2\mu\big) = k_1^2\,\mu\, (Am\mu s)\,(1 + Am\mu s)\, d_n^2 \le \Big(\frac{18}{17}\Big)^2 \cdot \frac{1}{18}\cdot\frac{1}{18}\cdot\frac{19}{18}\, d_n^2 = \frac{19}{5202}\, d_n^2 \le 0.0037\, d_n^2.$$
Next, we continue to estimate $\|\xi'_n\|^2$. It is not difficult to see that
$$\|\xi'_n\|^2 = \Big\|P_{L}^{\perp}\Big(f_{n-1} - \sum_{i\in I_n} x_{i,n}g_i\Big)\Big\|^2 = \Big\|\xi_{n-1} - P_{L}^{\perp}\Big(\sum_{i\in I_n} x_{i,n}g_i\Big)\Big\|^2 = \|\xi_{n-1}\|^2 - 2\Big\langle \xi_{n-1},\ \sum_{i\in I_n} x_{i,n} g_i\Big\rangle + \Big\|P_{L}^{\perp}\Big(\sum_{i\in I_n} x_{i,n}g_i\Big)\Big\|^2 \le \|\xi_{n-1}\|^2 - 2\sum_{i\in I_n} x_{i,n}\langle \xi_{n-1}, g_i\rangle + \Big\|\sum_{i\in I_n} x_{i,n} g_i\Big\|^2.$$
Note that
$$\sum_{i\in I_n} x_{i,n}\langle\xi_{n-1}, g_i\rangle = \sum_{i\in I_n} \big[(x_{i,n} - \langle f_{n-1},g_i\rangle) + \langle f_{n-1},g_i\rangle\big]\,\big[(\langle\xi_{n-1},g_i\rangle - \langle f_{n-1},g_i\rangle) + \langle f_{n-1},g_i\rangle\big].$$
By (18), for any $i \in T_2 \cap I_n$, we have
$$|\langle\xi_{n-1},g_i\rangle - \langle f_{n-1},g_i\rangle| = \Big|\Big\langle f_{n-1} - \sum_{j=1}^m a_{j,n-1}\psi_j,\ g_i\Big\rangle - \langle f_{n-1},g_i\rangle\Big| = \Big|\Big\langle\sum_{j=1}^m a_{j,n-1}\psi_j,\ g_i\Big\rangle\Big| \le \sum_{j=1}^m |a_{j,n-1}|\,|\langle\psi_j,g_i\rangle| \le k_1 m\mu\, d_n.$$
Combining Lemma 2 with inequality (26), we obtain
$$\sum_{i\in I_n} x_{i,n}\langle\xi_{n-1},g_i\rangle \ge \sum_{i\in I_n}\big(|\langle f_{n-1},g_i\rangle| - k_1\mu d_n\big)\big(|\langle f_{n-1},g_i\rangle| - mk_1\mu d_n\big) = \sum_{i\in I_n}\langle f_{n-1},g_i\rangle^2 - (k_1\mu d_n + mk_1\mu d_n)\sum_{i\in I_n}|\langle f_{n-1},g_i\rangle| + msk_1^2\mu^2 d_n^2 \ge \sum_{i\in I_n}\langle f_{n-1},g_i\rangle^2 - (k_1\mu + mk_1\mu)\, d_n^2 \ge s^{-1}d_n^2 - (1+m)k_1\mu\, d_n^2 \ge (9A - k_1)(1+m)\mu\, d_n^2 \ge 2\mu\,(9A - k_1)\, d_n^2 \ge \frac{576}{17}\,\mu\, d_n^2$$
for $0 < s \le \frac{1}{18Am\mu} \le \frac{1}{9(1+m)A\mu}$, $m \ge 1$ and $A \ge 2$.
For the last summand on the right-hand side of the inequality in (24), we have
$$\Big(\sum_{i\in I_n} |x_{i,n}|\Big)^2 \le \Big(\sum_{i\in I_n}\big(|\langle f_{n-1},g_i\rangle| + k_1\mu d_n\big)\Big)^2 \le (1 + k_1\mu s)^2\, d_n^2 \le \Big(\frac{18}{17}\Big)^2 d_n^2.$$
Thus, combining (27) with (28), for $\frac{3}{100} \le \mu \le \frac{1}{18}$, we have
$$\|\xi'_n\|^2 \le \|\xi_{n-1}\|^2 - 2\sum_{i\in I_n} x_{i,n}\langle\xi_{n-1},g_i\rangle + \Big(\sum_{i\in I_n}|x_{i,n}|\Big)^2 \le \|\xi_{n-1}\|^2 + \Big[\Big(\frac{18}{17}\Big)^2 - 2\cdot\frac{576}{17}\,\mu\Big] d_n^2 \le \|\xi_{n-1}\|^2 - 0.9118\, d_n^2.$$
We next estimate $|\langle \xi'_n, h\rangle|$. Since
$$|\langle\xi'_n, h\rangle| = \Big|\Big\langle P_{L}^{\perp}\Big(f_{n-1} - \sum_{i\in I_n}x_{i,n}g_i\Big),\ h\Big\rangle\Big| = \Big|\Big\langle \xi_{n-1} - \sum_{i\in I_n}x_{i,n}P_{L}^{\perp}(g_i),\ h\Big\rangle\Big| = \Big|\Big\langle f_{n-1} - \sum_{j=1}^m a_{j,n-1}\psi_j - \sum_{i\in I_n}x_{i,n}P_{L}^{\perp}(g_i),\ h\Big\rangle\Big| \le \Big|\Big\langle\sum_{j=1}^m a_{j,n-1}\psi_j,\ h\Big\rangle\Big| + \Big|\Big\langle\sum_{i\in I_n}x_{i,n}P_{L}^{\perp}(g_i),\ h\Big\rangle\Big| \le \sum_{j=1}^m |a_{j,n-1}|\,|\langle\psi_j, h\rangle| + \sum_{i\in I_n}|x_{i,n}|\,|\langle P_{L}^{\perp}(g_i), h\rangle| =: \mathcal{A} + \mathcal{B},$$
we need to give upper bounds for $\mathcal{A}$ and $\mathcal{B}$. By (18) and (19), we have
$$\mathcal{A} = \sum_{j=1}^m |a_{j,n-1}|\,|\langle\psi_j, h\rangle| \le \sum_{j=1}^m |a_{j,n-1}| \sum_{i\in T_2^{n-1}} |x_{i,n}|\,|\langle\psi_j, g_i\rangle| \le m k_1 d_n \cdot \mu\, t_{n-1}\, k_1\mu\, d_n \le m k_1 d_n \cdot \mu\, Ams\, k_1\mu\, d_n = k_1^2\,(Am\mu s)(m\mu)\, d_n^2 \le 0.0035\, d_n^2.$$
As for $\mathcal{B}$: since, for $1 \le j \le (n-1)s < i \le ns \le Ams$ with $i, j \in T_2$, we can write $P_L(g_i) = \sum_{l=1}^m c_l^i \psi_l$, Lemma 1 gives
$$\max_{1\le l\le m} |c_l^i| \le k_1 \max_{1\le l\le m} |\langle g_i, \psi_l\rangle| \le k_1\mu,$$
and
$$|\langle P_{L}^{\perp}(g_i), g_j\rangle| = |\langle g_i - P_L(g_i), g_j\rangle| = \Big|\Big\langle g_i - \sum_{l=1}^m c_l^i\psi_l,\ g_j\Big\rangle\Big| \le |\langle g_i, g_j\rangle| + \sum_{l=1}^m |c_l^i|\,|\langle\psi_l, g_j\rangle|.$$
Combining (32) with (33), we have
$$|\langle P_{L}^{\perp}(g_i), g_j\rangle| \le \mu + m\,\Big(\max_{1\le l\le m}|c_l^i|\Big)\,\mu \le \mu + m\mu\, k_1\mu \le \frac{18}{17}\,\mu.$$
Using Lemma 1 again, we obtain from (34) that
$$\mathcal{B} = \sum_{i\in I_n}|x_{i,n}|\,|\langle P_{L}^{\perp}(g_i), h\rangle| \le \sum_{i\in I_n}|x_{i,n}| \sum_{j\in T_2^{n-1}} |x_{j,n}|\,|\langle P_{L}^{\perp}(g_i), g_j\rangle| \le \sum_{i\in I_n}\big(k_1\mu d_n + |\langle f_{n-1},g_i\rangle|\big)\cdot \sum_{j\in T_2^{n-1}}|x_{j,n}|\cdot \frac{18}{17}\,\mu \le \big(k_1\mu d_n\, s + d_n\big)\cdot k_1\mu d_n\,(n-1)s\cdot\frac{18}{17}\,\mu \le (k_1\mu s + 1)\,(k_1\mu)\,(Ams\mu)\,\frac{18}{17}\, d_n^2 \le \Big(1 + \frac{18}{17}\cdot\frac{1}{18}\Big)\cdot\frac{18}{17}\cdot\frac{1}{18}\cdot\frac{1}{18}\cdot\frac{18}{17}\, d_n^2 \le 0.0037\, d_n^2.$$
Thus, by (30), (31) and (35), we obtain the upper bound of $|\langle\xi'_n, h\rangle|$, i.e.,
$$|\langle\xi'_n, h\rangle| \le \mathcal{A} + \mathcal{B} \le 0.0072\, d_n^2.$$
Combining (22), (23) and (29) with (36), we have
$$\|\xi_n\|^2 = \|\xi'_n\|^2 - 2\langle\xi'_n, P_{L}^{\perp}(h)\rangle + \|P_{L}^{\perp}(h)\|^2 \le \|\xi'_n\|^2 + 2\,|\langle\xi'_n, h\rangle| + \|h\|^2 \le \|\xi_{n-1}\|^2 - 0.9118\, d_n^2 + 2\,(0.0072\, d_n^2) + 0.0037\, d_n^2 = \|\xi_{n-1}\|^2 - 0.8937\, d_n^2.$$
It remains to estimate $\|P_L(f_n)\|$. We first recall a lemma proven by Fang, Lin and Xu in [12].
Lemma 4.
Assume that a dictionary $\mathcal{D}$ has coherence $\mu$. Then, for any distinct $g_i \in \mathcal{D}$ and $a_i \in \mathbb{R}$, $i = 1, 2, \dots, s$, we have the inequalities
$$\big(1 - \mu(s-1)\big)\sum_{i=1}^{s} a_i^2 \le \Big\|\sum_{i=1}^{s} a_i g_i\Big\|^2 \le \big(1 + \mu(s-1)\big)\sum_{i=1}^{s} a_i^2.$$
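As a quick numeric sanity check of Lemma 4 (not part of the original argument), one can verify the two-sided bound for a random collection of a few nearly orthogonal unit-norm vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
d, s = 400, 5
D = rng.standard_normal((d, s))
D /= np.linalg.norm(D, axis=0)                 # unit-norm columns g_1, ..., g_s
mu = np.abs(D.T @ D - np.eye(s)).max()         # coherence of these s elements

a = rng.standard_normal(s)
val = np.linalg.norm(D @ a) ** 2               # || sum_i a_i g_i ||^2
lower = (1 - mu * (s - 1)) * np.sum(a ** 2)
upper = (1 + mu * (s - 1)) * np.sum(a ** 2)
assert lower <= val <= upper                   # the inequalities of Lemma 4
```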
Theorem 7.
For any $1 \le n \le Am$, we have
$$\|P_L(f_n)\|^2 = \Big\|\sum_{j=1}^m a_{j,n}\psi_j\Big\|^2 \le 1.34\, D.$$
Proof. 
From Lemma 4, we know that
$$\|P_L(f_n)\|^2 = \Big\|\sum_{j=1}^m a_{j,n}\psi_j\Big\|^2 \le \Big(\sum_{j=1}^m |a_{j,n}|^2\Big)(1 + m\mu).$$
From Lemmas 1 and 3, we have, for any $1 \le l \le n+1$,
$$\max_{1\le j\le m}|a_{j,n}| \le k_1\max_{1\le j\le m}|\langle f_n, \psi_j\rangle| \le k_1\sum_{i\in I_{n+1}}|\langle f_n, g_i\rangle| = k_1\, d_{n+1} \le k_1 k_2\, d_l.$$
Thus,
$$\sum_{j=1}^m |a_{j,n}|^2 \le m\max_{1\le j\le m}|a_{j,n}|^2 \le (k_1k_2)^2 \sum_{l:\ I_l\cap T_2\ne\emptyset} d_l^2 \le (k_1k_2)^2\, D.$$
Combining (37) with (38), we have
$$\|P_L(f_n)\|^2 \le \Big(\sum_{j=1}^m |a_{j,n}|^2\Big)(1+m\mu) \le (k_1k_2)^2\, D\cdot\frac{19}{18} = \frac{19}{18}\,(k_1k_2)^2\, D \le 1.34\, D.$$
Next, using Theorems 5 and 6, we estimate $D$.
Theorem 8.
For $A \ge 1$ and any positive integer $m$, the following inequalities hold:
$$D^{1/2} \le 1.07\,(1+\epsilon)\,\sigma_m(f), \quad \epsilon > 0,$$
$$\|\xi_{Am}\| \le \|\xi_0\|.$$
Proof. 
From (3), we have
$$\|\xi_0\|^2 = \|P_{L}^{\perp}(f)\|^2 = \|f - P_L(f)\|^2 \le \Big\|f - \sum_{j=1}^m b_j\psi_j\Big\|^2 \le (1+\epsilon)^2\,\sigma_m^2(f).$$
By using Theorems 5 and 6, we derive
$$(1+\epsilon)^2\sigma_m^2(f) \ge \|\xi_0\|^2 \ge \|\xi_0\|^2 - \|\xi_{Am}\|^2 = \sum_{n=1}^{Am}\big(\|\xi_{n-1}\|^2 - \|\xi_n\|^2\big) = \sum_{I_n\cap T_2=\emptyset}\big(\|\xi_{n-1}\|^2 - \|\xi_n\|^2\big) + \sum_{I_n\cap T_2\ne\emptyset}\big(\|\xi_{n-1}\|^2 - \|\xi_n\|^2\big) \ge \sum_{I_n\cap T_2=\emptyset}(-0.22\, D\mu) + \sum_{I_n\cap T_2\ne\emptyset} 0.8937\, d_n^2 \ge -0.22\, D\, Am\mu + 0.8937\, D \ge -0.0123\, D + 0.8937\, D = 0.8814\, D > 0,$$
which implies
$$D^{1/2} \le (0.8814)^{-1/2}\,(1+\epsilon)\,\sigma_m(f) \le 1.07\,(1+\epsilon)\,\sigma_m(f).$$
Furthermore, since the above chain shows $\|\xi_0\|^2 - \|\xi_{Am}\|^2 \ge 0.8814\, D \ge 0$, we also have
$$\|\xi_{Am}\| \le \|\xi_0\|.$$
Now, we can give the proof of our main result.
Proof of Theorem 4.
Note that
$$\|f_{Am}\| = \|P_L(f_{Am}) + \xi_{Am}\| \le \|P_L(f_{Am})\| + \|\xi_{Am}\|.$$
From Theorems 7 and 8, we obtain
$$\|f_{Am}\| \le \sqrt{1.34}\, D^{1/2} + \|\xi_0\| \le \sqrt{1.34}\, D^{1/2} + (1+\epsilon)\,\sigma_m(f) \le \sqrt{1.34}\cdot 1.07\,(1+\epsilon)\,\sigma_m(f) + (1+\epsilon)\,\sigma_m(f) \le 2.24\,(1+\epsilon)\,\sigma_m(f).$$
Thus, we complete the proof of Theorem 4. □

4. Simulation Results

It is known from Theorem 4 that if $f \in \Sigma_m$, then $\sigma_m(f) = 0$, and hence $f = G_{Am}(f)$. In this spirit, the OSGA can be used to recover sparse signals in compressed sensing, a relatively new field of signal processing. We remark that in the signal processing literature, the orthogonal super greedy algorithm (OSGA) is also known as orthogonal multi-matching pursuit (OMMP). For the reader's convenience, we use the term OMMP instead of OSGA in what follows.
In this section, we test the performance of the orthogonal multi-matching pursuit with parameter $s$ (OMMP(s)). We consider the following model. Suppose that $x \in \mathbb{R}^N$ is an unknown $N$-dimensional signal that we wish to recover from the given data
$$y = \Phi x, \qquad (40)$$
where $\Phi \in \mathbb{R}^{M\times N}$ is a known measurement matrix with $M \ll N$. Since $M \ll N$, the column vectors of $\Phi$ are linearly dependent, and the collection of these columns can be viewed as a redundant dictionary.
For arbitrary $x, y \in \mathbb{R}^N$, define
$$\langle x, y\rangle = \sum_{j=1}^{N} x_j y_j$$
and
$$\|x\|_2 = \Big(\sum_{j=1}^{N} |x_j|^2\Big)^{1/2},$$
where $x = (x_j)_{j=1}^N$ and $y = (y_j)_{j=1}^N$. Obviously, $\mathbb{R}^N$ is a Hilbert space with the inner product $\langle\cdot,\cdot\rangle$.
A signal $x \in \mathbb{R}^N$ is said to be $K$-sparse if $\|x\|_0 := \#\,\mathrm{supp}(x) = \#\{i : x_i \ne 0\} \le K < N$. We will recover the support of a $K$-sparse signal via OMMP(s) under the model (40). It is well known that OMMP takes the following form; see, for instance, [3].
ORTHOGONAL MULTI MATCHING PURSUIT (OMMP(s))
Input: measurement matrix $\Phi$, data vector $y$, parameter $s$, and the stopping criterion.
Step 1: Set the residual $r_0 = y$, the initial approximation $x^0 := 0$, the index set $\Lambda_0 := \emptyset$, and the iteration counter $l = 0$.
Step 2: Define $\Lambda_{l+1} := \Lambda_l \cup \{i_1, \dots, i_s\}$ such that
$$|\langle r_l, \varphi_{i_1}\rangle| \ge \cdots \ge |\langle r_l, \varphi_{i_s}\rangle| \ge \sup_{\varphi \in \Phi,\ \varphi \ne \varphi_{i_k},\ k=1,\dots,s} |\langle r_l, \varphi\rangle|.$$
Then, set
$$x^{l+1} := \arg\min_{z:\ \mathrm{supp}(z) \subset \Lambda_{l+1}} \|y - \Phi z\|_2$$
and update the residual
$$r_{l+1} := y - \Phi x^{l+1}.$$
If the stopping condition is met, stop. Otherwise, set $l := l + 1$ and return to Step 2.
Output: If the algorithm stops at the $k$-th iteration, then output $\Lambda_k$ and $\hat{x}_{\Lambda_k} = x^k$.
In the experiment, we set the measurement matrix $\Phi$ to be a Gaussian matrix whose entries are drawn independently from the $N(0, M^{-1})$ distribution, with density function $p(x) := \sqrt{\frac{M}{2\pi}}\, e^{-M x^2/2}$. We execute OMMP(s) with the data vector $y = \Phi x$ and stop the algorithm when $\#\Lambda_l \ge K$. The mean square error (MSE) of the recovery is defined as follows:
$$\mathrm{MSE} = \frac{1}{N}\sum_{j=1}^{N}\big(x_j - \hat{x}_{\Lambda_k, j}\big)^2.$$
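A compact Python sketch of this pipeline (OMMP(s), the Gaussian measurement matrix, and the MSE above) is given below. The stopping rule #Λ_l ≥ K from the experiment is hard-coded, and all names are our own, so this is an illustration of the setup rather than the exact code used to produce the figures.

```python
import numpy as np

def ommp(y, Phi, s, K):
    """OMMP(s): add s new columns per iteration, stop once #Lambda >= K."""
    M, N = Phi.shape
    r, support = y.copy(), []
    while len(support) < K:
        corr = np.abs(Phi.T @ r)
        corr[support] = -np.inf                    # never reselect a column
        support.extend(np.argsort(corr)[-s:].tolist())
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        x_hat = np.zeros(N)
        x_hat[support] = coef
        r = y - Phi @ x_hat                        # residual r_{l+1}
    return x_hat, support

# Experiment in the spirit of Figure 1: N = 512, M = 200, K = 50, s = 5
rng = np.random.default_rng(1)
N, M, K, s = 512, 200, 50, 5
Phi = rng.normal(0.0, np.sqrt(1.0 / M), size=(M, N))   # entries ~ N(0, 1/M)
x = np.zeros(N)
x[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
x_hat, _ = ommp(Phi @ x, Phi, s, K)
mse = np.mean((x - x_hat) ** 2)                        # MSE as defined above
print(f"MSE = {mse:.3e}")
```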
Figure 1 shows the performance of OMMP(s) with $s = 5$ for an input signal in dimension $N = 512$ with sparsity level $K = 50$ and number of measurements $M = 200$, where the red line represents the original signal and the black squares represent the approximation. By repeating the test 1000 times, we calculate the mean square error $\mathrm{MSE} = 1.1894 \times 10^{-8}$.
Figure 2 describes the case of dimension $N = 256$. It displays the percentage of support elements recovered correctly (averaged over 100 input signals) as a function of $M$, with $s = 3$. If the percentage equals $100\%$, all elements of the support are found, which means that the input signal is exactly recovered. As expected, Figure 2 shows that as the sparsity level $K$ increases, more measurements are needed to guarantee signal recovery.

5. Concluding Remarks

This paper investigates the error behavior of the orthogonal super greedy algorithm (OSGA) with respect to μ-coherent dictionaries. The OSGA is simpler than the OGA from the viewpoint of computational complexity. Under the assumption that the coherence parameter μ is bounded from below, we establish a Lebesgue-type inequality for the OSGA, which shows that the OSGA provides an almost optimal approximation on the first $[1/(18\mu s)]$ steps. Moreover, we improve the asymptotic constant in the Lebesgue-type inequality for the OGA obtained in [19]. We develop some new techniques to obtain these results. We found that there is a strong dependency between the constant $A$ and the coherence parameter μ in (2). The specific constant 2.24 is not the best possible; it can be changed by adjusting the values of $A$ and μ, but the best constant is still unknown. In fact, we do not even know whether such a best constant exists. We will continue to study the improvement of the Lebesgue constant in future work. As for applications of the OSGA, our simulation results show that the OSGA is efficient for recovering sparse signals.

Author Contributions

The authors contributed equally to this paper. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

The authors would like to thank the referees, the editors, Zhang Haizhang and Xu Xu for their very useful suggestions, which significantly improved this paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. DeVore, R.A. Nonlinear approximation. Acta Numer. 1998, 7, 51–150.
  2. Barron, A.R.; Cohen, A.; Dahmen, W.; DeVore, R. Approximation and learning by greedy algorithms. Ann. Statist. 2008, 36, 64–94.
  3. Wei, D. Analysis of orthogonal multi-matching pursuit under restricted isometry property. Sci. China Math. 2014, 57, 2179–2188.
  4. Temlyakov, V.N.; Zheltov, P. On performance of greedy algorithms. J. Approx. Theory 2011, 163, 1134–1145.
  5. Tropp, J.A.; Wright, S. Computational methods for sparse solution of linear inverse problems. Proc. IEEE 2010, 98, 948–958.
  6. Donoho, D.L.; Tsaig, T.; Drori, O.; Starck, J.L. Sparse solution of underdetermined systems of linear equations by stagewise orthogonal matching pursuit. IEEE Trans. Inf. Theory 2012, 58, 1094–1121.
  7. Wu, R.; Huang, W.; Chen, D.R. The exact support recovery of sparse signals with noise via orthogonal matching pursuit. IEEE Signal Process. Lett. 2013, 20, 403–406.
  8. Cai, T.; Wang, L. Orthogonal matching pursuit for sparse signal recovery with noise. IEEE Trans. Inf. Theory 2011, 57, 4680–4688.
  9. Lin, J.H.; Li, S. Nonuniform support recovery from noisy measurements by orthogonal matching pursuit. J. Approx. Theory 2013, 165, 20–40.
  10. Cohen, A.; Dahmen, W.; DeVore, R. Orthogonal matching pursuit under the restricted isometry property. Constr. Approx. 2017, 45, 113–127.
  11. Liu, E.; Temlyakov, V.N. The orthogonal super greedy algorithm and applications in compressed sensing. IEEE Trans. Inf. Theory 2012, 58, 2040–2047.
  12. Fang, J.; Lin, S.B.; Xu, Z.B. Learning and approximation capabilities of orthogonal super greedy algorithm. Knowl.-Based Syst. 2016, 95, 86–98.
  13. Berná, P.M.; Blasco, Ó.; Garrigós, G. Lebesgue inequalities for the greedy algorithm in general bases. Rev. Mat. Complut. 2017, 30, 369–392.
  14. Shao, C.F.; Ye, P.X. Lebesgue constants for Chebyshev thresholding greedy algorithms. J. Inequal. Appl. 2018, 2018, 102–124.
  15. Berná, P.M.; Blasco, Ó.; Garrigós, G.; Hernández, E.; Oikhberg, T. Lebesgue inequalities for Chebyshev thresholding greedy algorithms. Rev. Mat. Complut. 2020, 33, 695–722.
  16. Gilbert, A.J.; Muthukrishnan, S.; Strauss, M.J. Approximation of functions over redundant dictionaries using coherence. In Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms, Philadelphia, PA, USA, 11 January 2003; ACM: New York, NY, USA, 2003; pp. 234–252.
  17. Tropp, J.A. Greed is good: Algorithmic results for sparse approximation. IEEE Trans. Inf. Theory 2004, 50, 2231–2242.
  18. Donoho, D.L.; Elad, M.; Temlyakov, V.N. On Lebesgue-type inequalities for greedy approximation. J. Approx. Theory 2007, 147, 185–195.
  19. Livshitz, E.D. On the optimality of the Orthogonal Greedy Algorithm for μ-coherent dictionaries. J. Approx. Theory 2012, 164, 668–681.
  20. Ye, P.X.; Wei, X.J. Lebesgue-type inequality for Orthogonal Matching Pursuit for μ-coherent dictionaries. TELKOMNIKA Indones. J. Electr. Eng. 2013, 11, 213–226.
Figure 1. The recovery of an input signal in dimension N = 512 with sparsity level K = 50 , number of measurements M = 200 and s = 5 .
Figure 2. The average percentage of elements in support found correctly (100 input signals) as a function of the number of measurements M for different sparsity levels K in dimension N = 256 with s = 3 .
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
