Abstract
We study the approximation capability of the orthogonal super greedy algorithm (OSGA) with respect to μ-coherent dictionaries in Hilbert spaces. We establish Lebesgue-type inequalities for the OSGA, which show that the OSGA provides an almost optimal approximation on the first steps. Moreover, we improve the asymptotic constant in the Lebesgue-type inequality for the OGA obtained by E. D. Livshitz.
Keywords: orthogonal super greedy algorithm; coherence; best n-term approximation; Lebesgue-type inequality
MSC: 41A65; 41A50; 41A46
1. Introduction
Approximation by sparse linear combinations of elements from a fixed redundant system continues to develop actively, driven not only by theoretical interest but also by frequent applications in areas such as signal processing and machine learning, cf. [1,2,3,4,5,6,7]. This type of approximation is called highly nonlinear approximation. Greedy-type algorithms have been used as a tool for generating such approximations. Among others, the orthogonal greedy algorithm (OGA) has been widely used in practice. In fact, the OGA is regarded as the most powerful algorithm for the problem of approximation with respect to redundant systems, cf. [8,9,10].
We recall some notation and definitions from the theory of greedy algorithms. Let $H$ be a Hilbert space with an inner product $\langle \cdot, \cdot \rangle$ and the norm $\|f\| := \langle f, f \rangle^{1/2}$. We say that a set $\mathcal{D}$ of elements from $H$ is a dictionary if each $g \in \mathcal{D}$ has norm one, $\|g\| = 1$, and the closure of its linear span coincides with $H$, i.e., $\overline{\operatorname{span}\,\mathcal{D}} = H$.
We consider redundant dictionaries, which have been utilized frequently in the field of signal processing. Here, a redundant dictionary means that the elements of the dictionary may be linearly dependent.
We now recall the definition of the OGA from [1].
ORTHOGONAL GREEDY ALGORITHM (OGA)
Set $f_0 := f$. For each $m \geq 1$, we inductively find $\varphi_m \in \mathcal{D}$ such that
$$|\langle f_{m-1}, \varphi_m \rangle| = \sup_{g \in \mathcal{D}} |\langle f_{m-1}, g \rangle|$$
and define
$$f_m := f - P_{H_m}(f),$$
where $P_{H_m}$ is the operator of the orthogonal projection onto $H_m := \operatorname{span}\{\varphi_1, \dots, \varphi_m\}$.
In [11], Liu and Temlyakov proposed the orthogonal super greedy algorithm (OSGA). The OSGA selects more than one element from a dictionary in each iteration step and hence reduces the computational burden of the conventional OGA. Therefore, the OSGA is more efficient than the OGA from the viewpoint of computational complexity.
ORTHOGONAL SUPER GREEDY ALGORITHM (OSGA(s))
Set $f_0 := f$. For a natural number $s$ and each $m \geq 1$, we inductively define:
- (1)
- $\varphi_{(m-1)s+1}, \dots, \varphi_{ms}$ are elements of the dictionary satisfying the following inequality. Denote $I_m := \{(m-1)s+1, \dots, ms\}$ and assume that $$\min_{i \in I_m} |\langle f_{m-1}, \varphi_i \rangle| \geq \sup_{g \in \mathcal{D} \setminus \{\varphi_i : i \in I_m\}} |\langle f_{m-1}, g \rangle|.$$
- (2)
- Let $H_m := \operatorname{span}\{\varphi_1, \dots, \varphi_{ms}\}$ and denote by $P_{H_m}$ the operator of the orthogonal projection onto $H_m$. Define $$G_m(f) := P_{H_m}(f).$$
- (3)
- Define the residual after the $m$-th iteration of the algorithm: $$f_m := f - G_m(f).$$

Note that, in the case $s = 1$, OSGA(s) coincides with OGA.
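To make the selection and projection steps concrete, the following is a minimal NumPy sketch of OSGA(s) for a finite dictionary; the function name `osga`, the matrix representation of the dictionary, and the fixed iteration count are our own illustrative choices, not notation from the paper. For $s = 1$, the routine reduces to the OGA recalled above.

```python
# A minimal sketch of OSGA(s), assuming a finite dictionary stored as the
# unit-norm columns of a matrix D in R^{d x n}; names are illustrative.
import numpy as np

def osga(f, D, s, iterations):
    """Return the residual f_m and the indices of the selected elements."""
    residual = f.copy()
    selected = []          # indices of the chosen dictionary elements
    for _ in range(iterations):
        # Greedy step: pick the s columns with the largest |<f_{m-1}, g>|.
        # Previously selected columns are excluded; in exact arithmetic their
        # inner products with the residual are already zero.
        inner = np.abs(D.T @ residual)
        inner[selected] = -np.inf
        selected.extend(np.argsort(inner)[-s:].tolist())
        # Projection step: orthogonal projection of f onto the span of all
        # selected elements, computed via least squares.
        coef, *_ = np.linalg.lstsq(D[:, selected], f, rcond=None)
        residual = f - D[:, selected] @ coef
    return residual, selected
```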
In this paper, we study the approximation capability of the OSGA with respect to μ-coherent dictionaries in Hilbert spaces. We denote by
$$M(\mathcal{D}) := \sup_{g \neq h;\ g, h \in \mathcal{D}} |\langle g, h \rangle|$$
the coherence of a dictionary $\mathcal{D}$ and write $\mu := M(\mathcal{D})$. The coherence is a blunt instrument for measuring the redundancy of a dictionary. It is clear that if $\mathcal{D}$ is an orthonormal basis, then $\mu = 0$. The smaller the $\mu$, the more the dictionary $\mathcal{D}$ resembles an orthonormal basis. We study dictionaries with small values of the coherence and call them μ-coherent dictionaries.
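For a finite dictionary, the coherence is simply the largest off-diagonal entry of the Gram matrix in absolute value; the short sketch below (the function name is ours) illustrates this.

```python
# Coherence of a finite dictionary: the largest |<g, h>| over distinct
# unit-norm columns g, h, i.e. the largest off-diagonal Gram entry.
import numpy as np

def coherence(D):
    G = np.abs(D.T @ D)        # Gram matrix of the (unit-norm) columns
    np.fill_diagonal(G, 0.0)   # discard <g, g> = 1 on the diagonal
    return float(G.max())

print(coherence(np.eye(4)))    # an orthonormal basis has coherence 0.0
```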
In [11], the authors found that this reduction in the computational burden of the OSGA does not degrade its approximation capability if $f$ belongs to the closure of the convex hull of the symmetrized dictionary $\mathcal{D}^{\pm} := \{\pm g : g \in \mathcal{D}\}$, which is denoted by $A_1(\mathcal{D})$.
Theorem 1.
Let $\mathcal{D}$ be a dictionary with coherence parameter $\mu := M(\mathcal{D})$. Then, for $s \leq (2\mu)^{-1} + 1$, the algorithm OSGA(s) provides an approximation of $f \in A_1(\mathcal{D})$ with the following error bound:
$$\|f_m\| \leq 40.5\,(sm)^{-1/2}, \quad m = 1, 2, \dots$$
It seems that a dimension-independent convergence rate was deduced, but the condition that the target element belongs to $A_1(\mathcal{D})$ becomes more and more stringent as the number of elements in $\mathcal{D}$ grows, cf. [2].
Fang, Lin, and Xu [12] studied the behavior of the OSGA for general target elements $f \in H$ and obtained the following theorem.
Theorem 2.
Let $\mathcal{D}$ be a dictionary with coherence $\mu$. Then, for all admissible $s$ and arbitrary $f \in H$, the OSGA(s) provides an approximation of $f$ with a corresponding error bound.
The μ-coherence of a dictionary is used in the OSGA, which implies that the computational burden reduction does not degrade the approximation capability. Moreover, if $s = 1$, then the OSGA coincides with the OGA.
Let $\Sigma_m(\mathcal{D})$ denote the collection of elements in $H$ that can be expressed as a linear combination of at most $m$ elements of the dictionary $\mathcal{D}$, namely
$$\Sigma_m(\mathcal{D}) := \left\{ \sum_{g \in \Lambda} c_g g : \Lambda \subset \mathcal{D},\ \#\Lambda \leq m \right\}.$$
For an element $f \in H$, we define its best $m$-term approximation error by
$$\sigma_m(f) := \sigma_m(f, \mathcal{D}) := \inf_{h \in \Sigma_m(\mathcal{D})} \|f - h\|.$$
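For a small finite dictionary, $\sigma_m(f)$ can be computed by brute force, projecting $f$ onto the span of every subset of at most $m$ elements. The sketch below (with our own names, feasible only for tiny examples) illustrates the definition.

```python
# Brute-force best m-term approximation error sigma_m(f) for a finite
# dictionary D (columns in R^d); exponential cost, for illustration only.
import numpy as np
from itertools import combinations

def sigma_m(f, D, m):
    best = np.linalg.norm(f)               # the empty combination h = 0
    for k in range(1, m + 1):
        for support in combinations(range(D.shape[1]), k):
            cols = list(support)
            coef, *_ = np.linalg.lstsq(D[:, cols], f, rcond=None)
            best = min(best, np.linalg.norm(f - D[:, cols] @ coef))
    return best
```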
The inequality connecting the error of greedy approximation with the error of the best m-term approximation is called a Lebesgue-type inequality, cf. [13,14,15]. In this paper, we establish Lebesgue-type inequalities for the OSGA with respect to μ-coherent dictionaries.
We first recall some results on the efficiency of the OGA with respect to μ-coherent dictionaries. These results relate the error of the OGA's $N(m)$-th approximation to the error of the best $m$-term approximation with an extra multiplier $C(m)$:
$$\|f_{N(m)}\| \leq C(m)\,\sigma_m(f). \quad (1)$$
Gilbert, Muthukrishnan, and Strauss [16] gave the first Lebesgue-type inequality for the OGA. They proved
$$\|f_m\| \leq 8\sqrt{m}\,\sigma_m(f), \quad m < \frac{1}{32\mu}.$$
The constant in the above inequality was improved by Tropp in [17]:
$$\|f_m\| \leq \sqrt{1 + 6m}\,\sigma_m(f).$$
Donoho, Elad, and Temlyakov [18] dramatically improved the factor in front of $\sigma_m(f)$, obtaining a Lebesgue-type inequality in which the multiplier is the absolute constant 24; this constant is not the best. Many researchers have sought to improve this factor. In particular, Temlyakov and Zheltov further improved the above inequality in [4].
Livshitz [19] sharpened the parameters in (1) and obtained the following profound result.
Theorem 3.
For every μ-coherent dictionary and any $f \in H$, the OGA applied to $f$ provides a bound of the form (1), with the number of iterations proportional to $m$ with factor 2.7 and an absolute constant in place of the multiplier $C(m)$.
By using the same method as in [19], Ye and Wei [20] slightly improved the constant 2.7.
Based on the above works, we give an error bound of the form (1) for the OSGA with respect to dictionaries with small but non-vanishing coherence.
Theorem 4.
Let $\mathcal{D}$ be a dictionary with coherence $\mu$. Then, for any admissible $s$ and any $f \in H$, the OSGA(s) applied to $f$ provides the error bound (2), which is of the form (1) and holds for all $m$ in a range determined by $\mu$ and an absolute constant $A$.
Remark 1.
- 1.
- We remark that the values of μ and A for which (2) holds are coupled. For example, it is possible to obtain a smaller value of μ at the price of a larger value of A. Moreover, for sufficiently large A, μ can be arbitrarily close to zero.
- 2.
- Our results improve Theorem 3 only in the asymptotic constant and not in the rate. Under the condition of Theorem 4, a suitable choice of the parameters yields the factor 2.24 in place of 2.7; comparing it with Theorem 3, the constant that we obtain is better.
- 3.
- The specific constant 2.24 in (2) is not the best. By adjusting the parameters $A$ and $\mu$, we can obtain a more general estimation in which the two parameters are interdependent. Thus, Theorem 4 shows that OSGA(s) can achieve an almost optimal approximation on the first steps for dictionaries with small but non-vanishing coherence.
The paper is organized as follows. In Section 2, we establish several preliminary lemmas. In Section 3, based on these lemmas, we first derive the required estimates in different situations and then combine them to give a detailed proof of Theorem 4. In Section 4, we test the performance of the OSGA in the case of a finite-dimensional Euclidean space. In Section 5, we make some concluding remarks on our work.
2. Preliminary Lemmas
In this section, we will introduce several quantities and discuss their properties, which are important to the proof of our main result. By the condition of Theorem 4, we have
We establish three preliminary lemmas.
Lemma 1.
Let Assume that
then, we have
where
Proof.
For any we have
This implies
where
□
For by the definition of there exist such that
For we set
Assume that satisfying the equation
Next, we estimate these quantities in turn. Applying Lemma 1, we first obtain the following estimates.
Lemma 2.
For we have
Proof.
Define
Since
and, for any we have
For we have
Let Combining (5) with (6), we have
Thus, for for By Lemma 1 and inequalities (7) and (8), we obtain
Thus, for
for
□
We proceed to the next estimate.
Lemma 3.
For any we have
where
Proof.
For according to the definition of , we have
We continue to estimate the two summands of the right-hand side of the above inequality. For the first summand, the greedy step implies
For the second summand, by Lemma 1, we have
Combining inequalities (9)–(11) with Lemma 2, we conclude that
Thus, for any n and we have
where □
3. Proof of Theorem 4
Based on the above preliminary lemmas, we will prove Theorem 4 step by step. We first introduce some notations. Define
For we define
Let satisfy the following equations
Thus, for , we have
To obtain the upper bound of it suffices to estimate and By the definitions of sets and in OSGA, we first give the estimate of according to whether the intersection of and is an empty set.
Theorem 5.
Let n satisfy and Then,
Proof.
Let
By Lemma 3, for
Then, we have
so, we can obtain that
Since we obtain We define
Note that
By the definitions of and the expression of (14), we have . Then, we obtain
To obtain the final result, it suffices to estimate the upper bounds of and
For , by (12) and (14), we have
where we have used the fact
On the one hand, for any and n satisfying we obtain
Thus, by Lemma 1 and inequality (17), we obtain
On the other hand, by Lemma 2, we have, for
Thus, substituting (18) and (19) into (16), and then combining it with (13), we get the estimate
Finally, we estimate
Note that
By using (13), we have
Combining (15) and (20) with (21), we give
□
Theorem 5 gives the estimation in the first situation; the following theorem deals with the second one.
Theorem 6.
Let n satisfy and Then,
Proof.
Since
we set and write as
According to the following inequality,
we need to estimate and We first estimate by
Next, we continue to estimate It is not difficult to see that
Note that
By (18), for any we have
Combining Lemma 2 with inequality (26), we obtain
for and
For the last summand of the right-hand side of the inequality in (24), we have
Thus, combining (27) with (28), for we have
We next estimate Since
we need to give the upper bounds of A and By (18) and (19), we have
As for since for by Lemma 1, we know that
and
Combining (32) with (33), we have
Using Lemma 1 again, we obtain from (34) that
Thus, we get the upper bound of by (30), (31) and (35), i.e.,
Combining (22), (23) and (29) with (36), we have
□
It remains to estimate the last quantity. We first recall a lemma proven by Fang, Lin, and Xu in [12].
Lemma 4.
Assume that a dictionary $\mathcal{D}$ has coherence $\mu$. Then, for any distinct $g_1, \dots, g_k \in \mathcal{D}$ and any coefficients $c_1, \dots, c_k$, we have the inequalities
$$\bigl(1 - (k-1)\mu\bigr) \sum_{i=1}^{k} c_i^2 \leq \Bigl\| \sum_{i=1}^{k} c_i g_i \Bigr\|^2 \leq \bigl(1 + (k-1)\mu\bigr) \sum_{i=1}^{k} c_i^2.$$
Theorem 7.
For any we have
Proof.
From Lemma 4, we know that
From Lemmas 1 and 2, we have, for any
Thus,
Combining (37) with (38), we have
□
Next, using Theorems 5 and 6, we give the following estimate.
Theorem 8.
For and any positive integer the following inequalities hold.
Proof.
From (3), we have
By using Theorems 5 and 6, we derive
which is equivalent to
Furthermore, we also have
□
Now, we can give the proof of our main result.
Proof of Theorem 4.
Note that
From Theorem 7 and Theorem 8, we obtain that
Thus, we complete the proof of Theorem 4. □
4. Simulation Results
It is known from Theorem 4 that if $f \in \Sigma_m(\mathcal{D})$, then $\sigma_m(f) = 0$, and hence the residual of the OSGA vanishes after the corresponding number of iterations. In this spirit, the OSGA can be used to recover sparse signals in compressed sensing, which is a new field of signal processing. We remark that in the field of signal processing, the orthogonal super greedy algorithm (OSGA) is also known as orthogonal multi-matching pursuit (OMMP). For the reader’s convenience, we will use the term OMMP instead of OSGA in what follows.
In this section, we test the performance of the orthogonal multi-matching pursuit with parameter s (OMMP(s)). We consider the following model. Suppose that $x$ is an unknown $N$-dimensional signal that we wish to recover from the given data
$$y = \Phi x, \quad (40)$$
where $\Phi$ is a known $M \times N$ measurement matrix with $M < N$. Furthermore, since $M < N$, the column vectors of $\Phi$ are linearly dependent, and the collection of these columns can be viewed as a redundant dictionary.
For arbitrary $u = (u_1, \dots, u_M)$ and $v = (v_1, \dots, v_M)$ in $\mathbb{R}^M$, define
$$\langle u, v \rangle := \sum_{i=1}^{M} u_i v_i$$
and
$$\|u\| := \langle u, u \rangle^{1/2}.$$
Obviously, $\mathbb{R}^M$ is a Hilbert space with the inner product $\langle \cdot, \cdot \rangle$.
A signal $x$ is said to be $K$-sparse if it has at most $K$ nonzero coordinates. We will recover the support of a $K$-sparse signal via OMMP(s) under the model (40). It is well known that OMMP takes the following form; see, for instance, [3].
ORTHOGONAL MULTI-MATCHING PURSUIT (OMMP(s))
Input: Measurement matrix $\Phi$, vector $y$, parameter $s$, and the stopping criterion.
Step 1: Set the residual $r_0 = y$, an initial approximation $x_0 = 0$, the index set $\Lambda_0 = \emptyset$, and the iteration counter $k = 1$.
Step 2: Define $\Lambda_k = \Lambda_{k-1} \cup J_k$, where $J_k$ collects the indices of the $s$ columns $\phi_j$ of $\Phi$ with the largest values of $|\langle r_{k-1}, \phi_j \rangle|$. Then,
$$x_k = \underset{z:\ \operatorname{supp}(z) \subseteq \Lambda_k}{\arg\min} \|y - \Phi z\|$$
and update the residual
$$r_k = y - \Phi x_k.$$
End if the stopping condition is achieved. Otherwise, we set $k = k + 1$ and turn to Step 2.
Output: If the algorithm stops at the $k$th iteration, then output $\Lambda_k$ and $x_k$.
In the experiment, we set the measurement matrix $\Phi$ to be a Gaussian matrix whose entries are drawn independently from the distribution $\mathcal{N}(0, 1/M)$, whose density function is $p(t) = \sqrt{M/(2\pi)}\, e^{-Mt^2/2}$. We execute OMMP(s) with the data vector $y = \Phi x$ and stop the algorithm when the norm of the residual falls below a prescribed tolerance. The mean square error (MSE) of $x$ is defined as follows:
$$\mathrm{MSE} := \frac{1}{N} \sum_{i=1}^{N} (\hat{x}_i - x_i)^2,$$
where $\hat{x}$ denotes the output of the algorithm.
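The following sketch reproduces the spirit of this experiment under stated assumptions: the sizes N, M, K, the parameter s, the $\mathcal{N}(0, 1/M)$ entries, and the simplified "stop after K selected columns" rule are our illustrative choices, and the selection loop mirrors the OMMP(s) steps above.

```python
# A sketch of the recovery experiment: Gaussian Phi, a random K-sparse x,
# OMMP(s) recovery, and the mean square error; all sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
N, M, K, s = 256, 128, 10, 2               # dimensions and sparsity (assumed)

Phi = rng.normal(0.0, 1.0 / np.sqrt(M), size=(M, N))
x = np.zeros(N)
support = rng.choice(N, size=K, replace=False)
x[support] = rng.normal(size=K)            # a random K-sparse signal
y = Phi @ x                                # the data vector of model (40)

idx, r = [], y.copy()
while len(idx) < K:                        # simplified stopping rule
    scores = np.abs(Phi.T @ r)
    scores[idx] = -np.inf                  # do not re-select chosen columns
    idx.extend(np.argsort(scores)[-s:].tolist())
    coef, *_ = np.linalg.lstsq(Phi[:, idx], y, rcond=None)
    r = y - Phi[:, idx] @ coef

x_hat = np.zeros(N)
x_hat[idx] = coef
mse = np.sum((x_hat - x) ** 2) / N         # MSE as defined above
print(sorted(idx) == sorted(support.tolist()), mse)
```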
Figure 1 shows the performance of OMMP(s) with for an input signal in dimension with sparsity level and number of measurements , where the red line represents the original signal and the black squares represent the approximation. By repeating the test 1000 times, we calculate the mean square error: MSE = .
Figure 1.
The recovery of an input signal in dimension with sparsity level , number of measurements and .
Figure 2 describes the case of a fixed dimension. It displays the percentage (averaged over 100 input signals) of the elements in the support that are found correctly, as a function of $M$. If the percentage equals $100\%$, it means that all the elements in the support are found, which implies that the input signal can be exactly recovered. As expected, Figure 2 shows that when the sparsity level $K$ increases, more measurements are necessary to guarantee signal recovery.
Figure 2.
The average percentage of elements in support found correctly (100 input signals) as a function of the number of measurements M for different sparsity levels K in dimension with .
5. Concluding Remarks
This paper investigates the error behavior of the orthogonal super greedy algorithm (OSGA) with respect to μ-coherent dictionaries. The OSGA is simpler than the OGA from the viewpoint of computational complexity. Under the assumption that the coherence parameter has a lower bound, we establish the ideal Lebesgue-type inequality for the OSGA, which shows that the OSGA provides an almost optimal approximation on the first steps. Moreover, we improve the asymptotic constant in the Lebesgue-type inequality for the OGA obtained in [19]. We develop some new techniques to obtain our results. We found that there is a strong dependency between the constant $A$ and the coherence parameter $\mu$ in (2). The specific constant 2.24 is not the best; we can change it by adjusting the values of $A$ and $\mu$, but the best one is still unknown. In fact, we do not even know whether an optimal constant exists. We will continue to study the improvement of the Lebesgue constant in our future work. As for the applications of the OSGA, our simulation results show that the OSGA is very efficient for recovering sparse signals.
Author Contributions
The authors contributed equally to this paper. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Acknowledgments
The authors would like to thank the referees, the editors, Zhang Haizhang and Xu Xu for their very useful suggestions, which significantly improved this paper.
Conflicts of Interest
The authors declare no conflict of interest.
References
- DeVore, R.A. Nonlinear approximation. Acta Numer. 1998, 7, 51–150.
- Barron, A.R.; Cohen, A.; Dahmen, W.; DeVore, R. Approximation and learning by greedy algorithms. Ann. Statist. 2008, 36, 64–94.
- Wei, D. Analysis of orthogonal multi-matching pursuit under restricted isometry property. Sci. China Math. 2014, 57, 2179–2188.
- Temlyakov, V.N.; Zheltov, P. On performance of greedy algorithms. J. Approx. Theory 2011, 163, 1134–1145.
- Tropp, J.A.; Wright, S.J. Computational methods for sparse solution of linear inverse problems. Proc. IEEE 2010, 98, 948–958.
- Donoho, D.L.; Tsaig, Y.; Drori, I.; Starck, J.L. Sparse solution of underdetermined systems of linear equations by stagewise orthogonal matching pursuit. IEEE Trans. Inf. Theory 2012, 58, 1094–1121.
- Wu, R.; Huang, W.; Chen, D.R. The exact support recovery of sparse signals with noise via orthogonal matching pursuit. IEEE Signal Process. Lett. 2013, 20, 403–406.
- Cai, T.; Wang, L. Orthogonal matching pursuit for sparse signal recovery with noise. IEEE Trans. Inf. Theory 2011, 57, 4680–4688.
- Lin, J.H.; Li, S. Nonuniform support recovery from noisy measurements by orthogonal matching pursuit. J. Approx. Theory 2013, 165, 20–40.
- Cohen, A.; Dahmen, W.; DeVore, R. Orthogonal matching pursuit under the restricted isometry property. Constr. Approx. 2017, 45, 113–127.
- Liu, E.; Temlyakov, V.N. The orthogonal super greedy algorithm and applications in compressed sensing. IEEE Trans. Inf. Theory 2012, 58, 2040–2047.
- Fang, J.; Lin, S.B.; Xu, Z.B. Learning and approximation capabilities of orthogonal super greedy algorithm. Knowl.-Based Syst. 2016, 95, 86–98.
- Berná, P.M.; Blasco, Ó.; Garrigós, G. Lebesgue inequalities for the greedy algorithm in general bases. Rev. Mat. Complut. 2017, 30, 369–392.
- Shao, C.F.; Ye, P.X. Lebesgue constants for Chebyshev thresholding greedy algorithms. J. Inequal. Appl. 2018, 2018, 102–124.
- Berná, P.M.; Blasco, Ó.; Garrigós, G.; Hernández, E.; Oikhberg, T. Lebesgue inequalities for Chebyshev thresholding greedy algorithms. Rev. Mat. Complut. 2020, 33, 695–722.
- Gilbert, A.C.; Muthukrishnan, S.; Strauss, M.J. Approximation of functions over redundant dictionaries using coherence. In Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms, Philadelphia, PA, USA, 11 January 2003; ACM: New York, NY, USA, 2003; pp. 234–252.
- Tropp, J.A. Greed is good: Algorithmic results for sparse approximation. IEEE Trans. Inf. Theory 2004, 50, 2231–2242.
- Donoho, D.L.; Elad, M.; Temlyakov, V.N. On Lebesgue-type inequalities for greedy approximation. J. Approx. Theory 2007, 147, 185–195.
- Livshitz, E.D. On the optimality of the Orthogonal Greedy Algorithm for μ-coherent dictionaries. J. Approx. Theory 2012, 164, 668–681.
- Ye, P.X.; Wei, X.J. Lebesgue-type inequality for Orthogonal Matching Pursuit for μ-coherent dictionaries. TELKOMNIKA Indones. J. Electr. Eng. 2013, 11, 213–226.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).