Article

Stability of Weak Rescaled Pure Greedy Algorithms

by
Wan Li
1,
Man Lu
1,
Peixin Ye
1,* and
Wenhui Zhang
2
1
Department of Applied Mathematics, School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
2
School of Mathematics and Statistics, Guangxi Normal University, Guilin 541006, China
*
Author to whom correspondence should be addressed.
Axioms 2025, 14(6), 446; https://doi.org/10.3390/axioms14060446
Submission received: 8 April 2025 / Revised: 23 May 2025 / Accepted: 4 June 2025 / Published: 6 June 2025
(This article belongs to the Section Mathematical Analysis)

Abstract

We study the stability of Weak Rescaled Pure Greedy Algorithms for convex optimization, WRPGA(co), in general Banach spaces. We obtain the convergence rates of WRPGA(co) with noise and errors under a weaker assumption for the modulus of smoothness of the objective function. The results show that the rate is almost the same as that of WRPGA(co) without noise and errors, which is optimal and independent of the spatial dimension. This makes WRPGA(co) more practically applicable and scalable for high-dimensional data. Furthermore, we apply WRPGA(co) with errors to the problem of m-term approximation and derive the optimal convergence rate. This indicates the flexibility of WRPGA(co) and its wide utility across machine learning and signal processing. Our numerical experiments verify the stability of WRPGA(co). Thus, WRPGA(co) is a desirable choice for practical implementation.

1. Introduction

It is well known that m-term approximation problems arise naturally in image processing, statistical learning, artificial neural networks, numerical solutions of PDEs, and other fields, and that greedy-type algorithms can effectively generate such approximations. We first give an overview of some important results on greedy algorithms for m-term approximation.
Let $H$ be a real Hilbert space with an inner product $\langle\cdot,\cdot\rangle$. The norm induced by this inner product is $\|x\|_H = \langle x, x\rangle^{1/2}$. A subset $\mathcal{D}$ of $H$ is called a dictionary if $\|\varphi\|_H = 1$ for any $\varphi \in \mathcal{D}$ and the closure of $\mathrm{span}\,\mathcal{D}$ is $H$. For any element $f \in H$, we define its best $m$-term approximation error by
$\sigma_m(f) := \inf_{\#\Lambda = m,\ c_k \in \mathbb{R},\ \varphi_k \in \mathcal{D}} \Big\| f - \sum_{k\in\Lambda} c_k \varphi_k \Big\|_H.$
We use greedy algorithms to obtain an $m$-term approximation. The core idea behind greedy algorithms is to reduce the problem of best $m$-term approximation to that of best one-term approximation. It is not difficult to see that the best one-term approximation of $f$ is
$\langle f, \varphi\rangle\,\varphi,$
where $\varphi = \arg\max_{g\in\mathcal{D}} \langle f, g\rangle$. Then, by iterating the best one-term approximation $m$ times, one can obtain an $m$-term approximation of $f$. This algorithm is called the Pure Greedy Algorithm (PGA), which was defined in [1] as follows:
  • PGA$(H, \mathcal{D})$
  • Step 0: Given $f \in H$, set $f_0 = 0$.
  • Step m:
    - If $f = f_{m-1}$, then finish the algorithm and define $f_k = f_{m-1} = f$ for $k \ge m$.
    - If $f \ne f_{m-1}$, then choose an element $\varphi_m \in \mathcal{D}$ such that
      $|\langle f - f_{m-1}, \varphi_m\rangle| = \sup_{\varphi\in\mathcal{D}} |\langle f - f_{m-1}, \varphi\rangle|.$
Define the next approximant to be
$f_m = f_{m-1} + \langle f - f_{m-1}, \varphi_m\rangle\,\varphi_m,$
and proceed to Step $m+1$.
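To make the iteration concrete, the following short sketch runs the PGA over a finite dictionary stored as the columns of a matrix in $\mathbb{R}^n$ (a finite-dimensional stand-in for $H$); the function name and the matrix representation of $\mathcal{D}$ are our illustrative choices and not part of [1].
```python
import numpy as np

def pga(f, D, m):
    """Minimal PGA sketch: f is the target vector, D has unit-norm columns."""
    fm = np.zeros_like(f)
    for _ in range(m):
        r = f - fm                         # residual f - f_{m-1}
        inner = D.T @ r                    # <f - f_{m-1}, phi> for every column phi
        j = int(np.argmax(np.abs(inner)))  # greedy selection of phi_m
        if inner[j] == 0:                  # f is already reproduced exactly
            break
        fm = fm + inner[j] * D[:, j]       # f_m = f_{m-1} + <f - f_{m-1}, phi_m> phi_m
    return fm
```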
To obtain the convergence rate of the PGA, we always assume that the target element $f$ belongs to some basic sparse classes; see [2]. Let $\mathcal{D} \subset H$ be a dictionary and $M$ be a positive real number. We define the class
$A_1^0(\mathcal{D}, M) := \Big\{ f \in H : f = \sum_{k\in\Lambda} c_k\varphi_k,\ \varphi_k\in\mathcal{D},\ \#(\Lambda) < \infty,\ \sum_{k\in\Lambda}|c_k| \le M \Big\}$
and define $A_1(\mathcal{D}, M)$ to be its closure in $H$. Let
$A_1(\mathcal{D}) = \bigcup_{M>0} A_1(\mathcal{D}, M).$
For $f \in A_1(\mathcal{D})$, we define its norm by $\|f\|_{A_1(\mathcal{D})} := \inf\{ M : f \in A_1(\mathcal{D}, M)\}$.
It is known from [1] that the optimal convergence rate of the best $m$-term approximation on $A_1(\mathcal{D})$ for all dictionaries $\mathcal{D} \subset H$ is $O(m^{-1/2})$. Livshitz and Temlyakov in [3] proved that there is a dictionary $\mathcal{D} \subset H$, a constant $c > 0$, and an element $f \in A_1(\mathcal{D}, 1)$ such that
$\|f - f_m^{\mathrm{PGA}}\|_H \ge c\, m^{-0.27}, \quad m = 1, 2, \ldots.$
Thus, the PGA fails to reach the optimal convergence rate $O(m^{-1/2})$ on $A_1(\mathcal{D}, 1)$ for all dictionaries $\mathcal{D}$. This leads to various modifications of the PGA. Among others, the Orthogonal Greedy Algorithm (OGA) and the Rescaled Pure Greedy Algorithm (RPGA) were shown to achieve the optimal convergence rate $O(m^{-1/2})$.
We first recall the OGA from [1].
  • OGA$(H, \mathcal{D})$
  • Step 0: Given $f \in H$, set $f_0 = 0$.
  • Step m:
    - If $f = f_{m-1}$, then finish the algorithm and define $f_k = f_{m-1} = f$ for $k \ge m$.
    - If $f \ne f_{m-1}$, then choose an element $\varphi_m \in \mathcal{D}$ such that
      $|\langle f - f_{m-1}, \varphi_m\rangle| = \sup_{\varphi\in\mathcal{D}} |\langle f - f_{m-1}, \varphi\rangle|.$
Denote $H_m := \mathrm{span}\{\varphi_1, \varphi_2, \ldots, \varphi_m\}$. Define the next approximant to be
$f_m = P_m f,$
where $P_m f$ is the orthogonal projection of $f$ onto $H_m$, and proceed to Step $m+1$.
Clearly, $f_m^{\mathrm{OGA}}$ is the best approximation of $f$ from $H_m$. DeVore and Temlyakov in [1] obtained the following convergence rate of the OGA for $f \in A_1(\mathcal{D})$:
$\|f - f_m^{\mathrm{OGA}}\|_H \le \|f\|_{A_1(\mathcal{D})}\, m^{-1/2}, \quad m = 1, 2, \ldots.$
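A corresponding sketch of the OGA in $\mathbb{R}^n$ is given below; the projection $P_m f$ onto $H_m$ is computed by least squares over the selected columns. This is only an illustration of the definition above, not code from [1].
```python
import numpy as np

def oga(f, D, m):
    """Minimal OGA sketch: at step m, project f onto the span of the selected columns."""
    selected = []
    fm = np.zeros_like(f)
    for _ in range(m):
        inner = D.T @ (f - fm)
        j = int(np.argmax(np.abs(inner)))
        if inner[j] == 0:
            break
        if j not in selected:
            selected.append(j)
        coef, *_ = np.linalg.lstsq(D[:, selected], f, rcond=None)
        fm = D[:, selected] @ coef        # P_m f, the orthogonal projection onto H_m
    return fm
```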
Then, we recall from [4] another modification of the PGA, which is called the Rescaled Pure Greedy Algorithm (RPGA).
  • RPGA$(H, \mathcal{D})$
  • Step 0: Given $f \in H$, set $f_0 = 0$.
  • Step m:
    - If $f = f_{m-1}$, then finish the algorithm and define $f_k = f_{m-1} = f$ for $k \ge m$.
    - If $f \ne f_{m-1}$, then choose an element $\varphi_m \in \mathcal{D}$ such that
      $|\langle f - f_{m-1}, \varphi_m\rangle| = \sup_{\varphi\in\mathcal{D}} |\langle f - f_{m-1}, \varphi\rangle|.$
With
$\lambda_m := \langle f - f_{m-1}, \varphi_m\rangle, \quad \hat{f}_m := f_{m-1} + \lambda_m\varphi_m, \quad s_m := \frac{\langle f, \hat{f}_m\rangle}{\|\hat{f}_m\|_H^2},$
define the next approximant to be
$f_m := s_m\hat{f}_m$
and proceed to Step $m+1$.
Note that if the output at each Step m were $\hat{f}_m$ instead of $f_m = s_m\hat{f}_m$, this would be the PGA. However, the RPGA does not use $\hat{f}_m$ but rather the best approximation of $f$ from the one-dimensional space $\mathrm{span}\{\hat{f}_m\}$, that is, $s_m\hat{f}_m$. It is clear that the RPGA is simpler than the OGA: at Step m, the OGA needs to solve an $m$-dimensional optimization problem, while the RPGA only needs to solve a one-dimensional one. This implies that the RPGA has lower computational complexity.
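The one-dimensional rescaling is visible in the following sketch (our illustration of the steps above, in the same finite-dimensional setting as the previous snippets): the only extra work compared with the PGA is the scalar $s_m$.
```python
import numpy as np

def rpga(f, D, m):
    """Minimal RPGA sketch: PGA-type update followed by a one-dimensional rescaling."""
    fm = np.zeros_like(f)
    for _ in range(m):
        inner = D.T @ (f - fm)
        j = int(np.argmax(np.abs(inner)))
        lam = inner[j]                           # lambda_m = <f - f_{m-1}, phi_m>
        if lam == 0:
            break
        f_hat = fm + lam * D[:, j]               # intermediate approximant f_hat_m
        s = (f @ f_hat) / (f_hat @ f_hat)        # s_m = <f, f_hat_m> / ||f_hat_m||^2
        fm = s * f_hat                           # best approximation from span{f_hat_m}
    return fm
```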
Moreover, Petrova in [4] derived the following convergence rate of the RPGA ( H , D ) :
Theorem 1 
([4]). If $f \in A_1(\mathcal{D}) \subset H$, then the output $\{f_m\}_{m\ge 0}$ of the RPGA$(H,\mathcal{D})$ satisfies
$\|f - f_m\|_H \le \|f\|_{A_1(\mathcal{D})}\,(m+1)^{-1/2}, \quad m = 0, 1, \ldots.$
Theorem 1 shows that the RPGA can achieve the optimal convergence rate on $A_1(\mathcal{D})$. Based on these advantages, the RPGA has been successfully applied to machine learning and signal processing; see [5,6,7,8,9]. Moreover, a generalization of the RPGA can be used to solve convex optimization problems; see [10].
Note that when selecting $\varphi_m$ in the above algorithms, the supremum in the equality
$|\langle f - f_{m-1}, \varphi_m\rangle| = \sup_{\varphi\in\mathcal{D}} |\langle f - f_{m-1}, \varphi\rangle|$
may not be attainable. Thus, we replace the greedy condition by a weaker condition:
$|\langle f - f_{m-1}, \varphi_m\rangle| \ge t_m \sup_{\varphi\in\mathcal{D}} |\langle f - f_{m-1}, \varphi\rangle|, \quad (2)$
where $0 < t_m \le 1$. Greedy algorithms that use this weak condition are called weak greedy algorithms. In this way, we obtain the Weak Rescaled Pure Greedy Algorithm (WRPGA).
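For a finite dictionary the weak condition simply enlarges the set of admissible elements; a small illustrative selection routine (our own, with a hypothetical random tie-breaking rule) could look as follows.
```python
import numpy as np

def weak_select(r, D, t, rng=None):
    """Return the index of a column of D satisfying the weak greedy condition with parameter t."""
    rng = rng or np.random.default_rng()
    inner = np.abs(D.T @ r)                         # |<r, phi>| for every column phi
    admissible = np.flatnonzero(inner >= t * inner.max())
    return int(rng.choice(admissible))              # any admissible index is allowed
```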
The following convergence rate of the WRPGA ( H , D ) was obtained in [4]:
Theorem 2 
([4]). If $f \in A_1(\mathcal{D}) \subset H$, then the output $\{f_m\}_{m\ge 0}$ of the WRPGA$(H,\mathcal{D})$ satisfies the inequality
$\|f - f_m\|_H \le \|f\|_{A_1(\mathcal{D})}\Big(\sum_{k=1}^m t_k^2\Big)^{-1/2}, \quad m \ge 1.$
Some greedy algorithms have been extended to the setting of Banach spaces. Let $X$ be a real Banach space with norm $\|\cdot\|_X$. A set $\mathcal{D} \subset X$ is called a dictionary if each $\varphi \in \mathcal{D}$ has norm $\|\varphi\|_X = 1$ and the closure of $\mathrm{span}\,\mathcal{D}$ is $X$. Let $X^*$ denote the set of all continuous linear functionals on $X$. It is known from the Hahn–Banach theorem that for every non-zero $x \in X$, there exists $F_x \in X^*$ such that
$\|F_x\| = 1 \quad \text{and} \quad F_x(x) = \|x\|_X. \quad (3)$
The functional $F_x$ is called the norming functional of $x$. The modulus of smoothness of $X$ is defined as
$\rho(u) := \sup_{x,y\in X,\ \|x\|_X = \|y\|_X = 1}\Big( \frac{\|x + uy\|_X + \|x - uy\|_X}{2} - 1 \Big), \quad u > 0. \quad (4)$
If $\lim_{u\to 0} \rho(u)/u = 0$, then the Banach space $X$ is called uniformly smooth. We say that the modulus of smoothness $\rho(u)$ is of power type $1 < q \le 2$ if $\rho(u) \le \gamma u^q$ with some $\gamma > 0$. The sparse class $A_1(\mathcal{D})$ in a Banach space $X$ is defined in the same way as in a Hilbert space $H$.
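As a concrete example of a norming functional, in the sequence space $\ell_p$ ($1 < p < \infty$) one can write $F_x$ explicitly; the snippet below (ours, for illustration) returns its coefficient vector and checks the defining properties numerically.
```python
import numpy as np

def norming_functional_lp(x, p):
    """Coefficients of F_x in l_p: F_x(y) = sum_i sign(x_i)|x_i|^(p-1) y_i / ||x||_p^(p-1)."""
    x = np.asarray(x, dtype=float)
    norm_p = np.sum(np.abs(x) ** p) ** (1.0 / p)
    return np.sign(x) * np.abs(x) ** (p - 1) / norm_p ** (p - 1)

x = np.array([1.0, -2.0, 0.5])
F = norming_functional_lp(x, p=1.5)
# F_x(x) should equal ||x||_p:
print(F @ x, np.sum(np.abs(x) ** 1.5) ** (1 / 1.5))
```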
We recall from [4] the definition of the WRPGA in the setting of Banach spaces.
  • WRPGA$(X, \mathcal{D})$
  • Step 0: Given $f \in X$, set $f_0 = 0$.
  • Step m:
    - If $f = f_{m-1}$, then finish the algorithm and define $f_k = f_{m-1} = f$ for $k \ge m$.
    - If $f \ne f_{m-1}$, then choose an element $\varphi_m \in \mathcal{D}$ such that
      $|F_{f-f_{m-1}}(\varphi_m)| \ge t_m \sup_{\varphi\in\mathcal{D}} |F_{f-f_{m-1}}(\varphi)|.$
With
$\lambda_m := \mathrm{sign}\{F_{f-f_{m-1}}(\varphi_m)\}\,\|f - f_{m-1}\|_X\,(2\gamma q)^{-\frac{1}{q-1}}\,|F_{f-f_{m-1}}(\varphi_m)|^{\frac{1}{q-1}}, \quad \hat{f}_m := f_{m-1} + \lambda_m\varphi_m,$
choose $s_m$ such that
$\|f - s_m\hat{f}_m\|_X := \min_{s\in\mathbb{R}} \|f - s\hat{f}_m\|_X.$
Define the next approximant to be
$f_m := s_m\hat{f}_m$
and proceed to Step $m+1$.
Petrova in [4] derived the following convergence rate of the WRPGA ( X , D ) :
Theorem 3 
([4]). Let $X$ be a Banach space with $\rho(u) \le \gamma u^q$, $1 < q \le 2$. If $f \in A_1(\mathcal{D}) \subset X$, then the output $\{f_m\}_{m\ge 0}$ of the WRPGA$(X,\mathcal{D})$ satisfies
$\|f - f_m\|_X \le c\,\|f\|_{A_1(\mathcal{D})}\Big(\sum_{k=1}^m t_k^{\frac{q}{q-1}}\Big)^{\frac{1}{q}-1}.$
Observe that the best $m$-term approximation problem with respect to a given dictionary $\mathcal{D}$ is essentially an optimization problem for the norm function $E(x) := \|f - x\|_X$. In fact, the norming functional $F_{f - f_{m-1}}$ used in the selection of $\varphi_m$ in the greedy algorithms is closely related to the function $E(x) := \|f - x\|_X$. Their relationship is given in Section 3. Naturally, one can use a similar technique, such as the greedy strategy used in $m$-term approximation, to handle optimization problems, especially convex optimization problems.
Now, we turn to the problem of convex optimization. Convex optimization occurs in many fields of modern science and engineering, such as automatic control systems, data analysis and modeling, statistical estimation and learning, and communications and networks; see [11,12,13,14,15,16] and the references therein. The problem of convex optimization can be formulated as follows:
Let $X$ be a Banach space and $\Omega$ be a bounded convex subset of $X$. Let $E$ be a function on $X$ satisfying
$E(\alpha x + \beta y) \le \alpha E(x) + \beta E(y), \quad x, y \in \Omega,\ \alpha, \beta \ge 0,\ \alpha + \beta = 1.$
We aim to find an approximate solution to the problem
$\inf_{x\in\Omega} E(x). \quad (5)$
Usually, the domain $X$ of the objective function $E$ in classical convex optimization is finite-dimensional, while some applications of convex optimization require $X$ to have a large, or even infinite, dimension, which in turn requires the rate of convergence of a numerical algorithm to be independent of the dimension of $X$; otherwise, the algorithm may suffer from the curse of dimensionality. It is well known that the convergence rates of greedy-type algorithms are independent of the dimension of $X$. Thus, greedy-type algorithms are increasingly used in solving (5); see [17,18,19]. The main idea is that the algorithm outputs $x_m$ after $m$ iterations such that $E(x_m)$ is a good approximation of
$\inf_{x\in\Omega} E(x).$
Generally, $x_m$ is a linear combination of $m$ elements from a dictionary of $X$. These algorithms usually choose $x_0 = 0$ as an initial approximant and select $\Omega$ as
$\Omega := \{ x \in X : E(x) \le E(0) = E(x_0) \}.$
Some greedy-type convex optimization algorithms, such as the Weak Chebyshev Greedy Algorithm (WCGA(co)), the Weak Greedy Algorithm with Free Relaxation (WGAFR(co)), and the Rescaled Weak Relaxed Greedy Algorithm (RWRGA(co)), have been successfully used to solve convex optimization problems; see [17,20,21,22]. Furthermore, Gao and Petrova in [10] proposed the Weak Rescaled Pure Greedy Algorithm (WRPGA(co)) for convex optimization in a Banach space $X$. This algorithm is a generalization of the WRPGA$(X,\mathcal{D})$ for $m$-term approximation. To give the definition of the WRPGA(co), we first recall some concepts and notations.
We assume that the function $E$ is Fréchet-differentiable at any $x \in X$. That is, there is a continuous linear functional $E'(x)$ on the Banach space $X$ such that
$\lim_{\|h\|_X \to 0} \frac{|E(x+h) - E(x) - \langle E'(x), h\rangle|}{\|h\|_X} = 0,$
where $\langle E'(x), h\rangle := E'(x)(h)$. We call the functional $E'(x)$ the Fréchet derivative of $E$. Under the assumptions that $E$ is convex and Fréchet-differentiable, it is well known that the minimizer of $E$ exists and is attainable; see [17]. Throughout this article, we assume that the minimizer of $E$ belongs to the basic sparse class $A_1(\mathcal{D})$.
We say that a convex function $E: X \to \mathbb{R}$ satisfies Condition 0 if $E$ has a Fréchet derivative $E'(x)$ at every $x \in \Omega$ and $\|E'(x)\| \le M_0$.
If there exist constants $\alpha > 0$, $M_1 > 0$, and $1 < q \le 2$ such that for all $x, x'$ with $\|x - x'\|_X \le M_1$ and $x \in \Omega$,
$E(x') - E(x) - \langle E'(x), x' - x\rangle \le \alpha\|x - x'\|_X^q, \quad (6)$
then we say that $E$ satisfies the Uniform Smoothness (US) condition on $\Omega$.
The Weak Rescaled Pure Greedy Algorithm with a weakness sequence $\{t_m\}_{m=1}^\infty$, $0 < t_m \le 1$, and a parameter sequence $\{\mu_m\}_{m=1}^\infty$ satisfying
$\mu_m > \max\{1, \alpha^{-1}M_0M_1^{1-q}\}, \quad m = 1, \ldots,$
is defined as follows:
is defined as follows:
  • Weak Rescaled Pure Greedy Algorithm (WRPGA(co)$(\{t_m\}, \{\mu_m\}, \mathcal{D})$)
  • Step 0:
  • Define $x_0^w := 0$. If $E'(x_0^w) = 0$, then finish the algorithm and define $x_k^w := x_0^w$, $k \ge 1$.
  • Step m:
  • Assume $x_{m-1}^w$ has been defined and $E'(x_{m-1}^w) \ne 0$. Perform the following steps:
    - Choose an element $\varphi_m \in \mathcal{D}$ such that
      $|\langle E'(x_{m-1}^w), \varphi_m\rangle| \ge t_m \sup_{\varphi\in\mathcal{D}} |\langle E'(x_{m-1}^w), \varphi\rangle|.$
    - Compute $\lambda_m$, given by the formula
      $\lambda_m := \mathrm{sign}\big(\langle E'(x_{m-1}^w), \varphi_m\rangle\big)\,(\alpha\mu_m)^{-\frac{1}{q-1}}\,|\langle E'(x_{m-1}^w), \varphi_m\rangle|^{\frac{1}{q-1}}.$
    - Compute
      $\hat{x}_m^w := x_{m-1}^w - \lambda_m\varphi_m, \quad b_m := \arg\min_{b\in\mathbb{R}} E(b\,\hat{x}_m^w).$
    - Define the next approximant as
      $x_m^w = b_m\hat{x}_m^w.$
  • If $E'(x_m^w) = 0$, then finish the algorithm and define $x_k^w := x_m^w$ for $k > m$.
  • If $E'(x_m^w) \ne 0$, then proceed to Step $m+1$.
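The following sketch implements these steps for a smooth convex objective on $\mathbb{R}^n$; the example objective $E(x)=\|x-f\|_2^2$ (for which $\alpha=1$, $q=2$), the parameter defaults, and the use of a generic scalar minimizer for $b_m$ are our illustrative assumptions, not the authors' implementation.
```python
import numpy as np
from scipy.optimize import minimize_scalar

def wrpga_co(E, grad_E, D, m, mu=2.5, alpha=1.0, q=2.0):
    """Minimal WRPGA(co) sketch on R^n; D has unit-norm columns, grad_E(x) returns E'(x)."""
    x = np.zeros(D.shape[0])
    for _ in range(m):
        g = D.T @ grad_E(x)                       # <E'(x_{m-1}), phi> for every column phi
        j = int(np.argmax(np.abs(g)))             # argmax satisfies the weak condition for any t <= 1
        if g[j] == 0:
            break
        lam = np.sign(g[j]) * (alpha * mu) ** (-1 / (q - 1)) * np.abs(g[j]) ** (1 / (q - 1))
        x_hat = x - lam * D[:, j]
        b = minimize_scalar(lambda s: E(s * x_hat)).x   # b_m = argmin_b E(b * x_hat_m)
        x = b * x_hat
    return x

# Example: E(x) = ||x - f||_2^2 with E'(x) = 2(x - f).
rng = np.random.default_rng(0)
D = rng.standard_normal((50, 200)); D /= np.linalg.norm(D, axis=0)
f = D[:, :5] @ rng.standard_normal(5)
x_m = wrpga_co(lambda x: np.sum((x - f) ** 2), lambda x: 2 * (x - f), D, m=20)
print(np.sum((x_m - f) ** 2))
```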
The following convergence rate of the WRPGA(co) was also obtained in [10]:
Theorem 4 
([10]). Let $E$ be a convex function defined on $X$ that satisfies Condition 0 and the US condition. Moreover, assume that its minimizer $\bar{x} \in A_1(\mathcal{D})$. Then, the output of the WRPGA(co)$(\{t_m\},\{\mu_m\},\mathcal{D})$ with $\mu_m > \max\{1, \alpha^{-1}M_0M_1^{1-q}\}$ satisfies the following inequality:
$E(x_m^w) - E(\bar{x}) \le \alpha\|\bar{x}\|_{A_1(\mathcal{D})}^q\Big( C_1(q,\alpha,E) + \sum_{j=1}^m (\mu_j - 1)\Big(\frac{t_j}{\mu_j}\Big)^{\frac{q}{q-1}} \Big)^{1-q}, \quad m \ge 1, \quad (7)$
where $C_1(q,\alpha,E) = \Big( \frac{\alpha\|\bar{x}\|_{A_1(\mathcal{D})}^q}{E(x_0^w) - E(\bar{x})} \Big)^{\frac{1}{q-1}}$.
It follows from the definition of the WRPGA(co)$(\{t_m\},\{\mu_m\},\mathcal{D})$ and Theorem 4 that the weakness sequence $\{t_m\}_{m\ge 1}$ and the parameter sequence $\{\mu_m\}_{m\ge 1}$ are important for the algorithm. The weakness sequence $\{t_m\}_{m\ge 1}$ gives us more freedom in choosing the element $\varphi_m$ from the dictionary $\mathcal{D}$. The parameter sequence $\{\mu_m\}$ gives more choice in how far to advance along the selected element $\varphi_m$. We remark that the function
$f(\mu) = (\mu - 1)\,\mu^{-\frac{q}{q-1}}$
is increasing on $(1, q)$ and decreasing on $(q, \infty)$, with a global maximum at $\mu = q$. Thus, if we set $t_k = 1$ and $\mu_k = q$ for $k = 1, \ldots, m$, then the WRPGA(co)$(\{t_m\},\{\mu_m\},\mathcal{D})$ satisfies the error bound
$E(x_m^w) - E(\bar{x}) \le C\, q^q (q-1)^{1-q}\, m^{1-q},$
where $C$ is a constant. This setting of $t_k$ and $\mu_k$ ensures that the value on the right-hand side of inequality (7) is minimized. For any particular target function $E$, one can further optimize the constants related to the parameter $\mu_m$; for example, the authors in [23] selected $\mu = q$ in the Hilbert space setting.
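For completeness, a short calculus check (ours, not reproduced from [10]) confirms where the maximum of $f(\mu)$ is attained:
```latex
f'(\mu) = \mu^{-\frac{q}{q-1}} - \frac{q}{q-1}(\mu-1)\,\mu^{-\frac{q}{q-1}-1}
        = \mu^{-\frac{q}{q-1}}\left(1 - \frac{q(\mu-1)}{(q-1)\mu}\right),
\qquad
f'(\mu) = 0 \;\Longleftrightarrow\; (q-1)\mu = q(\mu-1) \;\Longleftrightarrow\; \mu = q .
```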
Noise and computational errors are often inevitable when an algorithm is implemented. Thus, taking them into account and exploring how algorithms depend on them is of great importance. If the convergence rate of an algorithm changes relatively little under noise and errors, we say that the algorithm exhibits a certain level of stability. Such stability allows us to describe and analyze the behavior of the algorithm and provides a theoretical basis for its design and improvement. The study of the stability of greedy algorithms for m-term approximation in Banach spaces began in [24] and was further developed in [25]. In [17,20,22], the stability of the WCGA(co) and of some relaxed-type greedy algorithms for convex optimization was studied, where computational inaccuracies were allowed in the processes of error reduction and biorthogonality.
To the best of our knowledge, no results exist concerning the numerical stability of the WRPGA(co). In the present article, we study this stability. We consider a perturbed element $f^\varepsilon$ such that
$E(f^\varepsilon) \le \inf_{x\in\Omega} E(x) + \varepsilon$
instead of $\arg\min_{x\in\Omega} E(x)$. Then, we investigate the dependence of the WRPGA(co) on the error made in the computation of $\lambda_m$ in the second step of the algorithm. We obtain the convergence rates of the WRPGA(co) with noise and errors. The results show that the WRPGA(co) is stable in a certain sense, and the associated convergence rates remain independent of the spatial dimension. This addresses real-world scenarios where exact computations are rare, making the algorithm more practically applicable, especially for high-dimensional applications. We verify this conclusion through numerical experiments. Furthermore, we apply the WRPGA(co) with errors to the problem of m-term approximation and derive the optimal convergence rate, which indicates its flexibility and wide utility across applied mathematics and signal processing. Notably, the condition we use differs from that used in [10]: we only require that $\rho(E,\Omega,u)$ has power type $1 < q \le 2$, whereas in [10] the authors need not only the US condition but also additional conditions to ensure that the US condition still holds for the sequence generated by the WRPGA(co). This is another strength of our article.
We organize the article as follows: In Section 2, we state the results for the convergence rates of WRPGA(co) ( { t m } , { μ m } , D ) with noise and errors. In Section 3, we apply WRPGA(co) ( { t m } , { μ m } , D ) with errors to m-term approximation. In Section 4, we provide two numerical examples to illustrate the stability of WRPGA(co). In Section 5, we make some concluding remarks and discuss future work.

2. Convergence Rate of WRPGA(co) with Noise and Error

In this section, we discuss the performance of the WRPGA(co) in the presence of noise and computational errors. We first recall some notions. The modulus of smoothness $\rho(E,\Omega,u)$ of $E$ on the set $\Omega$ is defined as
$\rho(E,\Omega,u) := \frac{1}{2}\sup_{x\in\Omega,\ \|y\|_X = 1} |E(x+uy) + E(x-uy) - 2E(x)|.$
Clearly, $\rho(E,\Omega,u)$ is an even function with respect to $u$. If $\rho(E,\Omega,u) = o(u)$ as $u \to 0$, then we call $E$ uniformly smooth on $\Omega$. It is known from [23] that the condition
$\rho(E,\Omega,u) \le \alpha u^q, \quad 1 < q \le 2, \quad (8)$
is closely related to the US condition (6). However, the US condition on the set $\Omega$ alone is not sufficient to derive the convergence result; Condition 0 is also needed so that the output sequence $\{x_m^w\}$ of the algorithm remains in $\Omega$, where the US condition holds. Thus, we use condition (8).
Now, we discuss the performance of the WRPGA(co) with data noise. Suppose that $\arg\min_{x\in\Omega}E(x)$ is influenced by noise. That is, in the implementation of the algorithm, we consider a perturbed element $f^\varepsilon$ such that
$E(f^\varepsilon) \le \inf_{x\in\Omega}E(x) + \varepsilon, \quad \varepsilon \ge 0.$
We obtain the following convergence rates of WRPGA(co) in that case:
Theorem 5. 
Let $E: X \to \mathbb{R}$ be a uniformly smooth Fréchet-differentiable convex function with $\rho(E,\Omega,u) \le \alpha u^q$, $1 < q \le 2$. Take $\varepsilon \ge 0$ and $f^\varepsilon \in \Omega$ such that
$E(f^\varepsilon) \le \inf_{x\in\Omega}E(x) + \varepsilon, \quad f^\varepsilon/A \in A_1(\mathcal{D},1),$
with some number $A > 0$. Then, for the WRPGA(co)$(\{t_m\},\{\mu_m\},\mathcal{D})$ with $\mu_m > 2$, we have
$E(x_m^w) - \inf_{x\in\Omega}E(x) \le \max\Big\{ 2\varepsilon,\ \alpha A^q\Big(\sum_{k=1}^m(\mu_k-2)\Big(\frac{t_k}{\mu_k}\Big)^{\frac{q}{q-1}}\Big)^{1-q} \Big\}. \quad (9)$
Taking $t_k = t$ and $\mu_k = \mu$ for $k = 1, \ldots, m$ in Theorem 5, we obtain
$E(x_m^w) - \inf_{x\in\Omega}E(x) \le \max\Big\{ 2\varepsilon,\ \alpha A^q t^{-q}\mu^q(\mu-2)^{1-q} m^{1-q} \Big\}.$
It is clear that the function
$g(\mu) = \mu^q(\mu-2)^{1-q}$
is decreasing on $(2, 2q)$ and increasing on $(2q, \infty)$, with a global minimum at $\mu = 2q$. Thus, if we set $t_k = 1$ and $\mu_k = 2q$ for $k = 1, \ldots, m$, the WRPGA(co) satisfies
$E(x_m^w) - \inf_{x\in\Omega}E(x) \le \max\Big\{ 2\varepsilon,\ 2\alpha A^q q^q(q-1)^{1-q} m^{1-q} \Big\}.$
This setting of $t_k$ and $\mu_k$ ensures that the value on the right-hand side of inequality (9) is minimized.
Theorem 5 implies the following result for the noiseless case ( ε = 0 ):
Corollary 1. 
Let the convex function $E: X \to \mathbb{R}$ be uniformly smooth and Fréchet-differentiable with $\rho(E,\Omega,u) \le \alpha u^q$, $1 < q \le 2$. If $\arg\min_{x\in\Omega}E(x) \in A_1(\mathcal{D},1)$, then for the WRPGA(co)$(\{t_m\},\{\mu_m\},\mathcal{D})$ with $\mu_m > 2$, the following inequality holds:
$E(x_m^w) - \inf_{x\in\Omega}E(x) \le \alpha\Big(\sum_{k=1}^m(\mu_k-2)\Big(\frac{t_k}{\mu_k}\Big)^{\frac{q}{q-1}}\Big)^{1-q}.$
Corollary 1 gives the convergence rate of the WRPGA(co) with noiseless data, which is different from the result in Theorem 4. It is known from [23] that the US condition is equivalent to the condition $\rho(E,\Omega,u) \le \alpha u^q$ in Hilbert spaces. Theorem 5 and Corollary 1 do not require Condition 0. From this point of view, our assumption is weaker than that of Theorem 4 for general Banach spaces. By choosing an appropriate $\mu_k$, we can achieve the same error bound as Theorem 4.
To prove Theorem 5, we first recall some lemmas.
Lemma 1 
([18]). Let $E$ be a convex function on $X$. Assume that it is Fréchet-differentiable. Then, for every $x \in \Omega$, $y \in X$, and $u \in \mathbb{R}$,
$0 \le E(x+uy) - E(x) - u\langle E'(x), y\rangle \le 2\rho(E,\Omega,u\|y\|_X).$
Lemma 2 
([2]). For any $F \in X^*$ and any dictionary $\mathcal{D}$ of $X$, we have
$\sup_{\varphi\in\mathcal{D}}\langle F, \varphi\rangle = \sup_{\phi\in A_1(\mathcal{D},1)}\langle F, \phi\rangle.$
Lemma 3 
([18]). Let $E$ be a convex function on $X$. Assume that it is Fréchet-differentiable at every point in $\Omega$. Then, for each $x \in \Omega$ and $x' \in X$,
$\langle E'(x), x - x'\rangle \ge E(x) - E(x').$
Lemma 4 
([4]). Let $B > 0$, $r > 0$, and $\ell > 0$, and let $\{a_m\}$ and $\{r_m\}$ be sequences of positive numbers satisfying
$a_J \le B, \quad a_m \le a_{m-1}\Big(1 - \frac{r_m}{r}\,a_{m-1}^{\ell}\Big), \quad m = J+1, J+2, \ldots.$
Then, the following inequality holds:
$a_m \le \max\{1, 1/\ell\}\, r^{1/\ell}\Big( rB^{-\ell} + \sum_{k=J+1}^m r_k \Big)^{-1/\ell}, \quad m = J+1, J+2, \ldots.$
The following lemma was stated in [21] without proof. Moreover, we have not found the proof of this lemma in other studies. For convenience, we provide a proof here.
Lemma 5. 
Let $E$ be a uniformly smooth, Fréchet-differentiable convex function on $X$. Let $L$ be a finite-dimensional subspace of $X$. If $x^* \in \Omega$ satisfies
$E(x^*) = \min_{y\in L} E(y),$
then, for all $y \in L$, we have $\langle E'(x^*), y\rangle = 0$.
Proof. 
We prove the lemma by contradiction. Suppose that there exists a $y \in L$ such that $\|y\|_X = 1$ and $\langle E'(x^*), y\rangle = \beta > 0$. Note that $x^* \in \Omega$. Then, applying Lemma 1 at $x^*$ in the direction $-y$ with $u > 0$, we have
$E(x^* - uy) \le E(x^*) - u\langle E'(x^*), y\rangle + 2\rho(E,\Omega,u),$
which implies
$E(x^*) \ge E(x^* - uy) + u\langle E'(x^*), y\rangle - 2\rho(E,\Omega,u) = E(x^* - uy) + u\beta - 2\rho(E,\Omega,u).$
Since $\rho(E,\Omega,u) = o(u)$, we can find $\tilde{u} > 0$ such that $\tilde{u}\beta - 2\rho(E,\Omega,\tilde{u}) > 0$. Then, we have
$E(x^*) > E(x^* - \tilde{u}y).$
This inequality contradicts the assumption that $E$ attains its minimum over $L$ at $x^*$. Thus, the proof of this lemma is finished. □
Proof of Theorem 5. 
It is known from the definition of the WRPGA(co)$(\{t_m\},\{\mu_m\},\mathcal{D})$ that
$E(x_m^w) \le E(\hat{x}_m^w) = E(x_{m-1}^w - \lambda_m\varphi_m), \quad m \ge 1. \quad (11)$
Then, applying Lemma 1 to (11) with $x = x_{m-1}^w$, $u = |\lambda_m|$, and $y = -\mathrm{sign}\{\lambda_m\}\varphi_m$, we obtain
$E(x_{m-1}^w - \lambda_m\varphi_m) \le E(x_{m-1}^w) - |\lambda_m|\langle E'(x_{m-1}^w), \mathrm{sign}\{\lambda_m\}\varphi_m\rangle + 2\rho\big(E,\Omega,|\lambda_m|\,\|\varphi_m\|_X\big) \le E(x_{m-1}^w) - \lambda_m\langle E'(x_{m-1}^w),\varphi_m\rangle + 2\alpha|\lambda_m|^q,$
where we have used the fact that $\|\varphi_m\|_X = 1$. Substituting
$\lambda_m = \mathrm{sign}\big(\langle E'(x_{m-1}^w),\varphi_m\rangle\big)\,(\alpha\mu_m)^{-\frac{1}{q-1}}\,|\langle E'(x_{m-1}^w),\varphi_m\rangle|^{\frac{1}{q-1}}$
into
$E(x_{m-1}^w) - \lambda_m\langle E'(x_{m-1}^w),\varphi_m\rangle + 2\alpha|\lambda_m|^q,$
after a simple calculation, we obtain
$E(x_{m-1}^w - \lambda_m\varphi_m) \le E(x_{m-1}^w) - \frac{\mu_m - 2}{\mu_m}(\alpha\mu_m)^{-\frac{1}{q-1}}|\langle E'(x_{m-1}^w),\varphi_m\rangle|^{\frac{q}{q-1}}. \quad (12)$
Combining (11) with (12), we have
$E(x_m^w) \le E(x_{m-1}^w) - \frac{\mu_m - 2}{\mu_m}(\alpha\mu_m)^{-\frac{1}{q-1}}|\langle E'(x_{m-1}^w),\varphi_m\rangle|^{\frac{q}{q-1}}. \quad (13)$
Thus, the sequence $\{E(x_m^w)\}_{m=0}^{\infty}$ is decreasing. If $E(x_{m-1}^w) - E(f^\varepsilon) \le 0$, then from the monotonicity of $\{E(x_m^w)\}$, we have $E(x_m^w) - E(f^\varepsilon) \le 0$, and hence inequality (9) holds.
We proceed to the case that $E(x_{m-1}^w) - E(f^\varepsilon) > 0$. We first estimate the lower bound of $|\langle E'(x_{m-1}^w),\varphi_m\rangle|$. From the choice of $\varphi_m$ and Lemma 2, we have
$|\langle E'(x_{m-1}^w),\varphi_m\rangle| \ge t_m\sup_{\varphi\in\mathcal{D}}|\langle E'(x_{m-1}^w),\varphi\rangle| = t_m\sup_{\phi\in A_1(\mathcal{D},1)}|\langle E'(x_{m-1}^w),\phi\rangle| \ge t_m A^{-1}|\langle E'(x_{m-1}^w), f^\varepsilon\rangle|. \quad (14)$
Applying Lemma 5 to $x_{m-1}^w$ with $L = \mathrm{span}\{\hat{x}_{m-1}^w\}$, we have $\langle E'(x_{m-1}^w), x_{m-1}^w\rangle = 0$. Then, from Lemma 3, we have
$|\langle E'(x_{m-1}^w), f^\varepsilon\rangle| = |\langle E'(x_{m-1}^w), f^\varepsilon - x_{m-1}^w\rangle| \ge E(x_{m-1}^w) - E(f^\varepsilon). \quad (15)$
Combining (13) with (14) and (15), we obtain
$E(x_m^w) \le E(x_{m-1}^w) - \frac{\mu_m-2}{\mu_m}(\alpha\mu_m)^{-\frac{1}{q-1}} t_m^{\frac{q}{q-1}} A^{-\frac{q}{q-1}}\big(E(x_{m-1}^w) - E(f^\varepsilon)\big)^{\frac{q}{q-1}}. \quad (16)$
Subtracting $E(f^\varepsilon)$ from both sides of (16), we derive
$E(x_m^w) - E(f^\varepsilon) \le E(x_{m-1}^w) - E(f^\varepsilon) - \frac{\mu_m-2}{\mu_m}(\alpha\mu_m)^{-\frac{1}{q-1}} t_m^{\frac{q}{q-1}} A^{-\frac{q}{q-1}}\big(E(x_{m-1}^w) - E(f^\varepsilon)\big)^{\frac{q}{q-1}}. \quad (17)$
For each $m \ge 0$, we define
$a_m = E(x_m^w) - E(f^\varepsilon).$
Then, the sequence $\{a_m\}_{m=0}^{\infty}$ is decreasing and
$a_m \le a_{m-1} - \frac{\mu_m-2}{\mu_m}(\alpha\mu_m)^{-\frac{1}{q-1}} t_m^{\frac{q}{q-1}} A^{-\frac{q}{q-1}} a_{m-1}^{\frac{q}{q-1}} = a_{m-1}\Big(1 - \frac{\mu_m-2}{\mu_m}(\alpha\mu_m)^{-\frac{1}{q-1}} t_m^{\frac{q}{q-1}} A^{-\frac{q}{q-1}} a_{m-1}^{\frac{1}{q-1}}\Big).$
From the above discussion, it is sufficient to consider the case that $a_0 > 0$. The details are as follows:
Case 1: $0 < a_0 < \alpha\mu_1 A^q\big(\frac{\mu_1}{\mu_1-2}\big)^{q-1} t_1^{-q}$. It follows from the monotonicity of the sequence $\{a_m\}_{m=0}^{\infty}$ that either all elements of $\{a_m\}_{m=0}^{\infty}$ belong to $\big(0,\ \alpha\mu_1 A^q\big(\frac{\mu_1}{\mu_1-2}\big)^{q-1} t_1^{-q}\big)$, or for some $m^* \ge 1$, it holds that $a_{m^*} \le 0$. Then, for $m \ge m^*$, inequality (9) clearly holds.
Applying Lemma 4 to all positive numbers in $\{a_m\}_{m=0}^{\infty}$ with $J = 1$, $\ell = \frac{1}{q-1}$, $r = (\alpha A^q)^{\frac{1}{q-1}}$,
$B = \alpha\mu_1 A^q\Big(\frac{\mu_1}{\mu_1-2}\Big)^{q-1} t_1^{-q} \quad \text{and} \quad r_m = (\mu_m-2)\Big(\frac{t_m}{\mu_m}\Big)^{\frac{q}{q-1}},$
we have
$a_m \le \alpha A^q\Big(\sum_{k=1}^m(\mu_k-2)\Big(\frac{t_k}{\mu_k}\Big)^{\frac{q}{q-1}}\Big)^{1-q},$
which implies that
$E(x_m^w) - \inf_{x\in\Omega}E(x) \le \varepsilon + \alpha A^q\Big(\sum_{k=1}^m(\mu_k-2)\Big(\frac{t_k}{\mu_k}\Big)^{\frac{q}{q-1}}\Big)^{1-q}.$
Hence, inequality (9) holds.
Case 2: $a_0 \ge \alpha\mu_1 A^q\big(\frac{\mu_1}{\mu_1-2}\big)^{q-1} t_1^{-q}$. Taking $m = 1$ in inequality (17), we obtain
$a_1 \le a_0\Big(1 - \frac{\mu_1-2}{\mu_1}(\alpha\mu_1)^{-\frac{1}{q-1}} t_1^{\frac{q}{q-1}} A^{-\frac{q}{q-1}} a_0^{\frac{1}{q-1}}\Big).$
Hence, $a_1 \le 0$. Combining this with the monotonicity of $\{a_m\}_{m=0}^{\infty}$ gives inequality (9). □
Proof of Corollary 1. 
If $\arg\min_{x\in\Omega}E(x) \in A_1(\mathcal{D},1)$, then $A = 1$ and $\varepsilon = 0$. Thus, Corollary 1 follows from Theorem 5. □
Now, we compare the performance of the WRPGA(co) with that of the Weak Chebyshev Greedy Algorithm with errors (WCGA($\Delta$,co)). Dereventsov and Temlyakov in [17] investigated the numerical stability of the WCGA($\Delta$,co) and established the corresponding convergence rate with data noise. We recall the definition of the WCGA($\Delta$,co). Let $\tau = \{t_m\}_{m=1}^\infty$, $t_m \in (0,1]$, be a weakness sequence and $\Delta := \{\delta_m, \varepsilon_m\}_{m=1}^\infty$ be an error sequence with $\delta_m \in [0,1]$ and $\varepsilon_m \ge 0$ for $m = 1, 2, \ldots$. The WCGA($\Delta$,co) is defined as follows:
  • Weak Chebyshev Greedy Algorithm with errors (WCGA($\Delta$,co))
  • Set $x_0^c = 0$, and for each $m \ge 1$, perform the following steps:
(1) Select $\varphi_m^c \in \mathcal{D}$ satisfying
$\langle E'(x_{m-1}^c), \varphi_m^c\rangle \ge t_m \sup_{\varphi\in\mathcal{D}}\langle E'(x_{m-1}^c), \varphi\rangle.$
(2) Denote $\Phi_m = \mathrm{span}\{\varphi_j^c\}_{j=1}^m$ and define the next approximant $x_m^c$ from $\Phi_m$ such that
$E(x_m^c) \le \inf_{x\in\Phi_m}E(x) + \delta_m.$
(3) The approximant $x_m^c$ satisfies
$|\langle E'(x_m^c), x_m^c\rangle| \le \varepsilon_m.$
Note that in the definition of the WCGA($\Delta$,co), the dictionary $\mathcal{D}$ must be symmetric; that is, if $\varphi \in \mathcal{D}$, then $-\varphi \in \mathcal{D}$. It is well known that the WCGA(co) has the best approximation properties; see [26] for example. Let $C_0$ be a constant such that $E(x_m^c) \le E(0) + C_0$. Assume that the set
$\Omega \subset \Omega_1 := \{ x \in X : E(x) \le E(0) + C_0 \} \subset X$
is bounded. Under this assumption, the authors in [17] obtained the following convergence rate of the WCGA($\Delta$,co):
is bounded. Under this assumption, the authors in [17] obtained the following convergence rate of the WCGA( Δ ,co):
Theorem 6 
([17]). Let $E$ be a convex function with $\rho(E,\Omega,u) \le \alpha u^q$, $1 < q \le 2$. Take an element $f^\varepsilon \in \Omega$ and a number $\varepsilon \ge 0$ such that
$E(f^\varepsilon) \le \inf_{x\in\Omega_1}E(x) + \varepsilon, \quad f^\varepsilon/A(\varepsilon) \in A_1(\mathcal{D},1),$
with some number $A(\varepsilon) \ge 1$. Then, for the WCGA($\Delta$,co) with a constant weakness sequence $\tau = \{t\}$, $t \in (0,1]$, and an error sequence $\Delta = \{\delta_m,\varepsilon_m\}_{m=1}^\infty$ with $\delta_m + \varepsilon_m \le c\,m^{-q}$, $m = 1, 2, \ldots$, we have
$E(x_m^c) - \inf_{x\in\Omega_1}E(x) \le \varepsilon + C(E,q,\alpha,t,c)\,A(\varepsilon)^q\, m^{1-q}. \quad (18)$
Notice that when $\delta_m = \varepsilon_m = 0$ for all $m \ge 1$, the WCGA($\Delta$,co) reduces to the WCGA(co). Theorem 6 shows that the WCGA($\Delta$,co) maintains the optimal convergence rate $O(m^{1-q})$ despite the presence of noise and computational errors.
Comparing (10) with (18), we see that the convergence rate of the WRPGA(co) with data noise is the same as that of the WCGA($\Delta$,co). We remark that the rate $m^{1-q}$ is independent of the dimension of the space $X$. Moreover, the WRPGA(co) has lower computational complexity than the WCGA(co): at each step it solves only a one-dimensional optimization problem rather than the $m$-dimensional optimization problem required by the WCGA(co).
Next, we discuss the performance of the WRPGA(co) in the presence of computational errors. There are some results on the WRPGA for m-term approximation in this direction. In [4], Petrova investigated the WRPGA$(H,\mathcal{D})$ with errors in the computation of the inner product $\langle f - f_{m-1}, \varphi_m\rangle$. That is, in Step m of the WRPGA$(H,\mathcal{D})$, the coefficient $\lambda_m$ is computed according to the following formula:
$\lambda_m := (1 + \varepsilon_m)\langle f - f_{m-1}, \varphi_m\rangle, \quad |\varepsilon_m| < 1.$
She showed that for any $f \in A_1(\mathcal{D})$, the output $\{f_m\}_{m\ge 0}$ of the WRPGA$(H,\mathcal{D})$ with errors $\{\varepsilon_m\}_{m\ge 1}$ satisfies
$\|f - f_m\|_H \le \|f\|_{A_1(\mathcal{D})}\Big(\sum_{j=1}^m(1-\varepsilon_j^2)\,t_j^2\Big)^{-1/2}, \quad m = 1, 2, \ldots.$
Jiang et al. in [27] studied the stability of the WRPGA$(X,\mathcal{D})$ for m-term approximation, allowing computational inaccuracies in the calculation of the coefficient $\lambda_m$. That is, $\lambda_m$ is calculated as
$\lambda_m := (1+\varepsilon_m)\,\mathrm{sign}\{F_{f-f_{m-1}}(\varphi_m)\}\,\|f - f_{m-1}\|_X\,(2\gamma q)^{-\frac{1}{q-1}}\,|F_{f-f_{m-1}}(\varphi_m)|^{\frac{1}{q-1}}, \quad |\varepsilon_m| < 1.$
They called this algorithm the Approximate Weak Rescaled Pure Greedy Algorithm (AWRPGA$(X,\mathcal{D})$). Moreover, they derived the following convergence rate of the AWRPGA$(X,\mathcal{D})$.
Theorem 7 
([27]). Let $X$ be a Banach space with $\rho(u) \le \gamma u^q$, $1 < q \le 2$. For all $f \in A_1(\mathcal{D})$, the AWRPGA$(X,\mathcal{D})$ satisfies
$\|f - f_m\|_X \le (2\gamma q)^{\frac{1}{q}}\|f\|_{A_1(\mathcal{D})}\Big(\sum_{k=1}^m t_k^{\frac{q}{q-1}}(1+\varepsilon_k)\Big(1 - \frac{1}{q}(1+\varepsilon_k)^{q-1}\Big)\Big)^{\frac{1}{q}-1}, \quad m = 1, 2, \ldots.$
More generally, we consider the dependence of the WRPGA(co) on the computational error in the second step of each iteration. We assume that in Step m of the algorithm, the coefficient $\lambda_m^{\varepsilon_m}$ is calculated by using the following formula:
$\lambda_m^{\varepsilon_m} := (1+\varepsilon_m)\,\mathrm{sign}\big(\langle E'(x_{m-1}^w), \varphi_m\rangle\big)\,(\alpha\mu_m)^{-\frac{1}{q-1}}\,|\langle E'(x_{m-1}^w), \varphi_m\rangle|^{\frac{1}{q-1}}, \quad |\varepsilon_m| < 1.$
Such an algorithm is an approximate version of the WRPGA(co), denoted by AWRPGA(co). For simplicity, we use the notation $E(\bar{x}) = \inf_{x\in X}E(x) = \inf_{x\in\Omega}E(x)$. We obtain the following error bound for the AWRPGA(co):
Theorem 8. 
Let the convex function $E: X \to \mathbb{R}$ be uniformly smooth and Fréchet-differentiable with minimizer $\bar{x} \in A_1(\mathcal{D})$ and $\rho(E,\Omega,u) \le \alpha u^q$, $1 < q \le 2$. Assume that the error sequence $\{\varepsilon_m\}_{m=1}^\infty$ satisfies $|\varepsilon_m| < 1$ for any $m \ge 1$. Then, for the AWRPGA(co) with
$\mu_m > \max\{1, 2(1+\varepsilon_m)^{q-1}\},$
it holds that
$E(x_m^w) - E(\bar{x}) \le \alpha\|\bar{x}\|_{A_1(\mathcal{D})}^q\Big( C(\alpha,q,E) + \sum_{k=1}^m\big(\mu_k - 2(1+\varepsilon_k)^{q-1}\big)(1+\varepsilon_k)\Big(\frac{t_k}{\mu_k}\Big)^{\frac{q}{q-1}} \Big)^{1-q},$
where $C(\alpha,q,E) = \Big(\frac{\alpha\|\bar{x}\|_{A_1(\mathcal{D})}^q}{E(x_0^w) - E(\bar{x})}\Big)^{\frac{1}{q-1}}$.
Proof. 
Just like in the proof of inequality (13), we obtain
$E(x_m^w) \le E(x_{m-1}^w) - \Big(1 - \frac{2(1+\varepsilon_m)^{q-1}}{\mu_m}\Big)(1+\varepsilon_m)(\alpha\mu_m)^{-\frac{1}{q-1}}|\langle E'(x_{m-1}^w), \varphi_m\rangle|^{\frac{q}{q-1}}. \quad (19)$
Note that $\mu_m > 2(1+\varepsilon_m)^{q-1}$. Thus,
$0 < \Big(1 - \frac{2(1+\varepsilon_m)^{q-1}}{\mu_m}\Big)(1+\varepsilon_m) < 1+\varepsilon_m < 2.$
Then, we estimate the lower bound of $|\langle E'(x_{m-1}^w), \varphi_m\rangle|$. Since $\bar{x} \in A_1(\mathcal{D})$, for any fixed $\epsilon > 0$, $\bar{x}$ can be represented as $\bar{x} = \sum_{\varphi\in\mathcal{D}} c_\varphi\varphi$, with
$\sum_{\varphi\in\mathcal{D}}|c_\varphi| < \|\bar{x}\|_{A_1(\mathcal{D})} + \epsilon.$
Applying Lemma 5 to $x_{m-1}^w$ with $L = \mathrm{span}\{\hat{x}_{m-1}^w\}$, we have $\langle E'(x_{m-1}^w), x_{m-1}^w\rangle = 0$. Therefore,
$\langle E'(x_{m-1}^w), x_{m-1}^w - \bar{x}\rangle = -\langle E'(x_{m-1}^w), \bar{x}\rangle = -\sum_{\varphi\in\mathcal{D}}c_\varphi\langle E'(x_{m-1}^w), \varphi\rangle \le t_m^{-1}|\langle E'(x_{m-1}^w), \varphi_m\rangle|\sum_{\varphi\in\mathcal{D}}|c_\varphi| < t_m^{-1}\big(\|\bar{x}\|_{A_1(\mathcal{D})} + \epsilon\big)|\langle E'(x_{m-1}^w), \varphi_m\rangle|.$
Letting $\epsilon \to 0$, we have
$\langle E'(x_{m-1}^w), x_{m-1}^w - \bar{x}\rangle \le t_m^{-1}\|\bar{x}\|_{A_1(\mathcal{D})}|\langle E'(x_{m-1}^w), \varphi_m\rangle|.$
Then, by using Lemma 3, we obtain
$E(x_{m-1}^w) - E(\bar{x}) \le \langle E'(x_{m-1}^w), x_{m-1}^w - \bar{x}\rangle \le t_m^{-1}\|\bar{x}\|_{A_1(\mathcal{D})}|\langle E'(x_{m-1}^w), \varphi_m\rangle|,$
and hence
$|\langle E'(x_{m-1}^w), \varphi_m\rangle| \ge t_m\|\bar{x}\|_{A_1(\mathcal{D})}^{-1}\big(E(x_{m-1}^w) - E(\bar{x})\big). \quad (20)$
Combining (19) with (20), we derive
$E(x_m^w) \le E(x_{m-1}^w) - \big(\mu_m - 2(1+\varepsilon_m)^{q-1}\big)\big(\alpha\|\bar{x}\|_{A_1(\mathcal{D})}^q\big)^{-\frac{1}{q-1}}(1+\varepsilon_m)\Big(\frac{t_m}{\mu_m}\Big)^{\frac{q}{q-1}}\big(E(x_{m-1}^w) - E(\bar{x})\big)^{\frac{q}{q-1}}.$
Thus, it holds that
$E(x_m^w) - E(\bar{x}) \le \big(E(x_{m-1}^w) - E(\bar{x})\big)\Big(1 - \big(\mu_m - 2(1+\varepsilon_m)^{q-1}\big)\big(\alpha\|\bar{x}\|_{A_1(\mathcal{D})}^q\big)^{-\frac{1}{q-1}}(1+\varepsilon_m)\Big(\frac{t_m}{\mu_m}\Big)^{\frac{q}{q-1}}\big(E(x_{m-1}^w) - E(\bar{x})\big)^{\frac{1}{q-1}}\Big).$
Applying Lemma 4 to the sequence $\{E(x_m^w) - E(\bar{x})\}_{m=0}^\infty$ with $J = 0$, $B = E(x_0^w) - E(\bar{x})$, $\ell = \frac{1}{q-1}$,
$r_m = \big(\mu_m - 2(1+\varepsilon_m)^{q-1}\big)(1+\varepsilon_m)\Big(\frac{t_m}{\mu_m}\Big)^{\frac{q}{q-1}} \quad \text{and} \quad r = \big(\alpha\|\bar{x}\|_{A_1(\mathcal{D})}^q\big)^{\frac{1}{q-1}},$
we obtain
$E(x_m^w) - E(\bar{x}) \le \alpha\|\bar{x}\|_{A_1(\mathcal{D})}^q\Big( \Big(\frac{\alpha\|\bar{x}\|_{A_1(\mathcal{D})}^q}{E(x_0^w) - E(\bar{x})}\Big)^{\frac{1}{q-1}} + \sum_{k=1}^m\big(\mu_k - 2(1+\varepsilon_k)^{q-1}\big)(1+\varepsilon_k)\Big(\frac{t_k}{\mu_k}\Big)^{\frac{q}{q-1}} \Big)^{1-q}.$
Thus, the proof is completed. □
By limiting the error $\varepsilon_m$ to a certain range $[a,b] \subset (-1,1)$, we derive the following corollary from Theorem 8:
Corollary 2. 
Let the convex function $E: X \to \mathbb{R}$ be uniformly smooth and Fréchet-differentiable with minimizer $\bar{x} \in A_1(\mathcal{D})$ and $\rho(E,\Omega,u) \le \alpha u^q$, $1 < q \le 2$. For any $a, b \in (-1,1)$, let $\varepsilon_m \in [a,b] \subset (-1,1)$ for all $m \ge 1$. Then, for the AWRPGA(co) with $\mu_m > \max\{1, 2q(1+\varepsilon_m)^{q-1}\}$, it holds that
$E(x_m^w) - E(\bar{x}) \le \alpha\, c_1^{1-q}\|\bar{x}\|_{A_1(\mathcal{D})}^q\Big( C(\alpha,q,E) + \sum_{k=1}^m\big(\mu_k - 2(1+a)^{q-1}\big)\Big(\frac{t_k}{\mu_k}\Big)^{\frac{q}{q-1}} \Big)^{1-q},$
where $c_1 = \min\{1, 1+a\}$ and $C(\alpha,q,E) = \Big(\frac{\alpha\|\bar{x}\|_{A_1(\mathcal{D})}^q}{E(x_0^w) - E(\bar{x})}\Big)^{\frac{1}{q-1}}$.
Proof. 
Since $\varepsilon_k \in [a,b] \subset (-1,1)$ and $\mu_k > \max\{1, 2q(1+\varepsilon_k)^{q-1}\}$, we have $\mu_k > 2q(1+b)^{q-1}$. It is easy to check that the minimum of the function
$\psi(x) = \mu_k x - 2x^q, \quad x \in [1+a, 1+b],$
is $\psi(1+a)$. Taking $x = 1+\varepsilon_k$, we have
$\mu_k(1+\varepsilon_k) - 2(1+\varepsilon_k)^q \ge \mu_k(1+a) - 2(1+a)^q.$
Then, from Theorem 8, we have
$E(x_m^w) - E(\bar{x}) \le \alpha\, c_1^{1-q}\|\bar{x}\|_{A_1(\mathcal{D})}^q\Big( C(\alpha,q,E) + \sum_{k=1}^m\big(\mu_k - 2(1+a)^{q-1}\big)\Big(\frac{t_k}{\mu_k}\Big)^{\frac{q}{q-1}} \Big)^{1-q},$
where $c_1 = \min\{1, 1+a\}$ and $C(\alpha,q,E) = \Big(\frac{\alpha\|\bar{x}\|_{A_1(\mathcal{D})}^q}{E(x_0^w) - E(\bar{x})}\Big)^{\frac{1}{q-1}}$, which completes the proof of Corollary 2. □
Observe that the above rate of convergence of the AWRPGA(co) is independent of the dimension of the space X. Comparing the convergence rates in Theorem 4 with those in Corollary 2, we can see that when the noise amplitude changes relatively little, the WRPGA(co) is stable.
We remark that when the US condition (6) holds on the whole domain X, Condition 0 in Theorem 4 is not necessary. In that case, we obtain the following theorem:
Theorem 9. 
Let the convex function $E: X \to \mathbb{R}$ be uniformly smooth and Fréchet-differentiable with minimizer $\bar{x} \in A_1(\mathcal{D})$. Assume that there are constants $\alpha > 0$ and $1 < q \le 2$ such that for all $x, x' \in X$,
$E(x') - E(x) - \langle E'(x), x' - x\rangle \le \alpha\|x - x'\|_X^q. \quad (21)$
Given an error sequence $\{\varepsilon_m\}_{m=1}^\infty$ that satisfies $|\varepsilon_m| < 1$ for any $m \ge 1$, for the AWRPGA(co) with $\mu_m > \max\{1, (1+\varepsilon_m)^{q-1}\}$, the inequality
$E(x_m^w) - E(\bar{x}) \le \alpha\|\bar{x}\|_{A_1(\mathcal{D})}^q\Big( C(\alpha,q,E) + \sum_{k=1}^m\big(\mu_k - (1+\varepsilon_k)^{q-1}\big)(1+\varepsilon_k)\Big(\frac{t_k}{\mu_k}\Big)^{\frac{q}{q-1}} \Big)^{1-q}$
holds with $C(\alpha,q,E) = \Big(\frac{\alpha\|\bar{x}\|_{A_1(\mathcal{D})}^q}{E(x_0^w) - E(\bar{x})}\Big)^{\frac{1}{q-1}}$.
Proof. 
Taking $x' = x_{m-1}^w - \lambda_m^{\varepsilon_m}\varphi_m$ and $x = x_{m-1}^w$ in (21), we have
$E(\hat{x}_m^w) = E(x_{m-1}^w - \lambda_m^{\varepsilon_m}\varphi_m) \le E(x_{m-1}^w) - \lambda_m^{\varepsilon_m}\langle E'(x_{m-1}^w), \varphi_m\rangle + \alpha|\lambda_m^{\varepsilon_m}|^q = E(x_{m-1}^w) - \Big(1 - \frac{(1+\varepsilon_m)^{q-1}}{\mu_m}\Big)(1+\varepsilon_m)(\alpha\mu_m)^{-\frac{1}{q-1}}|\langle E'(x_{m-1}^w), \varphi_m\rangle|^{\frac{q}{q-1}}, \quad (22)$
where we have used the definition of $\lambda_m^{\varepsilon_m}$ and the assumption $\|\varphi_m\|_X = 1$.
Note that $E(x_m^w) \le E(\hat{x}_m^w)$. Thus, using (22), we derive
$E(x_m^w) \le E(x_{m-1}^w) - \Big(1 - \frac{(1+\varepsilon_m)^{q-1}}{\mu_m}\Big)(1+\varepsilon_m)(\alpha\mu_m)^{-\frac{1}{q-1}}|\langle E'(x_{m-1}^w), \varphi_m\rangle|^{\frac{q}{q-1}}.$
The remaining part of the proof of this theorem is similar to that of Theorem 8. Here, we omit it. □
Note that when $\varepsilon_m = 0$, the above convergence rate is consistent with that of Theorem 4. Moreover, the parameter $\mu_m$ need only satisfy $\mu_m > 1$ instead of
$\mu_m > \max\{1, \alpha^{-1}M_0M_1^{1-q}\}.$
It follows from Theorems 5, 8, and 9 and Corollaries 1 and 2 that the convergence rates of the WRPGA(co) do not change significantly despite the presence of noise and computational errors. This assertion is crucial for the practical implementation of the algorithm, which implies that the WRPGA(co) has certain robustness against noise and errors.

3. Convergence Rate of the AWRPGA ( { μ m } , X )

In this section, we apply the AWRPGA(co) to solve the problem of m-term approximation with respect to dictionaries in Banach spaces. Let $X$ be a Banach space and $\bar{x}$ be a fixed element in $X$. If we set $E(x) := \|\bar{x} - x\|_X$ in convex optimization problem (5), then problem (5) reduces to an m-term approximation problem. Similarly, the discussion of the stability of greedy-type convex optimization algorithms reduces to that of greedy algorithms for m-term approximation. Since the function $E(x) := \|\bar{x} - x\|_X$ is not uniformly smooth in the sense of the smoothness of a convex function, we consider $E(x,q) := \|\bar{x} - x\|_X^q$ with $1 < q \le 2$. Recall that the authors in [28] pointed out that if the modulus of smoothness $\rho(u)$ of $X$ defined in (4) satisfies $\rho(u) \le \gamma u^q$, $1 < q \le 2$, then $E(x,q)$ is uniformly smooth and satisfies $\rho(E(\cdot,q),\Omega,u) \le \alpha u^q$. It is known from [2] that the functional $F_x$ defined in (3) is the derivative of the norm function $E(x) := \|x\|_X$. Then,
$E'(x,q) = -q\|\bar{x} - x\|_X^{q-1}F_{\bar{x}-x}.$
Thus, the conditions of Theorem 8 are satisfied.
In this case, the AWRPGA(co) is reduced to the AWRPGA ( { μ m } , X ) , which is defined as follows.
  • AWRPGA$(\{\mu_m\}, X)$
  • Step 0:
  • Define $x_0^w := 0$. If $\bar{x} = 0$, then finish the algorithm and define $x_k^w := x_0^w = \bar{x}$, $k \ge 1$.
  • Step m:
  • Assume $x_{m-1}^w$ has been defined and $x_{m-1}^w \ne \bar{x}$. Perform the following steps:
    - Choose an element $\varphi_m \in \mathcal{D}$ such that
      $|F_{\bar{x}-x_{m-1}^w}(\varphi_m)| \ge t_m \sup_{\varphi\in\mathcal{D}} |F_{\bar{x}-x_{m-1}^w}(\varphi)|.$
    - Compute $\lambda_m$, given by the formula
      $\lambda_m := -(1+\varepsilon_m)\,\mathrm{sign}\big(F_{\bar{x}-x_{m-1}^w}(\varphi_m)\big)\,(\alpha\mu_m)^{-\frac{1}{q-1}}\,q^{\frac{1}{q-1}}\,\|\bar{x} - x_{m-1}^w\|_X\,|F_{\bar{x}-x_{m-1}^w}(\varphi_m)|^{\frac{1}{q-1}}.$
    - Compute
      $\hat{x}_m^w := x_{m-1}^w - \lambda_m\varphi_m.$
    - Choose $b_m$ such that
      $\|\bar{x} - b_m\hat{x}_m^w\|_X^q = \min_{b\in\mathbb{R}}\|\bar{x} - b\,\hat{x}_m^w\|_X^q,$
      and define the next approximant as
      $x_m^w = b_m\hat{x}_m^w.$
  • If $x_m^w = \bar{x}$, then finish the algorithm and define $x_k^w := x_m^w = \bar{x}$ for $k > m$.
  • If $x_m^w \ne \bar{x}$, then proceed to Step $m+1$.
It is clear that the AWRPGA generates an m-term approximation of x ¯ after m iterations. Applying Theorem 8 to E ( x , q ) , we obtain the following convergence rate of this new algorithm.
Theorem 10. 
Let $\bar{x} \in A_1(\mathcal{D}) \subset X$ and, for a suitable $\gamma$, let $\rho(u)$ satisfy $\rho(u) \le \gamma u^q$, $1 < q \le 2$. Assume that the error sequence $\{\varepsilon_m\}_{m=1}^\infty$ satisfies $|\varepsilon_m| < 1$ for any $m \ge 1$. Then, for the AWRPGA$(\{\mu_m\},X)$ with $\mu_m > \max\{1, 2(1+\varepsilon_m)^{q-1}\}$, it holds that
$\|\bar{x} - x_m^w\|_X \le \alpha^{1/q}\|\bar{x}\|_{A_1(\mathcal{D})}\Big( \alpha^{\frac{1}{q-1}} + \sum_{k=1}^m\big(\mu_k - 2(1+\varepsilon_k)^{q-1}\big)(1+\varepsilon_k)\Big(\frac{t_k}{\mu_k}\Big)^{\frac{q}{q-1}} \Big)^{\frac{1}{q}-1}.$
When $t_m = 1$ for all $m \ge 1$, the AWRPGA$(\{\mu_m\},X)$ reduces to the ARPGA$(\{\mu_m\},X)$. Taking $t_m = 1$, $\mu_m = \mu$, and $\varepsilon_m = \varepsilon_0$ for $m = 1, 2, \ldots$, we obtain the following convergence rate of the ARPGA$(\mu,X)$ with error $\varepsilon_0$.
Theorem 11. 
Let $\bar{x} \in A_1(\mathcal{D}) \subset X$ and, for a suitable $\gamma$, let $\rho(u)$ satisfy $\rho(u) \le \gamma u^q$, $1 < q \le 2$. Assume that the error sequence $\{\varepsilon_m\}_{m=1}^\infty$ satisfies $\varepsilon_m = \varepsilon_0$ for all $m \ge 1$ with $|\varepsilon_0| < 1$. Then, for the ARPGA$(\mu,X)$ with $\mu > \max\{1, 2(1+\varepsilon_0)^{q-1}\}$, it holds that
$\|\bar{x} - x_m^w\|_X \le \alpha^{1/q}\|\bar{x}\|_{A_1(\mathcal{D})}\Big( \alpha^{\frac{1}{q-1}} + \big(\mu - 2(1+\varepsilon_0)^{q-1}\big)(1+\varepsilon_0)\mu^{-\frac{q}{q-1}}\, m \Big)^{\frac{1}{q}-1} \le C\,\|\bar{x}\|_{A_1(\mathcal{D})}\, m^{\frac{1}{q}-1}.$
We proceed to remark that the rate in Theorem 11 is sharp. We take Lebesgue spaces as examples. Let $(O,\mathcal{M},\nu)$ be a measure space; that is, $\nu$ is a measure on the $\sigma$-algebra $\mathcal{M}$ of subsets of $O$. For $1 < p < \infty$, define the Lebesgue space $L_p(O,\mathcal{M},\nu)$ as
$L_p(O,\mathcal{M},\nu) := \Big\{ f \text{ measurable} : \|f\|_p = \Big(\int_O|f|^p\,d\nu\Big)^{1/p} < \infty \Big\}.$
It is known from [29] that the modulus of smoothness of $L_p(O,\mathcal{M},\nu)$ satisfies
$\rho(u) \le \begin{cases} u^p/p, & 1 < p \le 2, \\ (p-1)u^2/2, & 2 < p < \infty. \end{cases}$
Applying Theorem 11 to $L_p(O,\mathcal{M},\nu)$, we obtain the following theorem.
Applying Theorem 11 to L p ( O , M , ν ) , we obtain the following theorem.
Theorem 12. 
Let $\bar{x} \in A_1(\mathcal{D}) \subset L_p(O,\mathcal{M},\nu)$, $1 < p < \infty$. Assume that the error sequence $\{\varepsilon_m\}_{m=1}^\infty$ satisfies $\varepsilon_m = \varepsilon_0$ for all $m \ge 1$ with $|\varepsilon_0| < 1$. Then, for the ARPGA$(\mu, L_p(O,\mathcal{M},\nu))$ with $\mu > \max\{1, 2(1+\varepsilon_0)^{\min\{p,2\}-1}\}$, it holds that
$\|\bar{x} - x_m^w\|_p \le \alpha^{\frac{1}{\min\{p,2\}}}\|\bar{x}\|_{A_1(\mathcal{D})}\Big( \alpha^{\frac{1}{\min\{p,2\}-1}} + \big(\mu - 2(1+\varepsilon_0)^{\min\{p,2\}-1}\big)(1+\varepsilon_0)\,\mu^{-\frac{\min\{p,2\}}{\min\{p,2\}-1}}\, m \Big)^{\frac{1}{\min\{p,2\}}-1} \le C\,\|\bar{x}\|_{A_1(\mathcal{D})}\, m^{\frac{1}{\min\{p,2\}}-1}.$
Jiang et al. in [27] have shown that the rate $m^{\frac{1}{\min\{p,2\}}-1}$ is sharp.
Now, we apply the AWRPGA$(\{\mu_m\},X)$ to Hilbert spaces. When $X = H$ is a Hilbert space with inner product $\langle\cdot,\cdot\rangle$ and norm $\|\cdot\|_H = \langle\cdot,\cdot\rangle^{1/2}$, the AWRPGA$(\{\mu_m\},H)$ takes the following form:
  • Step 0:
  • Define $x_0^w := 0$. If $\bar{x} = 0$, then finish the algorithm and define $x_k^w := x_0^w = \bar{x}$, $k \ge 1$.
  • Step m:
  • Assume $x_{m-1}^w$ has been defined and $x_{m-1}^w \ne \bar{x}$. Perform the following steps:
    - Choose an element $\varphi_m \in \mathcal{D}$ such that
      $|\langle x_{m-1}^w - \bar{x}, \varphi_m\rangle| \ge t_m \sup_{\varphi\in\mathcal{D}} |\langle x_{m-1}^w - \bar{x}, \varphi\rangle|.$
    - Compute $\lambda_m^{\varepsilon_m}$, given by the formula
      $\lambda_m^{\varepsilon_m} := \frac{2}{\mu_m}(1+\varepsilon_m)\langle x_{m-1}^w - \bar{x}, \varphi_m\rangle.$
    - Compute
      $\hat{x}_m^w := x_{m-1}^w - \lambda_m^{\varepsilon_m}\varphi_m, \quad b_m := \frac{\langle\bar{x}, \hat{x}_m^w\rangle}{\|\hat{x}_m^w\|_H^2}.$
    - Define the next approximant as
      $x_m^w = b_m\hat{x}_m^w.$
  • If $x_m^w = \bar{x}$, then finish the algorithm and define $x_k^w := x_m^w = \bar{x}$ for $k > m$.
  • If $x_m^w \ne \bar{x}$, then proceed to Step $m+1$.
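A minimal sketch of this Hilbert-space version in $\mathbb{R}^n$ is given below; the random error model for $\varepsilon_m$ and the finite-dimensional setting are our illustrative assumptions.
```python
import numpy as np

def awrpga_hilbert(x_bar, D, m, mu=2.0, eps=None, rng=None):
    """Minimal AWRPGA({mu_m}, H) sketch: x_bar is the target, D has unit-norm columns,
    eps is a sequence of computational errors with |eps_k| < 1 (drawn at random if None)."""
    rng = rng or np.random.default_rng(0)
    x = np.zeros_like(x_bar)
    for k in range(m):
        inner = D.T @ (x - x_bar)                  # <x_{m-1} - x_bar, phi> for every column phi
        j = int(np.argmax(np.abs(inner)))          # argmax satisfies the weak condition for any t <= 1
        if inner[j] == 0:
            break
        e = eps[k] if eps is not None else rng.uniform(-0.5, 0.5)
        lam = (2.0 / mu) * (1.0 + e) * inner[j]    # perturbed step size lambda_m^{eps_m}
        x_hat = x - lam * D[:, j]
        b = (x_bar @ x_hat) / (x_hat @ x_hat)      # b_m = <x_bar, x_hat_m> / ||x_hat_m||^2
        x = b * x_hat
    return x
```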
For the AWRPGA ( { μ m } , H ) , we obtain the following corollary from Theorem 9:
Corollary 3. 
Let $\bar{x} \in A_1(\mathcal{D}) \subset H$. Assume that the error sequence $\{\varepsilon_m\}_{m=1}^\infty$ satisfies $|\varepsilon_m| < 1$ for any $m \ge 1$. Then, the output of the AWRPGA$(\{\mu_m\},H)$ with
$\mu_m > \max\{1, 1+\varepsilon_m\}$
satisfies the following inequality:
$\|x_m^w - \bar{x}\|_H \le \|\bar{x}\|_{A_1(\mathcal{D})}\Big( 1 + \sum_{k=1}^m\big(\mu_k - (1+\varepsilon_k)\big)(1+\varepsilon_k)\Big(\frac{t_k}{\mu_k}\Big)^2 \Big)^{-1/2}.$
Proof. 
It is well known that $E(x,2)$ is Fréchet-differentiable on a Hilbert space $H$. Moreover, its Fréchet derivative $E'(x)$ acts on $h \in H$ as $\langle E'(x), h\rangle = 2\langle x - \bar{x}, h\rangle$; see [10] for instance. Therefore, we obtain
$E(x') - E(x) - \langle E'(x), x' - x\rangle = \|x' - \bar{x}\|_H^2 - \|x - \bar{x}\|_H^2 - 2\langle x - \bar{x}, x' - x\rangle = \|x' - x\|_H^2.$
Thus, E ( x , 2 ) satisfies the conditions of Theorem 9 with α = 1 and q = 2 . Then, Corollary 3 follows from Theorem 9. □
We remark that the result of Corollary 3 for μ m = 2 , m 1 , was obtained in [4,27].

4. Examples and Numerical Results

Observe that the convergence results for the WRPGA(co) in Banach spaces apply to any dictionary, including dictionaries in infinite-dimensional Banach spaces, which contain infinitely many elements. This is a somewhat surprising result that shows the power of the WRPGA(co). On the other hand, in practice we deal with optimization problems in finite but high-dimensional spaces. A typical example is sparse signal recovery in compressive sensing. Let $X = \mathbb{R}^N$, and for any $x \in X$, define its norm by
$\|x\|_2 = \Big(\sum_{j=1}^N|x_j|^2\Big)^{1/2}.$
Let $x$ be a $d$-dimensional compressible vector and $\Phi \in \mathbb{R}^{N\times d}$ be the measurement matrix with $d \gg N$. Our goal is to find the best approximation of the unknown $x$ by utilizing $\Phi$ and the given data $f = \Phi x$. It is well known that this problem is equivalent to solving the following convex optimization problem on $\mathbb{R}^d$:
$\min_{\hat{x}_1, \ldots, \hat{x}_d \in \mathbb{R}} \Big\| f - \sum_{i=1}^d\hat{x}_i\varphi_i \Big\|_2, \quad (23)$
where $\varphi_1, \ldots, \varphi_d$ denote the columns of $\Phi$. Note that the dimension $d$ is larger than $N$, which means that the column vectors of the measurement matrix $\Phi$ are linearly dependent. Thus, the column vectors of $\Phi$ can be considered a dictionary of $\mathbb{R}^N$. Notably, when $d \gg N$, solving problem (23) is difficult. Observe that problem (23) can also be considered an approximation problem with respect to a dictionary in the space $X$. Naturally, one can adapt greedy approximation algorithms to handle this problem, and it turns out that greedy algorithms are powerful tools for solving problem (23); see [7,30,31,32].
Next, we present two numerical examples to illustrate the stability of the WRPGA(co) with noise and errors, as discussed in Section 2 and Section 3. We begin with the first example, which verifies Theorems 5 and 8 (see Figure 1, Figure 2 and Figure 3). We randomly generate a Banach space $X = \ell_p^{\mathrm{dim}}$ and a dictionary $\mathcal{D}$ based on a given distribution of the parameter $p$. We choose a suitable target functional $E: X \to \mathbb{R}$ and then apply the WRPGA(co) to solve the convex optimization problem (5); that is, we seek a sparse minimizer to approximate
$\bar{x} = \arg\min_{x\in X}E(x) = \arg\min_{x\in\Omega}E(x).$
In this process, we assume that the WRPGA(co) is affected by noise and errors. We evaluate the performance of the WRPGA(co) using
$\frac{E(x_m^w) - \inf_{x\in X}E(x)}{E(0) - \inf_{x\in X}E(x)} \in [0,1]. \quad (24)$
Now, we give the details of this example. Let $p \sim U(1,10)$ and $X = \ell_p^{1000}$. We consider a dictionary of size 10,000 whose elements are normalized linear combinations of the standard basis $\{e_i\}_{i=1}^{1000}$ of $X$ with coefficients drawn from a normal distribution; that is, $\mathcal{D} = \{\varphi_j\}_{j=1}^{10000}$, where
$\varphi_j = \frac{\sum_{i=1}^{1000}c_{ji}e_i}{\big(\sum_{i=1}^{1000}|c_{ji}|^p\big)^{1/p}}, \quad c_{ji} \sim N(0,1).$
We choose the target functional $E: X \to \mathbb{R}$ as
$E(x) = \|x - f\|_2^2,$
where $f \in X$ is randomly generated in the form
$f = \sum_{k=1}^{K_f}a_k\varphi_{\sigma(k)}, \quad \varphi_{\sigma(k)} \in \mathcal{D},$
with $a_k \sim N(0,1)$, where $\sigma$ is a permutation of $\{1, \ldots, 10{,}000\}$ and $K_f \sim U(100,300)$. Although the example is simple, it is a classical problem in compressed sensing with many practical applications. We use the WRPGA(co) to derive an approximate minimizer with a sparsity of 100. The following results are based on 100 simulations.
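A sketch of this random test problem is shown below; it reflects our reading of the setup (the objective uses the Euclidean norm as stated above), and it only generates the data, to which the WRPGA(co) sketch from Section 1 can then be applied.
```python
import numpy as np

rng = np.random.default_rng(1)
dim, dict_size = 1000, 10_000
p = rng.uniform(1, 10)                                    # p ~ U(1, 10)

C = rng.standard_normal((dim, dict_size))                 # coefficients c_{ji} ~ N(0, 1)
D = C / (np.abs(C) ** p).sum(axis=0) ** (1 / p)           # columns normalized in the l_p norm

K_f = int(rng.integers(100, 301))                         # sparsity level K_f ~ U(100, 300)
support = rng.choice(dict_size, size=K_f, replace=False)  # sigma(1), ..., sigma(K_f)
a = rng.standard_normal(K_f)                              # coefficients a_k ~ N(0, 1)
f = D[:, support] @ a                                     # target element of the objective

E = lambda x: np.sum((x - f) ** 2)                        # E(x) = ||x - f||_2^2
print(E(np.zeros(dim)))                                   # E(0), used to normalize (24)
```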
Figure 1 demonstrates the relationship between the number of iterations of the WRPGA(co) and the function value (24) in the noisy and noiseless cases, which verifies Theorem 5. Here, the noise is randomly generated with small amplitude. In both cases, we randomly choose a parameter $\mu_m$ slightly larger than 2 for simplicity. The figure shows that the function value (24) does not change greatly under this noise, which implies that the WRPGA(co) is stable in a certain sense and will perform well in practice.
Furthermore, we use the WCGA(co) to obtain an approximate minimizer with a sparsity of 100 for the above problem. The corresponding setting is the same as that of the WRPGA(co). In the field of signal processing, the WCGA(co) is known as Orthogonal Matching Pursuit (OMP). After 100 simulations, we obtained the relationship between the number of iterations of the WCGA(co) and the function value with and without data noise, as shown in Figure 2. Here, the choice of noise is the same as for the WRPGA(co).
It follows from Figure 1 and Figure 2 that the approximation performance of the WCGA(co) is better than that of the WRPGA(co), even in the presence of noise. However, unlike that of the WCGA(co), the function value of the WRPGA(co) changes little under data noise. This means that the WRPGA(co) is not sensitive to the noise, showing its stability. There is thus a trade-off between computational complexity and approximation performance.
Figure 3 reveals the relationship between the number of iterations of the AWRPGA(co) and the function value (24) in the presence of computational errors and in the ideal case, which verifies Theorem 8. Here, the computational error $\varepsilon_m$ is randomly selected from $(-1,1)$. From Figure 3, we see that in the error case the function value fluctuates but does not change significantly, which indicates that the WRPGA(co) with errors is weakly stable.
Now, we consider the stability of the special AWRPGA$(\{\mu_m\},H)$ with $t_m = 1$ and $\mu_m = 2$ for all $m \ge 1$, that is, the RPGA with computational errors $\{\varepsilon_m\}_{m\ge 1}$, $|\varepsilon_m| < 1$. We illustrate Corollary 3 with the following numerical experiment on regression learning. One can also refer to [8] for more details about Rescaled Pure Greedy Algorithms in learning problems.
Let $f(x) = \cos(\pi x)$, $x \in [-1,1]$. We uniformly select 500 sample points from this function. Set $x_0 = 1$. Let
$X := \{ x_i : 0 \le i < 200 \}$
be a set of 200 equally spaced points in $[-1,1]$. We choose $\{e^{-(x-x_i)^2} : i = 0, 1, \ldots, 100\}$ as a dictionary. We seek to approximate $f(x)$ by using the RPGA based on the sampling points $\{\tilde{x}_j\}_{j=1}^{500}$. In the implementation of the algorithm, Gaussian noise from $N(0,\delta^2)$ with $\delta^2 = 0.2$ is allowed. The function values with noise are used as inputs to the regression. To measure the performance of the algorithm, we use the mean square error (MSE) of the estimator $f_\rho$ on the unlabeled samples $\{\tilde{x}_j\}_{j=1}^{500}$, defined as
$\mathrm{MSE} = \frac{1}{500}\sum_{j=1}^{500}\big(f(\tilde{x}_j) - f_\rho(\tilde{x}_j)\big)^2.$
To make the performance of the RPGA clear, we also examine the performance of the OGA. Moreover, we add computational inaccuracies to the implementation of the RPGA. By repeating the test 100 times, we obtain Table 1 and Figure 4.
Figure 4 demonstrates that the OGA outperforms the RPGA with computational errors $\{\varepsilon_m\}$ on both noisy and noiseless data in the regression learning problem. However, according to the running times in Table 1, the RPGA runs more quickly than the OGA even in the noisy case. This is related to the design of the algorithms: the OGA updates the approximant by solving an $m$-dimensional optimization problem, whereas the RPGA only needs to solve a one-dimensional optimization problem.
It is well known that the above two numerical examples can also be solved by the Least Absolute Shrinkage and Selection Operator (LASSO). The LASSO algorithm introduces sparsity through L 1 -regularization; see [33]. When the design matrix X satisfies certain conditions and the regularization parameter λ is appropriately chosen, the solution is guaranteed to be unique. However, its robustness to noise in practice critically depends on the proper selection of λ . In contrast, greedy algorithms (e.g., OGA) iteratively build sparse solutions with low per-step complexity. While they converge quickly in low-noise scenarios, their inability to revise earlier decisions makes them prone to error accumulation and less robust to noise compared to LASSO; see [34,35].
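For reference, a minimal LASSO baseline on a synthetic instance of problem (23) can be set up as follows; the measurement sizes and the regularization weight `alpha` are arbitrary illustrative choices, and in practice `alpha` must be tuned, which is exactly the sensitivity discussed above.
```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
Phi = rng.standard_normal((100, 400))                      # measurement matrix, d >> N
x_true = np.zeros(400)
idx = rng.choice(400, size=10, replace=False)
x_true[idx] = rng.standard_normal(10)                      # 10-sparse ground truth
y = Phi @ x_true + 0.01 * rng.standard_normal(100)         # noisy measurements

model = Lasso(alpha=0.01, max_iter=10_000).fit(Phi, y)     # l_1-regularized least squares
print(np.count_nonzero(model.coef_))                       # number of recovered nonzeros
```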

5. Conclusions and Discussions

It is well known that the Weak Rescaled Pure Greedy Algorithm for convex optimization (WRPGA(co)) is simpler than other greedy-type convex optimization algorithms, such as the Weak Chebyshev Greedy Algorithm (WCGA(co)), the Weak Greedy Algorithm with Free Relaxation (WGAFR(co)), and the Rescaled Weak Relaxed Greedy Algorithm (RWRGA(co)), and has the same approximation properties. The rate of convergence of all these algorithms is of the optimal order $m^{1-q}$, which is independent of the spatial dimension; see [10,17,21]. In this article, we study the stability of the WRPGA(co) in real Banach spaces and obtain the corresponding convergence rates with noise and errors. We use the condition that $\rho(E,\Omega,u)$ has power type $1 < q \le 2$ instead of the US condition, since the US condition of $E$ on $\Omega$ is restrictive and requires additional conditions to ensure that it still holds for the sequence generated by the WRPGA(co). Our results show that the WRPGA(co) is stable in the sense that its rate of convergence is essentially unchanged, which demonstrates that the WRPGA(co) maintains desirable properties even in imperfect computation scenarios. Moreover, we apply the WRPGA(co) with errors to the problem of m-term approximation and derive the optimal convergence rate. This indicates that the WRPGA(co) is flexible and can be widely utilized in applied mathematics and signal processing. The numerical simulation results also verify our theoretical results. The stability of the WRPGA(co) in the presence of noise and errors is crucial for practical implementation: it yields predictable results and guarantees consistency when the algorithm is run repeatedly. This advantage, together with the advantages in computational complexity and convergence rates, indicates that the WRPGA(co) is efficient for solving high-dimensional and even infinite-dimensional convex optimization problems.
In summary, the novelty of our work is as follows:
  • Under the weaker condition $\rho(E,\Omega,u) \le \alpha u^q$, $1 < q \le 2$, we show that the WRPGA(co) with noise has the same convergence rate as the WRPGA(co) without noise.
  • We introduce and study a new algorithm—the Approximate Weak Rescaled Pure Greedy Algorithm for convex optimization (AWRPGA(co)). We show that the AWRPGA(co) has almost the same convergence rate as that of the WRPGA(co).
  • We apply, for the first time, the WRPGA(co) with error to m-term approximation problems in Banach spaces. The convergence rate we obtained is optimal.
  • We compare the efficiency of the WRPGA(co) with that of the WCGA(co) through numerical experiments. The results show that the WCGA(co) has better approximation performance but higher computational cost than the WRPGA(co).
Next, we discuss some future work. The introduction of the weakness sequence $\{t_m\}_{m\ge 1}$, $0 < t_m \le 1$, into the algorithm allows for controlled relaxation in selecting elements from the dictionary, which increases the flexibility of the algorithm. Observe that when $t_m = 1$ for all $m \ge 1$, the algorithm achieves the optimal error, which is sharp as a general result. However, the supremum in greedy condition (2) may not be attainable. Our results provide the same rate of convergence for the weak versions (in the case $t_m = t$) as for the strong versions ($t_m = 1$) of the algorithms. Thus, taking $t_m = t$ may be a good choice. However, there is no general answer to the question of how to choose the optimal $t$; it depends on the specific convex optimization problem. Recently, some researchers have used strategies such as utilizing the a posteriori information gained from previous iterations of the algorithm to improve its convergence rate; see [36,37]. This inspires us to use this strategy to improve our results. In addition, it is known from [26,38] that research on the WCGA and the WGAFR for m-term approximation in real Banach spaces has been extended to complex Banach spaces. Thus, it would be interesting to extend our study to the setting of complex Banach spaces.
Above all, we will concentrate on the following problems in our future work:
  • For a given convex optimization problem, we will explore the optimal selection of the parameter t.
  • In the implementation of the WRPGA(co), we will take the a posteriori information into account to further improve our convergence rates and numerical results.
  • We will investigate the performance of greedy-type algorithms in complex Banach spaces for m-term approximation or convex optimization.

Author Contributions

W.L., M.L., P.Y. and W.Z. contributed equally to this paper. All authors have read and agreed to the published version of the manuscript.

Funding

The work was supported by the National Natural Science Foundation of China (Grant No. 11671213).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors thank the four referees for their suggestions. The suggestions and comments helped the authors to improve the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. DeVore, R.A.; Temlyakov, V.N. Some remarks on greedy algorithms. Adv. Comput. Math. 1996, 5, 173–187.
  2. Temlyakov, V. Greedy Approximation; Cambridge University Press: Cambridge, UK, 2011.
  3. Livshitz, E.D.; Temlyakov, V.N. Two lower estimates in greedy approximation. Constr. Approx. 2003, 19, 509–524.
  4. Petrova, G. Rescaled pure greedy algorithm for Hilbert and Banach spaces. Appl. Comput. Harmon. Anal. 2016, 41, 852–866.
  5. Guo, Q.; Cai, B.L. Learning capability of the rescaled pure greedy algorithm with non-iid sampling. Electron. Res. Arch. 2023, 31, 1387–1404.
  6. Guo, Q.; Liu, X.H.; Ye, P.X. The learning performance of the weak rescaled pure greedy algorithms. J. Inequal. Appl. 2024, 2024, 30.
  7. Li, W.; Ye, P.X. Sparse signal recovery via rescaled matching pursuit. Axioms 2024, 13, 288.
  8. Zhang, W.H.; Ye, P.X.; Xing, S. Optimality of the rescaled pure greedy learning algorithms. Int. J. Wavelets Multiresolut. Inf. Process. 2023, 21, 2250048.
  9. Zhang, W.H.; Ye, P.X.; Xing, S.; Xu, X. Optimality of the approximation and learning by the rescaled pure super greedy algorithms. Axioms 2022, 11, 437.
  10. Gao, Z.; Petrova, G. Rescaled pure greedy algorithm for convex optimization. Calcolo 2019, 56, 15.
  11. Cevher, V.; Becker, S.; Schmidt, M. Convex optimization for big data. IEEE Signal Process. Mag. 2014, 31, 32–43.
  12. Franc, V.; Fikar, O.; Bartos, K.; Sofka, M. Learning data discretization via convex optimization. Mach. Learn. 2018, 107, 333–355.
  13. Jahvani, M.; Guay, M. A distributed convex optimization algorithm with continuous-time communication. In Proceedings of the 2022 IEEE International Symposium on Advanced Control of Industrial Processes (AdCONIP 2022), Vancouver, BC, Canada, 7–9 August 2022; pp. 313–318.
  14. Jiang, H.; Li, X.W. Parameter estimation of statistical models using convex optimization. IEEE Signal Process. Mag. 2010, 27, 115–127.
  15. Rao, M.; Rini, S.; Goldsmith, A. Distributed convex optimization with limited communications. In Proceedings of the ICASSP 2019—2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 4604–4608.
  16. Yubai, K. Development of data-based controller synthesis by convex optimization. Electr. Commun. Jpn. 2019, 102, 27–31.
  17. Dereventsov, A.; Temlyakov, V. Biorthogonal greedy algorithms in convex optimization. Appl. Comput. Harmon. Anal. 2022, 60, 489–511.
  18. Temlyakov, V. Greedy expansions in convex optimization. Proc. Steklov Inst. Math. 2014, 284, 244–262.
  19. Zhang, T. Sequential greedy approximation for certain convex optimization problems. IEEE Trans. Inf. Theory 2003, 49, 682–691.
  20. DeVore, R.; Temlyakov, V. Convex optimization on Banach spaces. Found. Comput. Math. 2016, 16, 369–394.
  21. Temlyakov, V. Greedy approximation in convex optimization. Constr. Approx. 2015, 41, 269–296.
  22. Temlyakov, V. Convergence and rate of convergence of some greedy algorithms in convex optimization. Tr. Mat. Inst. Steklova 2016, 293, 333–345.
  23. Nguyen, H.; Petrova, G. Greedy strategies for convex optimization. Calcolo 2017, 54, 207–224.
  24. Temlyakov, V. Greedy type algorithms in Banach spaces and applications. Constr. Approx. 2005, 21, 257–292.
  25. Dereventsov, A. On the approximate weak Chebyshev greedy algorithm in uniformly smooth Banach spaces. J. Math. Anal. Appl. 2016, 436, 288–304.
  26. Dilworth, S.; Garrigós, G.; Hernández, E.; Kutzarova, D.; Temlyakov, V.N. Lebesgue-type inequalities in greedy approximation. J. Funct. Anal. 2021, 280, 108885.
  27. Jiang, B.; Ye, P.X.; Li, W.; Lu, M. Error bounds of Approximate Weak Rescaled Pure Greedy Algorithms. Int. J. Wavelets Multiresolut. Inf. Process. 2025, 23, 2540060.
  28. Borwein, J.; Guirao, A.J.; Hajek, P.; Vanderwerff, J. Uniformly convex functions on Banach spaces. Proc. Am. Math. Soc. 2009, 137, 1081–1091.
  29. Temlyakov, V.N. Greedy algorithms in Banach spaces. Adv. Comput. Math. 2001, 14, 277–292.
  30. Davis, G.; Mallat, S.; Avellaneda, M. Adaptive greedy approximations. Constr. Approx. 1997, 13, 57–98.
  31. Shao, C.F.; Wei, X.J.; Ye, P.X.; Xing, S. Efficiency of orthogonal matching pursuit for group sparse recovery. Axioms 2023, 12, 389.
  32. Tropp, J.A.; Gilbert, A.C. Signal recovery from random measurements via orthogonal matching pursuit. IEEE Trans. Inf. Theory 2007, 53, 4655–4666.
  33. Schmidt, M. Least Squares Optimization with L1-Norm Regularization; CS542B Project Report; The University of British Columbia: Vancouver, BC, Canada, 2005; Volume 504, pp. 195–221.
  34. Carrillo, R.E.; Ramirez, A.B.; Arce, G.R.; Barner, K.E.; Sadler, B.M. Robust compressive sensing of sparse signals: A review. EURASIP J. Adv. Signal Process. 2016, 2016, 108.
  35. Zhang, T. Sparse recovery with orthogonal matching pursuit under RIP. IEEE Trans. Inf. Theory 2011, 57, 6215–6221.
  36. Burusheva, L.; Temlyakov, V. Sparse approximation of individual functions. J. Approx. Theory 2020, 259, 105471.
  37. Gao, Y.; Qian, T.; Temlyakov, V.N.; Cao, L.F. Aspects of 2D-adaptive Fourier decompositions. arXiv 2017.
  38. Gasnikov, A.; Temlyakov, V. On greedy approximation in complex Banach spaces. Russ. Math. Surv. 2024, 79, 975–990.
Figure 1. The relationship between the number of iterations of the WRPGA(co) and the function value in the noisy and ideal cases.
Figure 2. The relationship between the number of iterations of the WCGA(co) and the function value in the noisy and ideal cases.
Figure 3. The relationship between the number of iterations of the AWRPGA(co) and the function value in the computational-error and ideal cases.
Figure 4. The approximation performance of the OGA and the RPGA with computational error in the noisy and ideal cases.
Table 1. The average MSE, running time, and relative error of the OGA and the RPGA with computational error ε_m on noisy data.

Algorithm               Average MSE       Average Running Time (s)   Relative Error
OGA                     3.0705 × 10⁻⁴     0.4074                      0.0001
RPGA with |ε_m| = 0     0.0122            0.2809                      0.0016
RPGA with |ε_m| < 1     0.0296            0.1322                      0.0030
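For completeness, the error measures reported in Table 1 are standard ones; since their exact formulas are not written out here, the short Python sketch below records the definitions we assume (mean squared error and relative ℓ2 error between a target vector and its m-term approximant). The helper names are our own and purely illustrative.

import numpy as np

def mse(f_target, f_approx):
    # Mean squared error between the target and its approximant.
    return float(np.mean((f_target - f_approx) ** 2))

def relative_error(f_target, f_approx):
    # Relative l2 error ||f - f_m|| / ||f||.
    return float(np.linalg.norm(f_target - f_approx) / np.linalg.norm(f_target))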