1. Introduction
Many applications in science and engineering involve the inverse problem of finding
$x\in {\mathbb{R}}^{n}$ satisfying
where
$b\in {\mathbb{R}}^{r}$ is the observed data and
$B\in {\mathbb{R}}^{r\times n}$ is the corresponding nonzero matrix. In practice, the inverse problem typically suffers from ill-conditioning of the matrix
B, so it may have no solution. In that case, an approximate solution can be found by minimizing the squared norm of the residual term:
Observe that the problem (
2) may have several optimal solutions; in this situation, it is not clear which of these solutions should be chosen. One strategy for selecting the best solution among them is to add a regularization term to the objective function. The classical technique is the celebrated Tikhonov regularization [
1] of the form
where
$\lambda >0$ is a regularization parameter. In this setting, the uniqueness of the solution to (
3) is guaranteed. However, from a practical point of view, the shortcoming of this strategy is that the unique solution to the regularization problem (
3) may not be optimal in the original sense of (
2), see [
2,
3] for further discussions.
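As a concrete illustration, assuming the standard Tikhonov form $\min_x {\parallel Bx-b\parallel}^{2}+\lambda {\parallel x\parallel}^{2}$ for (3), the unique minimizer solves the regularized normal equations; a minimal sketch:

```python
import numpy as np

def tikhonov(B, b, lam):
    """Solve min ||Bx - b||^2 + lam * ||x||^2 via the normal equations.

    For lam > 0 the matrix B^T B + lam * I is positive definite, so the
    solution is unique even when B is rank-deficient.
    """
    n = B.shape[1]
    return np.linalg.solve(B.T @ B + lam * np.eye(n), B.T @ b)

# A rank-deficient B: plain least squares has many solutions,
# but the Tikhonov problem has exactly one.
B = np.array([[1.0, 1.0], [2.0, 2.0]])
b = np.array([1.0, 2.0])
x = tikhonov(B, b, lam=1e-3)
```

The optimality condition $B^{\top}(Bx-b)+\lambda x=0$ can be checked directly on the returned `x`.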
To overcome this, we consider the strategy of selecting a specific solution among the optimal solutions to (
2) by minimizing an additional prior function over these optimal solutions. This brings the framework of the following bilevel optimization problem,
where
$f:{\mathbb{R}}^{n}\to \mathbb{R}$ and
$h:{\mathbb{R}}^{m}\to \mathbb{R}$ are convex functions, and
$A:{\mathbb{R}}^{n}\to {\mathbb{R}}^{m}$ is a nonzero linear transformation. It is worth pointing out that many problems can be cast in this form. For instance, if
$m:=n-1$,
$f:={\parallel \cdot \parallel}_{1}$,
$h:={\parallel \cdot \parallel}_{1}$, and
the problem (
4) becomes the fused lasso [
4] solution to the problem (
2). This situation also occurs in image denoising problems (
$r=n$ and
B is the identity matrix), and in image inpainting problems (
$r=n$ and
B is a symmetric diagonal matrix), where the term
${\parallel Ax\parallel}_{1}$ is known as the 1D total variation [
5]. When
$m=n$,
$f={\parallel \cdot \parallel}^{2}$,
$h={\parallel \cdot \parallel}_{1}$, and
A is the identity matrix, the problem (
4) becomes the elastic net [
6] solution to the problem (
2). Moreover, in wavelet-based image restoration problems, the matrix
A is given by an inverse wavelet transform [
7].
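For concreteness, the matrix A in the fused lasso instance above is the first-difference matrix, so that ${\parallel Ax\parallel}_{1}$ is the 1D total variation of x. A small sketch (our own illustration):

```python
import numpy as np

def diff_matrix(n):
    """(n-1) x n first-difference matrix A with (Ax)_i = x_{i+1} - x_i.

    With f = h = ||.||_1 and this A, problem (4) selects the fused lasso
    solution; ||Ax||_1 is the 1D total variation of x.
    """
    A = np.zeros((n - 1, n))
    for i in range(n - 1):
        A[i, i] = -1.0
        A[i, i + 1] = 1.0
    return A

x = np.array([1.0, 1.0, 3.0, 3.0])
A = diff_matrix(4)
# The total variation of this piecewise-constant x is its single jump of size 2.
tv = np.abs(A @ x).sum()
```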
Let us consider the constraint set of (
4). It is known that introducing the
Landweber operator $T:{\mathbb{R}}^{n}\to {\mathbb{R}}^{n}$ of the form
yields that
T is firmly nonexpansive and the set of all fixed points of
T is nothing other than the set
${\mathrm{argmin}}_{u\in {\mathbb{R}}^{n}}{\parallel Bu-b\parallel}^{2}$, see [
8] for more details. Motivated by this observation, the bilevel problem (
4) can be considered in the general setting as
where
$T:{\mathbb{R}}^{n}\to {\mathbb{R}}^{n}$ is a nonlinear operator with
$FixT:=\{x\in {\mathbb{R}}^{n}:Tx=x\}\ne \varnothing $. Note that problem (
5) encompasses not only the problem (
4), but also many other problems in the literature: for instance, the minimization over the intersection of a finite number of sublevel sets of convex nonsmooth functions (see
Section 5.2), and the minimization over the intersection of finitely many convex sets for which the metric projection onto the intersection cannot be computed explicitly, see [
9,
10,
11] for more details.
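The Landweber operator can be sketched as follows. The displayed formula is not reproduced in this excerpt; the step size $\gamma =1/{\parallel B\parallel}^{2}$ used below is a common choice and is our assumption:

```python
import numpy as np

def landweber(B, b, gamma=None):
    """Return the Landweber operator T(x) = x + gamma * B^T (b - B x).

    With gamma = 1 / ||B||^2 (assumed here), T is firmly nonexpansive and
    Fix T = argmin_u ||Bu - b||^2.
    """
    if gamma is None:
        gamma = 1.0 / np.linalg.norm(B, 2) ** 2
    return lambda x: x + gamma * B.T @ (b - B @ x)

B = np.array([[1.0, 0.0], [1.0, 1.0]])
b = np.array([1.0, 3.0])
T = landweber(B, b)
x_star = np.linalg.lstsq(B, b, rcond=None)[0]  # least-squares solution
# x_star is a fixed point of T: T(x_star) == x_star.
```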
There are some existing methods for solving convex optimization problems over fixed-point sets in the form of (
5); the most celebrated is the
hybrid steepest descent method, first investigated in [
12]. Note that the algorithm proposed by Yamada [
12] relies on the hypotheses that the objective functions are strongly convex and smooth and that the operator
T is nonexpansive. Several variants and generalizations of this well-known method have been proposed; for instance, Yamada and Ogura [
11] considered the same scheme for solving the problem (
5) when
T belongs to the class of so-called quasi-shrinking operators. Cegielski [
10] proposed a generalized hybrid steepest descent method using a sequence of quasi-nonexpansive operators. Iiduka [
13,
14] considered a nonsmooth convex optimization problem (
5) with fixed-point constraints of certain quasi-nonexpansive operators.
On the other hand, in the last decade, the split common fixed point problem [
15,
16] has attracted considerable attention among nonlinear problems due to its wide applicability in many image and signal processing problems. Given a nonzero linear transformation
$A:{\mathbb{R}}^{n}\to {\mathbb{R}}^{m}$, and two nonlinear operators
$T:{\mathbb{R}}^{n}\to {\mathbb{R}}^{n}$ and
$S:{\mathbb{R}}^{m}\to {\mathbb{R}}^{m}$ with
$\mathrm{Fix}\left(T\right)\ne \varnothing $, and
$\mathcal{R}\left(A\right)\cap \mathrm{Fix}\left(S\right)\ne \varnothing $, the split common fixed point problem is to find a point
${x}^{*}\in {\mathbb{R}}^{n}$ in which
The key idea of this problem is to find a fixed point of a nonlinear operator in a primal space whose image under an appropriate linear transformation is a fixed point of another nonlinear operator in another space. This situation appears, for instance, in dynamic emission tomographic image reconstruction [
17] and in the intensity-modulated radiation therapy treatment planning, see [
18] for more details. Many authors have investigated iterative algorithms for split common fixed point problems and proposed generalizations in several directions; see, for example, [
9,
19,
20,
21,
22] and references therein.
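To make the formulation concrete, here is a toy instance (our own illustration, not from the paper) in which T and S are metric projections, so Fix T and Fix S are the underlying sets:

```python
import numpy as np

# Toy split common fixed point instance: T projects onto C = [0,1]^2 in R^2,
# S projects onto Q = {y in R^3 : y >= 0}, and A is a 3x2 matrix.
# Fix T = C and Fix S = Q, so x* solves the problem iff x* in C and A x* in Q.
P_C = lambda x: np.clip(x, 0.0, 1.0)
P_Q = lambda y: np.maximum(y, 0.0)
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

x_star = np.array([0.5, 0.25])
# Membership is verified by checking that each projection leaves the point fixed.
in_C = np.allclose(P_C(x_star), x_star)
in_Q = np.allclose(P_Q(A @ x_star), A @ x_star)
```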
The aim of this paper is to present a nonsmooth and non-strongly convex version of the hybrid steepest descent method for minimizing the sum of two convex functions over the fixed-point constraints of the form:
where
$f:{\mathbb{R}}^{n}\to \mathbb{R}$ and
$h:{\mathbb{R}}^{m}\to \mathbb{R}$ are convex nonsmooth functions,
$A:{\mathbb{R}}^{n}\to {\mathbb{R}}^{m}$ is a nonzero linear transformation,
$T:{\mathbb{R}}^{n}\to {\mathbb{R}}^{n}$ and
$S:{\mathbb{R}}^{m}\to {\mathbb{R}}^{m}$ are certain quasi-nonexpansive operators with
$\mathrm{Fix}\left(T\right)\ne \varnothing $, and
$\mathcal{R}\left(A\right)\cap \mathrm{Fix}\left(S\right)\ne \varnothing $, and
$X\subset {\mathbb{R}}^{n}$ is a simple closed convex bounded set. We prove convergence of the function values to the minimum value under control conditions imposed on the step-size sequence and a parameter.
The paper is organized as follows. After recalling and introducing some useful notions and tools in
Section 2, we present our algorithm and discuss the convergence analysis in
Section 3. Furthermore, in
Section 4, we discuss an important implication of our problem and algorithm for minimizing a sum of convex functions over coupling constraints. In
Section 5, we discuss in detail some remarkably practical applications, and
Section 6 describes the results of numerical experiments on a fused-lasso-like problem. Finally, conclusions are given in
Section 7.
2. Preliminaries
We summarize some useful notations, definitions, and properties, which we will utilize later.
For further details, the reader can consult the well-known books, for instance, in [
8,
23,
24,
25].
Let ${\mathbb{R}}^{n}$ be an n-dimensional Euclidean space with inner product $\langle \cdot ,\cdot \rangle $ and corresponding norm $\parallel \cdot \parallel $.
Let
$T:{\mathbb{R}}^{n}\to {\mathbb{R}}^{n}$ be an operator. We denote the set of all fixed points of
T by
$FixT$, that is,
We say that
T is
$\rho $-
strongly quasi-nonexpansive (
$\rho $-SQNE), where
$\rho \ge 0$, if
$FixT\ne \varnothing $ and
for all
$x\in {\mathbb{R}}^{n}$ and
$z\in FixT$. If
$\rho >0$, then
T is called
strongly quasi-nonexpansive (SQNE). If
$\rho =0$, then
T is called
quasi-nonexpansive (QNE), that is,
for all
$x\in {\mathbb{R}}^{n}$ and
$z\in FixT$. Clearly, if
T is SQNE, then it is QNE. We say that
T is
cutter if
$FixT\ne \varnothing $ and
for all
$x\in {\mathbb{R}}^{n}$ and all
$z\in FixT$. We say that
T is
firmly nonexpansive (FNE) if
for all
$x,y\in {\mathbb{R}}^{n}$.
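These definitions can be checked numerically. The sketch below (our own illustration) samples random points and verifies, for the metric projection onto the unit ball (which is known to be firmly nonexpansive), the FNE inequality $\langle Tx-Ty,x-y\rangle \ge {\parallel Tx-Ty\parallel}^{2}$ and the cutter inequality $\langle x-Tx,z-Tx\rangle \le 0$:

```python
import numpy as np

rng = np.random.default_rng(0)

def proj_ball(x):
    """Metric projection onto the closed unit ball of R^3."""
    nx = np.linalg.norm(x)
    return x if nx <= 1.0 else x / nx

# Sample random points and check both inequalities (small slack for rounding).
fne_ok = True
cutter_ok = True
for _ in range(1000):
    x = rng.normal(size=3)
    y = rng.normal(size=3)
    z = proj_ball(rng.normal(size=3))  # any point of the ball is a fixed point
    Tx, Ty = proj_ball(x), proj_ball(y)
    fne_ok &= np.dot(Tx - Ty, x - y) >= np.dot(Tx - Ty, Tx - Ty) - 1e-12
    cutter_ok &= np.dot(x - Tx, z - Tx) <= 1e-12
```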
The following properties will be applied in the next sections.
Fact 1 (Lemma 2.1.21 [
8])
. If $T:{\mathbb{R}}^{n}\to {\mathbb{R}}^{n}$ is $\mathrm{QNE}$, then $FixT$ is closed and convex. Fact 2 (Theorem 2.2.5 [
8])
. If $T:{\mathbb{R}}^{n}\to {\mathbb{R}}^{n}$ is $\mathrm{FNE}$ with $FixT\ne \varnothing $, then T is a cutter. Let
$f:{\mathbb{R}}^{n}\to \mathbb{R}$ be a function and
$x\in {\mathbb{R}}^{n}$. The
subdifferential of
f at
x is the set
If
$\partial f\left(x\right)\ne \varnothing $, then an element
${f}^{\prime}\left(x\right)\in \partial f\left(x\right)$ is called a
subgradient of
f at
x.
Fact 3 (Corollary 16.15 [
24])
. Let $f:{\mathbb{R}}^{n}\to \mathbb{R}$ be a convex function. Then, the subdifferential $\partial f\left(x\right)\ne \varnothing $ for all $x\in {\mathbb{R}}^{n}$. Fact 4 (Proposition 16.17 [
24])
. Let $f:{\mathbb{R}}^{n}\to \mathbb{R}$ be a convex function. Then, the subdifferential $\partial f$ maps every bounded subset of ${\mathbb{R}}^{n}$ to a bounded set. As we work in n-dimensional Euclidean space, we will use the notion of a matrix instead of that of a linear transformation throughout this work. Denote by ${\mathbb{R}}^{m\times n}$ the set of all real-valued $m\times n$ matrices. Let $A\in {\mathbb{R}}^{m\times n}$ be given. We denote by $\mathcal{R}\left(A\right):=\{y\in {\mathbb{R}}^{m}:y=Ax\text{ for some }x\in {\mathbb{R}}^{n}\}$ its range, and by ${A}^{\top}$ its transpose. We denote the induced norm of A by $\parallel A\parallel $, which is given by $\parallel A\parallel =\sqrt{{\lambda}_{max}\left({A}^{\top}A\right)}$, where ${\lambda}_{max}\left({A}^{\top}A\right)$ is the maximum eigenvalue of the matrix ${A}^{\top}A$.
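The induced norm can be computed from the maximum eigenvalue of ${A}^{\top}A$ and matches the spectral norm; a quick check:

```python
import numpy as np

A = np.array([[3.0, 0.0], [4.0, 5.0]])

# ||A|| = sqrt(lambda_max(A^T A)), i.e., the largest singular value of A.
lam_max = np.linalg.eigvalsh(A.T @ A).max()
norm_A = np.sqrt(lam_max)
# This agrees with the spectral norm computed directly by np.linalg.norm(A, 2).
```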
3. Method and its Convergence
Now, we formulate the composite nonsmooth convex minimization problem over the intersections of fixed-point sets which we aim to investigate throughout this paper.
Problem 1. Let ${\mathbb{R}}^{n}$ and ${\mathbb{R}}^{m}$ be two Euclidean spaces. Assume that
- $\left(\mathrm{A}1\right)$
$f:{\mathbb{R}}^{n}\to \mathbb{R}$ and $h:{\mathbb{R}}^{m}\to \mathbb{R}$ are convex functions.
- $\left(\mathrm{A}2\right)$
$A\in {\mathbb{R}}^{m\times n}$ is a nonzero matrix.
- $\left(\mathrm{A}3\right)$
$T:{\mathbb{R}}^{n}\to {\mathbb{R}}^{n}$ is $\mathrm{QNE}$, and $S:{\mathbb{R}}^{m}\to {\mathbb{R}}^{m}$ is cutter with $\mathcal{R}\left(A\right)\cap \mathrm{Fix}\left(S\right)\ne \varnothing $.
- $\left(\mathrm{A}4\right)$
X is a nonempty convex closed bounded simple subset of ${\mathbb{R}}^{n}$.
Our objective is to solve
Throughout this work, we denote the solution set of Problem 1 by $\Gamma $ and assume that it is nonempty.
Problem 1 can be viewed as a bilevel problem in which data come from two sources in a system. Consider a system of two users at different sources (possibly with different numbers of factors, n and m) that can communicate with each other via the transformation A. The first user aims to find the best solutions with respect to a criterion f among the feasible points represented by the fixed point set of an appropriate operator T. Similarly, the second user seeks the best solutions among feasible points in $FixS$ with respect to its own prior criterion h. To find the best solutions of this system, we consider the fixed-point subgradient splitting method (in short, FSSM) stated in Algorithm 1.
Algorithm 1: Fixed-Point Subgradient Splitting Method. |
Initialization: The positive sequence ${\left\{{\alpha}_{k}\right\}}_{k\ge 1}$ and the parameter $\gamma \in \left(0,+\infty \right)$, and an arbitrary ${x}_{1}\in {\mathbb{R}}^{n}$. Iterative Step: For given ${x}_{k}\in {\mathbb{R}}^{n}$, compute |
Remark 1. This algorithm simultaneously enjoys the following features: (i) splitting computation, (ii) a simple scheme, and (iii) boundedness of iterates. Concerning the first feature, the iterative scheme processes a subgradient of f and an application of the operator T in the space ${\mathbb{R}}^{n}$, and a subgradient of h and an application of the operator S in the space ${\mathbb{R}}^{m}$, separately. Regarding the simplicity of the scheme, we need not compute the inverse of the matrix A; the transpose of A is enough. Finally, the third feature is typically required when proving the convergence of subgradient-type methods. Boundedness is often imposed in image processing and machine learning in the form of a (large) box constraint or a large Euclidean ball.
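The displayed update formulas of Algorithm 1 are not reproduced in this excerpt. The sketch below is one plausible reading, reconstructed from the sequences appearing in Lemma 2, and should be treated as an assumption rather than the paper's exact scheme:

```python
import numpy as np

def fssm(x1, f_sub, h_sub, A, T, S, proj_X, gamma, alpha, iters=500):
    """Fixed-Point Subgradient Splitting Method (sketch, assumed update):
        z_k     = S(A x_k) - alpha_k * h'(S(A x_k))
        y_k     = x_k + gamma * A^T (z_k - A x_k)
        x_{k+1} = P_X(T y_k - alpha_k * f'(T y_k))
    """
    x = np.asarray(x1, dtype=float)
    for k in range(1, iters + 1):
        a = alpha(k)
        s = S(A @ x)
        z = s - a * h_sub(s)               # subgradient step for h in R^m
        y = x + gamma * A.T @ (z - A @ x)  # transfer back to R^n via A^T
        t = T(y)                           # fixed-point step for T
        x = proj_X(t - a * f_sub(t))       # subgradient step for f, then P_X
    return x

# Toy instance: f = h = ||.||_1, A the 1D difference matrix, T = S = identity,
# X = [-1, 1]^n; the unique minimizer is the origin.
n = 5
A = -np.eye(n - 1, n) + np.eye(n - 1, n, k=1)
x = fssm(
    x1=np.full(n, 0.9), f_sub=np.sign, h_sub=np.sign,
    A=A, T=lambda v: v, S=lambda v: v,
    proj_X=lambda v: np.clip(v, -1.0, 1.0),
    gamma=0.9 / np.linalg.norm(A, 2) ** 2,
    alpha=lambda k: 0.1 / k,
)
```

On this toy instance the iterates stay in X and each coordinate shrinks by the accumulated step sizes toward the origin.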
To study the convergence properties of the function values of the sequence generated by Algorithm 1, we start with the following technical result.
Lemma 1. Let ${\left\{{x}_{k}\right\}}_{k\ge 1}$ be a sequence generated by Algorithm 1. Then, for every $k\ge 1$ and $u\in X\cap FixT\cap {A}^{-1}(FixS)$, it holds that Proof. Let
$k\ge 1$ be arbitrary. By the definition of
${\left\{{y}_{k}\right\}}_{k\ge 1}$, we have
Now, by using the definition of
${\left\{{z}_{k}\right\}}_{k\ge 1}$ and the cutter property of
S, we derive
which in turn implies that (
6) becomes
We now focus on the last two terms of the right-hand side of (
7).
Observe that
thus, by the definition of
${\left\{{z}_{k}\right\}}_{k\ge 1}$, we obtain
Now, inequalities (
6)–(
8) together give
On the other hand, using the definition of
${\left\{{x}_{k}\right\}}_{k\ge 1}$ and the assumption that
T is QNE, we obtain
Replacing (
9) in (
10), we obtain
Next, the convexities of
f and
h give
and
By making use of these two inequalities in (
11), we obtain
which is the required inequality and the proof is completed. ☐
The following lemma is very useful for the convergence result.
Lemma 2. Let ${\left\{{x}_{k}\right\}}_{k\ge 1}$ be a sequence generated by Algorithm 1. Then, ${\left\{{x}_{k}\right\}}_{k\ge 1}$ is bounded. Furthermore, if $0<\gamma <\frac{1}{{\parallel A\parallel}^{2}}$ and ${\left\{{\alpha}_{k}\right\}}_{k\ge 1}$ is bounded, then the sequences ${\left\{SA{x}_{k}\right\}}_{k\ge 1}$, ${\left\{{h}^{\prime}\left(SA{x}_{k}\right)\right\}}_{k\ge 1}$, ${\left\{{z}_{k}\right\}}_{k\ge 1}$, ${\left\{{y}_{k}\right\}}_{k\ge 1}$, ${\left\{T{y}_{k}\right\}}_{k\ge 1}$, and ${\left\{{f}^{\prime}\left(T{y}_{k}\right)\right\}}_{k\ge 1}$ are bounded.
Proof. As
X is a bounded set, it is clear that the sequence
${\left\{{x}_{k}\right\}}_{k\ge 1}$ is bounded. Now, let
$u\in \Gamma $ be given. The linearity of
A and quasi-nonexpansiveness of
S yield
This implies that
${\left\{SA{x}_{k}\right\}}_{k\ge 1}$ is bounded. Consequently, applying Fact 4, we obtain that
${\left\{{h}^{\prime}\left(SA{x}_{k}\right)\right\}}_{k\ge 1}$ is also bounded.
By the triangle inequality, we have
Therefore, the boundedness of
${\left\{{\alpha}_{k}\right\}}_{k\ge 1}$ implies that
${\left\{{z}_{k}\right\}}_{k\ge 1}$ is bounded. Consequently, the triangle inequality and the linearity of
${A}^{\top}$ yield the boundedness of
${\left\{{y}_{k}\right\}}_{k\ge 1}$. As
T is QNE, we have
${\left\{T{y}_{k}\right\}}_{k\ge 1}$ is bounded. Thus,
${\left\{{f}^{\prime}\left(T{y}_{k}\right)\right\}}_{k\ge 1}$ is bounded by Fact 4. ☐
For the sake of simplicity, we let
and assume that
${(f+h\circ A)}^{*}>-\infty $.
We consider a convergence property of the objective values under diminishing step sizes in the following theorem.
Theorem 1. Let ${\left\{{x}_{k}\right\}}_{k\ge 1}$ be a sequence generated by Algorithm 1. If the following control conditions hold,
- (i)
$0<\gamma <\frac{1}{{\parallel A\parallel}^{2}}$;
- (ii)
${\sum}_{k=1}^{\infty}{\alpha}_{k}=+\infty $ and ${\sum}_{k=1}^{\infty}{\alpha}_{k}^{2}<+\infty $;
Proof. Let
$z\in \Gamma $ be given. We note from Lemma 1 that for every
$k\ge 1$
this holds because
$-\left(1-\gamma {\parallel A\parallel}^{2}\right)\gamma {\parallel {z}_{k}-A{x}_{k}\parallel}^{2}\le 0$ by assumption (i). Summing up (
12) for
$1,\dots ,k$ we obtain that
where
This implies that
Next, we show that
${\liminf}_{k\to +\infty}\left((f\left(T{y}_{k}\right)+h\left(SA{x}_{k}\right))-{(f+h\circ A)}^{*}\right)\le 0$. Suppose, to the contrary, that
there exist
${k}_{0}\ge 1$ and
$\varepsilon >0$ such that
for all
$k\ge {k}_{0}$. Thus, we have
which is a contradiction. Therefore, we can conclude that
☐
Remark 2. The convergence results obtained in Theorem 1 are slightly different from those obtained by the classical gradient method or even the projected gradient method, namely, ${\liminf}_{k\to +\infty}(f\left(T{y}_{k}\right)+h\left(SA{x}_{k}\right))={(f+h\circ A)}^{*}$. This is because, in each iteration, we cannot ensure that the estimate $T{y}_{k}$ belongs to the constraint set $Fix\left(T\right)$, which means that the property $f\left(T{y}_{k}\right)\ge {f}^{*}$ may fail in general. Similarly, we cannot ensure that $h\left(SA{x}_{k}\right)\ge {(h\circ A)}^{*}$.
Remark 3. A step-size sequence satisfying the assumptions ${\sum}_{k=1}^{\infty}{\alpha}_{k}=+\infty $ and ${\sum}_{k=1}^{\infty}{\alpha}_{k}^{2}<+\infty $ is, for instance, ${\left\{\frac{a}{k}\right\}}_{k\ge 1}$ with $a>0$.
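A quick numerical sanity check of these two conditions: partial sums of $a/k$ keep growing (like $a\ln K$), while partial sums of ${(a/k)}^{2}$ stay below the convergent bound ${a}^{2}{\pi}^{2}/6$:

```python
import math

a = 0.1
K = 10**6
# Partial sums of a/k grow like a*ln(K) (divergent series), while partial sums
# of (a/k)^2 stay below a^2 * pi^2 / 6 (convergent series).
S1 = sum(a / k for k in range(1, K + 1))
S2 = sum((a / k) ** 2 for k in range(1, K + 1))
```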
4. Convex Minimization Involving Sum of Composite Functions
The aim of this section is to show that Algorithm 1 and its convergence properties can be employed to solve a convex minimization problem involving a sum of finitely many composite functions.
Let us take a look at the composite convex minimization problem:
where, we assume further that, for all
$i=1,\dots ,l$, there hold
- (I)
${A}_{i}\in {\mathbb{R}}^{{m}_{i}\times n}$ are nonzero matrices,
- (II)
${S}_{i}:{\mathbb{R}}^{{m}_{i}}\to {\mathbb{R}}^{{m}_{i}}$ are cutter operators with $\mathcal{R}\left({A}_{i}\right)\cap Fix{S}_{i}\ne \varnothing $, and
- (III)
${h}_{i}:{\mathbb{R}}^{{m}_{i}}\to \mathbb{R}$ are convex functions.
In this section, we denote the solution set of (
13) by
$\Omega $ and assume that it is nonempty.
Denote the product of spaces
equipped with the addition
the scalar multiplication
with the inner product defined by
and the norm by
for all
$\mathbf{x}=({x}_{1},{x}_{2},\dots ,{x}_{l})$,
$\mathbf{y}=({y}_{1},{y}_{2},\dots ,{y}_{l})\in {\mathbb{R}}^{m}$; this product space is again a Euclidean space (see [
24], Example 2.1). Define a matrix
$\mathbf{A}\in {\mathbb{R}}^{m\times n}$ by
and an operator
$\mathbf{S}:{\mathbb{R}}^{m}\to {\mathbb{R}}^{m}$ by
for all
$\mathbf{y}=({y}_{1},{y}_{2},\dots ,{y}_{l})\in {\mathbb{R}}^{m}$. Note that the operator
$\mathbf{S}$ is cutter with
Furthermore, defining a function
$\mathbf{h}:{\mathbb{R}}^{m}\to \mathbb{R}$ by
for all
$\mathbf{x}=({x}_{1},{x}_{2},\dots ,{x}_{l})\in {\mathbb{R}}^{m}$, we also have that the function
$\mathbf{h}$ is a convex function (see [
24], Proposition 8.25). With the above setting, we can rewrite the problem (
13) as
which is nothing else than Problem 1.
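The product-space construction can be sketched as follows; the block sizes, matrices, and operators here are illustrative assumptions. The sketch also checks numerically the bound ${\parallel \mathbf{A}\parallel}^{2}\le {\sum}_{i=1}^{l}{\parallel {A}_{i}\parallel}^{2}$ used later:

```python
import numpy as np

# Stack the A_i vertically and apply the S_i blockwise, so that problem (13)
# takes the form of Problem 1 on the product space R^{m_1 + ... + m_l}.
A1 = np.array([[1.0, 2.0]])               # m_1 = 1
A2 = np.array([[0.0, 1.0], [1.0, 0.0]])   # m_2 = 2
blocks = [A1, A2]
A_big = np.vstack(blocks)                 # maps R^2 into R^{m_1 + m_2}

def S_big(y, S_list, sizes):
    """Apply each S_i to its own block of y = (y_1, ..., y_l)."""
    out, start = [], 0
    for S_i, m_i in zip(S_list, sizes):
        out.append(S_i(y[start:start + m_i]))
        start += m_i
    return np.concatenate(out)

x = np.array([1.0, -1.0])
# S_1 projects onto the nonnegative reals, S_2 is the identity (illustrative).
y = S_big(A_big @ x, [lambda v: np.maximum(v, 0.0), lambda v: v], [1, 2])
```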
Here, to solve the problem (
13), we state Algorithm 2 as follows.
Algorithm 2: Distributed Fixed-Point Subgradient Splitting Method. |
Initialization: The positive sequence ${\left\{{\alpha}_{k}\right\}}_{k\ge 1}$ and the parameter $\gamma \in \left(0,+\infty \right)$, and an arbitrary ${x}_{1}\in {\mathbb{R}}^{n}$. Iterative Step: For given ${x}_{k}\in {\mathbb{R}}^{n}$, compute |
As a consequence of the above construction, we note that
for all
$\mathbf{x}=({x}_{1},{x}_{2},\dots ,{x}_{l})\in {\mathbb{R}}^{m}$. Furthermore, we know that
see ([
25], Corollary 2.4.5). Putting
${\mathbf{d}}_{k}:=({d}_{k,1},\dots ,{d}_{k,l})$ where
${d}_{k,i}\in \partial {h}_{i}\left({S}_{i}{A}_{i}{x}_{k}\right),i=1,\dots ,l$, for all
$k\ge 1$, we obtain that
for all
$k\ge 1$. Notice that
for all
$k\ge 1.$ Thus, Algorithm 2 can be rewrite as
for all
$k\ge 1$. Since
${\parallel \mathbf{A}\parallel}^{2}\le {\sum}_{i=1}^{l}{\parallel {A}_{i}\parallel}^{2}$, the convergence result therefore follows from Theorem 1 and can be stated as the following corollary.
Corollary 1. Let ${\left\{{x}_{k}\right\}}_{k\ge 1}$ be a sequence generated by Algorithm 2. If the following control conditions hold:
- (i)
$0<\gamma <\frac{1}{{\sum}_{i=1}^{l}{\parallel {A}_{i}\parallel}^{2}}$;
- (ii)
${\sum}_{k=1}^{\infty}{\alpha}_{k}=+\infty $ and ${\sum}_{k=1}^{\infty}{\alpha}_{k}^{2}<+\infty $;
6. Numerical Experiments
In this section, to demonstrate the effectiveness of the fixed-point subgradient splitting method (Algorithm 1), we apply the proposed method to a fused-lasso-like problem. All the experiments were performed under MATLAB 9.6 (R2019a) running on a MacBook Pro 13-inch, 2019 with a 2.4 GHz Intel Core i5 processor and 8 GB 2133 MHz LPDDR3 memory.
For a given design matrix
$\mathbf{A}:=[{\mathbf{a}}_{1}|\cdots |{\mathbf{a}}_{r}{]}^{\top}$ in
${\mathbb{R}}^{r\times s}$, where
${\mathbf{a}}_{i}=({a}_{1i},\dots ,{a}_{si})\in {\mathbb{R}}^{s}$, and a response vector
$b=({b}_{1},\dots ,{b}_{r})\in {\mathbb{R}}^{r}$, we consider the fused-lasso-like problem of the form
where
Observe that by setting the functions
$f\left(x\right)={\parallel x\parallel}_{1}$,
$h\left(x\right):={\parallel x\parallel}_{1}$,
$A:=D$, $T$ the corresponding Landweber operator,
$S:=\mathrm{Id}$, the identity operator, and the constraint set
$X:={[-1,1]}^{s}$, we obtain that the problem (
18) is a special case of Problem 1, and hence it can be solved by Algorithm 1 (see
Section 5.1 for more details).
We generate the matrix
$\mathbf{A}$ with normally distributed random entries in
$(-10,10)$ and a given percentage
${p}_{\mathbf{A}}$ of nonzero elements. We generate the vector
$b=({b}_{1},\dots ,{b}_{r})\in {\mathbb{R}}^{r}$ corresponding to
$\mathbf{A}$ by the linear model
$b=\mathbf{A}{x}_{0}+\epsilon $, where
$\epsilon \sim N(0,{\parallel \mathbf{A}{x}_{0}\parallel}^{2})$ and the vector
${x}_{0}$ has
$10\%$ nonzero components generated from a normal distribution. The initial point is a vector whose coordinates are chosen randomly in
$(-1,1)$. In the numerical experiment, denoting the estimate
for all
$k\ge 1$, we consider the behavior of the average of the relative changes
with the optimality tolerance
${10}^{-3}$. We performed 10 independent tests for each collection of dimensions
$(r,s)$ and each percentage of nonzero elements of
$\mathbf{A}$, for various step-size parameters
${\alpha}_{k}$. The results are shown in
Table 1, where the average number of iterations (#Iters) and the average CPU time (Time) needed to reach the optimality tolerance are presented for each collection of parameters.
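The data-generation procedure described above can be sketched as follows; the function and variable names, and the exact noise scaling, are our own assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

def make_instance(r, s, p_nonzero):
    """Synthetic fused-lasso data roughly as described in the experiments.

    Entries of A are standard-normal draws, kept with probability p_nonzero;
    x0 has 10% normally distributed nonzero components; b = A x0 + eps with
    eps ~ N(0, ||A x0||^2) (the exact scaling used in the paper is assumed).
    """
    A = rng.normal(size=(r, s)) * (rng.random((r, s)) < p_nonzero)
    x0 = np.zeros(s)
    idx = rng.choice(s, size=max(1, s // 10), replace=False)
    x0[idx] = rng.normal(size=idx.size)
    eps = rng.normal(scale=np.linalg.norm(A @ x0), size=r)
    b = A @ x0 + eps
    return A, x0, b

A, x0, b = make_instance(r=50, s=100, p_nonzero=0.1)
```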
We see that the method with parameter ${\alpha}_{k}=0.1/k$ behaves significantly better than the others in terms of both the average number of iterations and the average CPU time, for all dimensions and percentages of nonzero elements of $\mathbf{A}$. Moreover, in the case ${p}_{\mathbf{A}}=10\%$, we observed a much larger average number of iterations and CPU time for the parameter choices ${\alpha}_{k}=0.3/k,0.5/k,0.7/k$, and $0.9/k$ across all problem sizes.