A Novel Forward-Backward Algorithm for Solving Convex Minimization Problem in Hilbert Spaces

In this work, we investigate the convex minimization problem of minimizing the sum of two objective functions. This problem includes, in particular, image reconstruction and signal recovery. We propose a new modified forward-backward splitting method that uses line search procedures and therefore does not require the gradient to be Lipschitz continuous. We show that the sequence generated by the proposed algorithm converges weakly to a minimizer of the sum of the two convex functions. We also apply the proposed method to compressed sensing in the frequency domain. The numerical results show that our method converges faster than the other methods tested, in terms of both the number of iterations and CPU time. Moreover, a comparative analysis of the numerical results is used to discuss the choice of the parameters in the line search.


Introduction
Let $H$ be a real Hilbert space, and let $f, g : H \to \mathbb{R} \cup \{+\infty\}$ be proper, lower-semicontinuous, and convex functions. The convex minimization problem is modeled as follows:
$$\min_{x \in H} \bigl(f(x) + g(x)\bigr). \qquad (1)$$
Throughout this paper, we assume that the solution set of Problem (1) is nonempty, and we denote it by $S^*$. Problem (1) can be described by a fixed point equation, that is,
$$x^* \in S^* \iff x^* = \operatorname{prox}_{\alpha g}\bigl(x^* - \alpha \nabla f(x^*)\bigr), \qquad (2)$$
where $\alpha > 0$ and $\operatorname{prox}_g$ is the proximal operator of $g$ defined by $\operatorname{prox}_g = (\mathrm{Id} + \partial g)^{-1}$, where $\mathrm{Id}$ denotes the identity operator in $H$ and $\partial g$ is the classical convex subdifferential of $g$. Using this fixed point equation, one can define the following iteration process:
$$x_{k+1} = \operatorname{prox}_{\alpha_k g}\bigl(x_k - \alpha_k \nabla f(x_k)\bigr), \qquad (3)$$
where $\alpha_k$ is a suitable stepsize. This algorithm is known as the forward-backward algorithm, and it includes, as special cases, the gradient method [1][2][3] and the proximal algorithm [4][5][6][7]. Recently, the construction of such iterative algorithms has become a key technique for solving nonlinear and optimization problems (see also [8][9][10][11][12][13][14][15]).
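As an illustration of iteration (3), the following Python sketch (NumPy assumed) runs the forward-backward step on a toy instance chosen here for illustration only: $f(x) = \frac{1}{2}\|x - b\|^2$ and $g(x) = \lambda\|x\|_1$, for which $\operatorname{prox}_{\alpha g}$ is componentwise soft-thresholding and the minimizer is known in closed form. The function names, the data $b$, and the constant stepsize are our own hypothetical choices.

```python
import numpy as np


def soft_threshold(z, t):
    """prox of t*||.||_1: componentwise soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def forward_backward(grad_f, prox_g, x0, alpha, iters=500):
    """Iteration (3) with a constant stepsize alpha_k = alpha.

    grad_f(x)    returns the gradient of the smooth term f at x.
    prox_g(z, a) returns prox_{a g}(z).
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        # forward (explicit gradient) step on f, then backward (proximal) step on g
        x = prox_g(x - alpha * grad_f(x), alpha)
    return x


# Toy instance: f(x) = 0.5*||x - b||^2 (gradient x - b, Lipschitz constant 1),
# g(x) = lam*||x||_1; the unique minimizer is soft_threshold(b, lam).
b = np.array([3.0, -0.5, -2.0])
lam = 1.0
x = forward_backward(
    grad_f=lambda v: v - b,
    prox_g=lambda z, a: soft_threshold(z, a * lam),
    x0=np.zeros(3),
    alpha=0.5,
)
print(x)
```

Here the stepsize $0.5$ lies in $(0, 2/L)$ with $L = 1$, and the iterates reach the closed-form minimizer $(2, 0, -1)$.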
It was proven that the sequence generated by (4) (Algorithm 1 of Combettes and Wajs [16]) converges to minimizers of $f + g$. However, the convergence of Algorithm 1 depends on the Lipschitz continuity of $\nabla f$, and the Lipschitz constant is, in general, not easy to compute.
The Douglas-Rachford algorithm is another method that can be used to solve Problem (1). It is defined in the following manner: where It was shown that the sequence $(x_k)$ defined by (5) converges to minimizers of $f + g$. The main drawback of Algorithm 2 is that it requires the evaluation of two proximity operators, of $f$ and of $g$, per iteration. This increases the cost of each iteration and slows down algorithms based on Algorithm 2. Its convergence behavior in practice is reported in Section 4.
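To make the per-iteration cost of Algorithm 2 concrete, the sketch below applies a standard Douglas-Rachford form, $y_k = \operatorname{prox}_{\gamma g}(x_k)$, $x_{k+1} = x_k + \lambda_k\bigl(\operatorname{prox}_{\gamma f}(2y_k - x_k) - y_k\bigr)$, to the LASSO functions of Section 4. This form, the parameter values, and the random test problem are assumptions for illustration; the exact arrangement used in [22] may differ. For $f(x) = \frac12\|y - Ax\|^2$, $\operatorname{prox}_{\gamma f}(z)$ solves the linear system $(I + \gamma A^\top A)u = z + \gamma A^\top y$, so every iteration needs a linear solve on top of the soft-thresholding step for $g$.

```python
import numpy as np


def soft_threshold(z, t):
    """prox of t*||.||_1: componentwise soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def douglas_rachford_lasso(A, y, lam, x0, gamma=0.05, relax=1.0, iters=5000):
    """Douglas-Rachford splitting for 0.5*||y - A x||^2 + lam*||x||_1.

    Returns the shadow point prox_{gamma g}(x_k), which estimates a minimizer.
    """
    n = A.shape[1]
    M = np.eye(n) + gamma * (A.T @ A)      # system matrix of prox_{gamma f}
    Aty = A.T @ y
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        yk = soft_threshold(x, gamma * lam)                 # first prox: g
        u = np.linalg.solve(M, 2 * yk - x + gamma * Aty)    # second prox: f (linear solve)
        x = x + relax * (u - yk)                            # relaxed update
    return soft_threshold(x, gamma * lam)


# Hypothetical test problem: 3-sparse signal, noiseless measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
x_true = np.zeros(50)
x_true[[3, 17, 41]] = [1.5, -2.0, 0.7]
y = A @ x_true
x_dr = douglas_rachford_lasso(A, y, lam=0.1, x0=np.zeros(50))
```

For large $N$, one would factor $I + \gamma A^\top A$ once instead of calling a dense solver each iteration; the second proximity operator remains the dominant cost either way.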
Very recently, Cruz and Nghia [17] introduced a forward-backward algorithm with a line search technique in the framework of Hilbert spaces. They assume that the following conditions are satisfied:
(A1) $f, g : H \to \mathbb{R} \cup \{+\infty\}$ are two proper, l.s.c., convex functions, where $\operatorname{dom} g \subseteq \operatorname{dom} f$ and $\operatorname{dom} g$ is nonempty, closed, and convex.
(A2) $f$ is Fréchet differentiable on an open set that contains $\operatorname{dom} g$. The gradient $\nabla f$ is uniformly continuous on bounded subsets of $\operatorname{dom} g$. Moreover, it maps any bounded subset of $\operatorname{dom} g$ to a bounded set in $H$.
They showed that the sequence $(x_k)$ generated by Algorithm 3, whose line search eliminates the Lipschitz assumption on the gradient of $f$, converges weakly to minimizers of $f + g$. Note, however, that the line search must be run in each iteration to find the stepsize $\alpha_k$, which can be costly and time-consuming.
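The following Python sketch illustrates a backtracking line search of the kind used in Algorithm 3. It assumes the rule of [17] in this form: start from $\alpha = \sigma$ and multiply $\alpha$ by $\theta$ until $\alpha\|\nabla f(J(x,\alpha)) - \nabla f(x)\| \le \delta\|J(x,\alpha) - x\|$, where $J(x,\alpha) = \operatorname{prox}_{\alpha g}(x - \alpha\nabla f(x))$, and then set $x_{k+1} = J(x_k, \alpha_k)$. The defaults $\sigma = 100$, $\theta = 0.1$, $\delta = 0.1$ follow Section 4; the helper names, the backtracking cap, and the LASSO test data are hypothetical. Note that no Lipschitz constant is computed anywhere.

```python
import numpy as np


def soft_threshold(z, t):
    """prox of t*||.||_1: componentwise soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def linesearch(x, grad_f, prox_g, sigma=100.0, theta=0.1, delta=0.1, max_backtracks=100):
    """Return (alpha, J(x, alpha)) for the first alpha in sigma, sigma*theta, ... passing the test."""
    gx = grad_f(x)
    alpha = sigma
    for _ in range(max_backtracks):
        J = prox_g(x - alpha * gx, alpha)  # J(x, alpha) = prox_{alpha g}(x - alpha grad f(x))
        if alpha * np.linalg.norm(grad_f(J) - gx) <= delta * np.linalg.norm(J - x):
            return alpha, J
        alpha *= theta  # backtrack
    raise RuntimeError("line search did not terminate")  # safety cap (our addition)


def fb_with_linesearch(grad_f, prox_g, x0, iters=300, **ls_params):
    """x_{k+1} = J(x_k, alpha_k), with alpha_k chosen by the line search."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        _, x = linesearch(x, grad_f, prox_g, **ls_params)
    return x


# Hypothetical LASSO test data.
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 50)) / np.sqrt(20)
x_true = np.zeros(50)
x_true[[5, 22, 37]] = [1.0, -1.5, 0.8]
y = A @ x_true
lam = 0.05


def grad_f(x):
    return A.T @ (A @ x - y)


def prox_g(z, a):
    return soft_threshold(z, a * lam)


x = fb_with_linesearch(grad_f, prox_g, np.zeros(50))
```

Each call may evaluate the proximal operator and the gradient several times, which is the extra per-iteration cost mentioned above.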
In variational inequality theory, Tseng [24] introduced the following method for solving the variational inequality problem (VIP) of finding $x^* \in C$ such that $\langle F(x^*), x - x^* \rangle \ge 0$ for all $x \in C$:
$$y_k = P_C\bigl(x_k - \lambda F(x_k)\bigr), \qquad x_{k+1} = y_k - \lambda\bigl(F(y_k) - F(x_k)\bigr), \qquad (6)$$
where $P_C$ is the metric projection from a Hilbert space onto a nonempty, closed, and convex set $C$, $F$ is a monotone and $L$-Lipschitz continuous mapping, and $\lambda \in (0, 1/L)$. The sequence generated by (6) converges weakly to a solution of the VIP. This method is often called Tseng's extragradient method and has received great attention from researchers due to its convergence speed (see, for example, [25][26][27][28]). Following this research direction, the main challenge is to design novel algorithms that converge faster than Algorithms 1-3.
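Iteration (6) can be sketched in a few lines of Python. The example is hypothetical: it uses the monotone (skew-symmetric) map $F(x) = M(x - c)$ with $M = \begin{pmatrix}0 & 1\\ -1 & 0\end{pmatrix}$, which is $1$-Lipschitz, and the box $C = [-1, 1]^2$ containing $c$, so that $c$ solves the VIP; the stepsize is $\lambda = 0.5 \in (0, 1/L)$.

```python
import numpy as np


def tseng(F, proj_C, x0, lam, iters=500):
    """Tseng's method (6) for the VIP defined by F and C."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        Fx = F(x)
        y = proj_C(x - lam * Fx)          # projected forward step
        x = y - lam * (F(y) - Fx)         # second forward (correction) step
    return x


M = np.array([[0.0, 1.0], [-1.0, 0.0]])   # skew-symmetric, so F is monotone with L = 1
c = np.array([0.3, -0.2])


def F(x):
    return M @ (x - c)


def proj_box(z):
    return np.clip(z, -1.0, 1.0)           # metric projection onto C = [-1, 1]^2


x = tseng(F, proj_box, x0=np.array([1.0, 1.0]), lam=0.5)
```

One can check that dropping the correction step, i.e., iterating $x_{k+1} = P_C(x_k - \lambda F(x_k))$, does not converge to $c$ for this rotation-type map: near $c$ the uncorrected step multiplies $x - c$ by $I - \lambda M$, whose eigenvalues have modulus $\sqrt{1 + \lambda^2} > 1$.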
In this paper, inspired by Cruz and Nghia [17], we propose a new forward-backward algorithm for solving the convex minimization problem and prove weak convergence theorems for it. Finally, we give numerical experiments in signal recovery to illustrate its efficiency; they show that the new algorithm converges faster than the other methods compared. The main advantage of this work is that our scheme does not require the computation of the Lipschitz constant, which Algorithm 1 requires.
The paper is organized as follows: In Section 2, we recall the concepts that will be used in the sequel. In Section 3, we establish the main convergence theorem for our algorithm. In Section 4, we give numerical experiments to validate the convergence theorems, and finally, in Section 5, we give the conclusions of this paper.

Preliminaries
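In this section, we give some definitions and lemmas that play an essential role in our analysis. Strong and weak convergence of $(x_k)_{k\in\mathbb{N}}$ to $x$ are denoted by $x_k \to x$ and $x_k \rightharpoonup x$, respectively. Let $h : H \to \mathbb{R} \cup \{+\infty\}$ be proper, lower-semicontinuous, and convex. The subdifferential of $h$ at $z$ is defined by
$$\partial h(z) = \{u \in H : h(z) + \langle u, y - z \rangle \le h(y) \ \text{for all } y \in H\}.$$
It is known that $\partial h$ is maximal monotone [29]. The proximal operator of $g$ is defined by $\operatorname{prox}_g : H \to \operatorname{dom} g$ with $\operatorname{prox}_g(z) = (\mathrm{Id} + \partial g)^{-1}(z)$, $z \in H$. It is known that $\operatorname{prox}_g$ is single-valued with full domain. Moreover, we have: The following lemma is crucial in the convergence analysis.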
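For the function $g(x) = \lambda\|x\|_1$ used in Section 4, $\operatorname{prox}_g$ is componentwise soft-thresholding. The short Python check below (function names hypothetical) verifies numerically the inclusion $z - \operatorname{prox}_g(z) \in \partial g(\operatorname{prox}_g(z))$, which restates $\operatorname{prox}_g = (\mathrm{Id} + \partial g)^{-1}$, using $\partial(\lambda|\cdot|)(u) = \{\lambda\operatorname{sign}(u)\}$ for $u \ne 0$ and $[-\lambda, \lambda]$ for $u = 0$.

```python
import numpy as np


def prox_l1(z, lam):
    """prox of lam*||.||_1: componentwise soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def in_subdiff_l1(u, v, lam, tol=1e-12):
    """True if v belongs to the subdifferential of lam*||.||_1 at u (checked componentwise)."""
    on = u != 0
    ok_on = np.all(np.abs(v[on] - lam * np.sign(u[on])) <= tol)   # single point lam*sign(u_i)
    ok_off = np.all(np.abs(v[~on]) <= lam + tol)                 # interval [-lam, lam]
    return bool(ok_on and ok_off)


z = np.array([2.5, -0.3, 0.9, -4.0])
lam = 1.0
p = prox_l1(z, lam)
print(p, in_subdiff_l1(p, z - p, lam))
```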

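Lemma 1 ([29]). Let $H$ be a Hilbert space, let $S$ be a nonempty, closed, and convex subset of $H$, and let $(x_k)$ be a sequence in $H$ that satisfies:
(i) $\lim_{k\to\infty} \|x_k - x\|$ exists for each $x \in S$;
(ii) $\omega_w(x_k) \subset S$, where $\omega_w(x_k)$ denotes the set of weak accumulation points of $(x_k)$.
Then, $(x_k)$ converges weakly to an element of $S$.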

Main Results
In this section, we propose a new forward-backward algorithm and prove its weak convergence. Throughout, we assume that Conditions (A1)-(A2) hold.
We next summarize the methods for solving the convex minimization problem (CMP) in Figure 1.

Theorem 1.
Let $(x_k)_{k\in\mathbb{N}}$ and $(\alpha_k)_{k\in\mathbb{N}}$ be generated by Algorithm 4. If there is $\alpha > 0$ such that $\alpha_k \ge \alpha$ for all $k \in \mathbb{N}$, then $(x_k)_{k\in\mathbb{N}}$ converges weakly to an element of $S^*$.
Proof. Let $x^* \in S^*$. Then, we obtain: Using the definition of $d_k$ and the line search (8), we have: This shows that: On the other hand, Substituting (11) into (12), we get that: This gives: Combining (10) and (14), we obtain: Since This shows that: By (15) and (16), we see that: Therefore, from (9) and the above, we have: Since $x_{k+1} = x_k - \gamma\eta_k d_k$, it follows that $\gamma\eta_k d_k = x_k - x_{k+1}$. This implies: Thus, $\lim_{k\to\infty}\|x_k - x^*\|$ exists, and hence, $(x_k)$ is bounded. This yields $\lim_{k\to\infty}\|x_k - x_{k+1}\| = 0$. We note, by (18), that: By the monotonicity of $\nabla f$, we see that: Therefore, we have: On the other hand, we have: Therefore, it follows that: Combining (20) and (21), we obtain: Since $(x_k)_{k\in\mathbb{N}}$ is bounded, the set of its weak accumulation points is nonempty. Let $x_\infty$ be a weak accumulation point of $(x_k)_{k\in\mathbb{N}}$. Then, there is a subsequence $(x_{n_k})_{k\in\mathbb{N}}$ of $(x_k)_{k\in\mathbb{N}}$ such that $x_{n_k} \rightharpoonup x_\infty$. Next, we show that $x_\infty \in S^*$. Since $y_{n_k} = (I + \alpha_{n_k}\partial g)^{-1}(I - \alpha_{n_k}\nabla f)x_{n_k}$, we obtain $(I - \alpha_{n_k}\nabla f)x_{n_k} \in (I + \alpha_{n_k}\partial g)y_{n_k}$, which yields
$$\frac{1}{\alpha_{n_k}}\bigl(x_{n_k} - y_{n_k} - \alpha_{n_k}\nabla f(x_{n_k})\bigr) \in \partial g(y_{n_k}).$$
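Since $\partial g$ is maximal monotone, we have: Since $\lim_{k\to\infty}\|x_k - y_k\| = 0$, it follows from (A2) that $\lim_{k\to\infty}\|\nabla f(x_{n_k}) - \nabla f(y_{n_k})\| = 0$. Hence, we obtain $0 \in (\nabla f + \partial g)x_\infty$, and consequently, $x_\infty \in S^*$. Applying Lemma 1, we conclude that $(x_k)$ converges weakly to an element of $S^*$. This completes the proof.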
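The limit inclusion $0 \in (\nabla f + \partial g)x_\infty$ also yields a practical optimality certificate. For the LASSO functions of Section 4, $0 \in \nabla f(x) + \partial g(x)$ means $A^\top(y - Ax) \in \lambda\,\partial\|\cdot\|_1(x)$. The Python sketch below (function name and test data hypothetical) computes $\operatorname{dist}(0, \nabla f(x) + \partial g(x))$ componentwise; it vanishes exactly at minimizers.

```python
import numpy as np


def lasso_optimality_gap(A, y, lam, x):
    """dist(0, grad f(x) + dg(x)) for f = 0.5*||y - A x||^2 and g = lam*||x||_1."""
    v = A.T @ (y - A @ x)                       # v = -grad f(x)
    on = x != 0
    # nonzero entries: the subdifferential is the single point lam*sign(x_i)
    gap_on = v[on] - lam * np.sign(x[on])
    # zero entries: the subdifferential is the interval [-lam, lam]
    gap_off = np.maximum(np.abs(v[~on]) - lam, 0.0)
    return float(np.sqrt(np.sum(gap_on ** 2) + np.sum(gap_off ** 2)))


# With A = I, the minimizer of 0.5*||b - x||^2 + ||x||_1 is the soft-thresholding of b.
A = np.eye(3)
b = np.array([3.0, -0.5, -2.0])
print(lasso_optimality_gap(A, b, 1.0, np.array([2.0, 0.0, -1.0])))  # minimizer
print(lasso_optimality_gap(A, b, 1.0, np.zeros(3)))                 # not a minimizer
```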

Numerical Experiments
Next, we apply our result to signal recovery in compressive sensing. We compare the performance of our proposed Algorithm 4 with Algorithm 1 of Combettes and Wajs [16], Algorithm 2 of Douglas-Rachford [22], and Algorithm 3 of Cruz and Nghia [17]. This problem can be modeled as
$$y = Ax + \varepsilon, \qquad (23)$$
where $y \in \mathbb{R}^M$ is the observed data, $\varepsilon$ is the noise, $A : \mathbb{R}^N \to \mathbb{R}^M$ $(M < N)$ is a bounded linear operator, and $x \in \mathbb{R}^N$ is the vector to be recovered, which contains $m$ nonzero components. It is known that (23) can be modeled as the LASSO problem
$$\min_{x \in \mathbb{R}^N} \frac{1}{2}\|y - Ax\|_2^2 + \lambda \|x\|_1, \qquad (24)$$
where $\lambda > 0$. Therefore, we can apply the proposed method to solve (1) with $f(x) = \frac{1}{2}\|y - Ax\|_2^2$ and $g(x) = \lambda\|x\|_1$. In the experiments, $y$ is generated with additive Gaussian noise at SNR $= 40$, $A$ is generated from the normal distribution with mean zero and variance one, and the nonzero entries of $x \in \mathbb{R}^N$ are drawn from the uniform distribution on $[-2, 2]$. We use the following stopping criterion: where $x_k$ is an estimated signal of $x^*$. In the following, the initial point $x_0$ is chosen randomly; in Algorithm 1, $\alpha_k = 1/\|A\|^2$ and $\lambda_k = 0.5$. In Algorithm 2, $\lambda_k = 0.02$ and $\gamma = 0.02$. We set $\sigma = 100$, $\theta = 0.1$, and $\delta = 0.1$ in Algorithms 3 and 4, and $\gamma = 1.9$ in Algorithm 4. We denote by CPU the CPU time and by iter the number of iterations. For Table 1, each experiment was run five times, and the averages of CPU and iter are reported. All numerical experiments were performed in MATLAB R2010a on the same laptop computer. The numerical results are reported in Table 1. From Table 1, we see that Algorithm 4 performed better than Algorithms 1 and 2 in terms of CPU time and number of iterations in each case.
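The data generation described above can be sketched in Python as follows. This is an illustrative reconstruction, not the authors' MATLAB code: we assume that SNR $= 40$ is measured in decibels relative to the noiseless measurements $Ax$, and the sizes $N$, $M$, $m$ and the seed are hypothetical choices.

```python
import numpy as np


def make_cs_problem(N, M, m, snr_db=40.0, seed=0):
    """Random compressed-sensing instance y = A x + noise with an m-sparse x."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((M, N))                   # entries ~ N(0, 1)
    x_true = np.zeros(N)
    support = rng.choice(N, size=m, replace=False)    # positions of the m nonzeros
    x_true[support] = rng.uniform(-2.0, 2.0, size=m)  # nonzero values ~ U[-2, 2]
    clean = A @ x_true
    # noise power chosen so that 10*log10(signal power / noise power) = snr_db on average
    noise_std = np.sqrt(np.mean(clean ** 2) / 10 ** (snr_db / 10))
    y = clean + noise_std * rng.standard_normal(M)
    return A, y, x_true


A, y, x_true = make_cs_problem(N=128, M=64, m=10)
```

The returned `A` and `y` define $f(x) = \frac12\|y - Ax\|_2^2$ and, together with a chosen $\lambda$, $g(x) = \lambda\|x\|_1$, which can then be passed to any of the solvers compared here.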
Next, Figure 2 shows signal recovery in compressed sensing for one example. From Figures 2 and 3, we see that Algorithm 4 converged faster than the other algorithms. More precisely, Algorithm 2 had the highest CPU time, since it requires two proximity operators per iteration. Moreover, Algorithm 1, whose stepsize is restricted by the Lipschitz constant of $\nabla f$, required the largest number of iterations. In our experiments, the initial guess did not have any significant effect on the convergence behavior.
Next, we analyze the convergence of Algorithm 4 and the effect of its stepsizes, which depend on the parameters δ, θ, γ, and σ. We first study the effect of the parameter δ. From Table 2, we observe that the CPU time and the number of iterations of Algorithm 4 increased as δ approached 0.5, for both N = 512, M = 265 and N = 1024, M = 512. Figure 4 shows the numerical results for each δ; it indicates that our algorithm worked effectively when δ was taken close to zero.
Next, we investigate the effect of the parameter θ in the proposed algorithm by varying it and studying the resulting convergence behavior. The numerical results are shown in Table 3.
From Table 3, we observe that, as θ approached one, the CPU time of Algorithm 4 increased while the number of iterations decreased slightly, for both N = 512, M = 265 and N = 1024, M = 512. Figure 5 shows the numerical results for each θ. From Figure 5, we see that Algorithm 4 worked effectively when θ was chosen close to one.
Next, we study the effect of the parameter γ in the proposed algorithm. The numerical results are shown in Table 4. From Table 4, we see that the CPU time and the number of iterations of Algorithm 4 decreased as γ approached two, for both N = 512, M = 265 and N = 1024, M = 512. Figure 6 shows the numerical results for each case of γ; it indicates that Algorithm 4 worked effectively when γ was chosen close to two.
Next, we study the effect of the parameter σ in the proposed algorithm. The numerical results are given in Table 5. From Table 5, we see that σ had no noticeable effect on the number of iterations or the CPU time, for both N = 512, M = 265 and N = 1024, M = 512.

Conclusions
In this work, we studied a modified forward-backward splitting method with line searches for solving convex minimization problems. We proved a weak convergence theorem under weakened assumptions on the stepsize. Our algorithm does not require the gradient of $f$ to be Lipschitz continuous; this is its main advantage, and it makes the method convenient in practice for solving optimization problems. We also presented numerical experiments in signal recovery, in which the proposed algorithm converged faster than the other algorithms compared, and the effects of all parameters were examined in Section 4.
Author Contributions: Supervision, S.S.; formal analysis and writing, K.K.; editing and software, P.C. All authors have read and agreed to the published version of the manuscript.