A Regularized Tseng Method for Solving Various Variational Inclusion Problems and Its Application to a Statistical Learning Model

Abstract: We study three classes of variational inclusion problems in the framework of a real Hilbert space and propose a simple modification of Tseng's forward-backward-forward splitting method for solving such problems. Our algorithm is obtained via a certain regularization procedure and uses self-adaptive step sizes. We show that the approximating sequences generated by our algorithm converge strongly to a solution of the problems under suitable assumptions on the regularization parameters. Furthermore, we apply our results to an elastic net penalty problem in statistical learning theory and to split feasibility problems. Moreover, we illustrate the usefulness and effectiveness of our algorithm by using numerical examples in comparison with some existing relevant algorithms that can be found in the literature.


Introduction
Variational inclusion problems have widely been studied by researchers because of their many valuable applications and generalizations. It is well known that many problems in applied sciences and engineering, mathematical optimization, machine learning, statistical learning, and optimal control can be modeled as variational inclusion problems; see, for example, [1–3] and references therein. In addition, under some assumptions, such problems involve many important concepts in applied mathematics, such as convex minimization, split feasibility, fixed points, saddle points, and variational inequalities; see, for example, [4–7].
Let H be a real Hilbert space, and let S : H ⇒ H and T : H → H be maximal monotone and monotone operators, respectively. The variational inclusion problem (VIP) is to find u ∈ H such that

0 ∈ (T + S)u.   (1)
The FBSM operates as follows:

u_{n+1} = J_{λS}(u_n − λT u_n) = (I + λS)^{−1}(u_n − λT u_n),

where λ ∈ (0, 2/L) and I : H → H is the identity operator. The FBSM was proved to generate sequences that converge weakly to a solution of (1) under the assumption that T is 1/L-cocoercive (or inverse strongly monotone). Aside from the cocoercivity assumption, convergence is only guaranteed under a similarly strong assumption, such as the strong monotonicity of S + T [8]. Interested readers may consult [9] for results on finding zeros of sums of maximal monotone operators using a similar forward-backward scheme.
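To make the scheme concrete, here is a minimal NumPy sketch of the FBSM on a toy instance of (1), taking T = ∇(½‖Au − b‖²), which is cocoercive, and S = ∂‖·‖₁, whose resolvent J_{λS} is the soft-thresholding map; the data A, b and all names are illustrative, not from the paper.

```python
import numpy as np

# Toy instance of 0 in (T + S)u: T = grad of 0.5*||Au - b||^2 (cocoercive),
# S = subdifferential of ||.||_1, whose resolvent is soft-thresholding.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
b = rng.standard_normal(20)

T = lambda u: A.T @ (A @ u - b)
J_S = lambda u, lam: np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

L = np.linalg.norm(A.T @ A, 2)      # Lipschitz constant of T
lam = 1.0 / L                       # any step in (0, 2/L) is admissible here

u = np.zeros(10)
for _ in range(500):
    u = J_S(u - lam * T(u), lam)    # u_{n+1} = J_{lam S}(I - lam T) u_n
```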
As an improvement on the FBSM, Tseng [10] proposed the modified forward-backward splitting method (MFBSM), also called the forward-backward-forward splitting method, for solving the VIP in the more general case where T is monotone and L-Lipschitz continuous. The MFBSM has the following structure:

y_n = J_{λS}(u_n − λT u_n),
u_{n+1} = y_n + λ(T u_n − T y_n),

where λ ∈ (0, 1/L). A weak convergence theorem was proved for this algorithm. While the implementation of the FBSM requires prior knowledge of the Lipschitz constant of T, which makes the algorithm somewhat restrictive, the MFBSM uses a line search technique to circumvent the onerous task of estimating the Lipschitz constant. Be that as it may, line search techniques are computationally expensive, as they require several extra computations per iteration; see, for example, [11]. For the inexact and the stochastic versions of the MFBSM, see [12,13], respectively.
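For comparison, here is a sketch of Tseng's MFBSM on the same toy problem (again with illustrative data): it trades the cocoercivity requirement for one extra forward evaluation per iteration, and a fixed step λ ∈ (0, 1/L) suffices.

```python
import numpy as np

# Tseng's forward-backward-forward method: only monotonicity and Lipschitz
# continuity of T are needed, at the price of a second evaluation of T.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
b = rng.standard_normal(20)
T = lambda u: A.T @ (A @ u - b)
J_S = lambda u, lam: np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

lam = 0.9 / np.linalg.norm(A.T @ A, 2)   # fixed step in (0, 1/L)

u = np.zeros(10)
for _ in range(500):
    y = J_S(u - lam * T(u), lam)         # forward-backward step
    u = y + lam * (T(u) - T(y))          # extra forward correction
```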
Motivated by the need to reduce the computational burden associated with the line search technique, and by the fact that strong convergence is more desirable than weak convergence in infinite-dimensional Hilbert spaces and in applications, some authors have incorporated existing hybrid techniques in the MFBSM and have proposed hybrid-like strongly convergent methods with self-adaptive step sizes. See, for instance, [11,14] and references therein.
Let S : H ⇒ H and T_i : H → H be maximal monotone and monotone operators, respectively, where i ∈ [I] := {1, 2, . . ., I}. We recall the modified variational inclusion problem (MVIP) introduced in [15]: find x ∈ H such that

0 ∈ (∑_{i∈[I]} a_i T_i + S)x,   (2)

where a_i ∈ (0, 1) and ∑_{i∈[I]} a_i = 1. Observe that every point x satisfying 0 ∈ (T_i + S)x for all i ∈ [I] solves (2), since in that case 0 ∈ ∑_{i∈[I]} a_i(T_i + S)x = (∑_{i∈[I]} a_i T_i + S)x. However, the converse is not true in general.
In addition, the MVIP is more general than the following common variational inclusion problem, which has recently been studied in [16]: find x ∈ H such that

0 ∈ (T_i + S)x for all i ∈ [I].   (3)

In view of its generality, the MVIP has recently attracted the attention of several authors, who have studied it and proposed algorithms for solving it. The authors of [17] studied the problem for the case where the T_i's are inverse strongly monotone. They proposed a Halpern-inertial forward-backward splitting algorithm for solving the problem and proved a strong convergence theorem. Moreover, the authors of [18] studied the MVIP in the case where the T_i's are monotone and Lipschitz continuous and designed a modified Tseng method for solving it. By using some symmetry properties, they proved the weak convergence of their proposed method. Furthermore, they formulated an image deblurring problem as an MVIP and used their results to solve it. On the other hand, motivated by the work [16], the authors of [19] have recently studied the following common variational inclusion problem: find x ∈ H such that

0 ∈ (T_i + S_i)x for all i ∈ [I],   (4)

where S_i : H ⇒ H are maximal monotone operators and T_i : H → H are monotone Lipschitz continuous operators. They combined the inertial technique, Tseng's method, and the shrinking projection method to design an iterative algorithm that solves (4).

Motivated by the above studies, in this paper we propose a unified simple modification of the MFBSM, which we call the regularized MFBSM (RMFBSM), for solving (2)–(4), and we establish a strong convergence theorem for the sequences it generates. To realize our objectives, we first examine a regularized MVIP and study the solution nets it generates. The novelty of our scheme lies in the fact that, unlike the existing modifications of the MFBSM in the literature, it yields strong convergence while preserving the two-step structure of the MFBSM. In the case where T_i = T and S_i = S for all i ∈ [I], we apply our result to a statistical learning model and to split feasibility problems.
The organization of our paper is as follows: In Section 2, we recall some useful definitions and preliminary results, which are needed in our study and in the convergence analysis of our algorithm. In Section 3, we propose the regularized MFBSM and establish a strong convergence theorem for it. In Section 4, we give some applications of our main results. In Section 5, we present numerical examples to illustrate our method and compare it with some existing related algorithms in the literature. We conclude with Section 6.

Preliminaries
We start this section by fixing our notation and recalling a number of important definitions.
Let C be a nonempty, closed, and convex subset of a real Hilbert space H, the inner product and induced norm of which are denoted by ⟨·, ·⟩ and ‖·‖, respectively. We denote by 'u_n ⇀ u' and 'u_n → u' the weak and the strong convergence, respectively, of the sequence {u_n} to a point u. The following identity is well known:

‖u + v‖² = ‖u‖² + 2⟨u, v⟩ + ‖v‖² for all u, v ∈ H.

A mapping T : H → H is said to be
(i) monotone if ⟨Tu − Tv, u − v⟩ ≥ 0 for all u, v ∈ H;
(ii) β-strongly monotone if there exists β > 0 such that ⟨Tu − Tv, u − v⟩ ≥ β‖u − v‖² for all u, v ∈ H;
(iii) L-Lipschitz continuous if there exists L > 0 such that ‖Tu − Tv‖ ≤ L‖u − v‖ for all u, v ∈ H;
(iv) α-cocoercive (or α-inverse strongly monotone) if there exists α > 0 such that ⟨Tu − Tv, u − v⟩ ≥ α‖Tu − Tv‖² for all u, v ∈ H;
(v) hemicontinuous if for every u, v, w ∈ H, we have lim_{t→0⁺} ⟨T(u + tv), w⟩ = ⟨Tu, w⟩.

Remark 1. In view of the above definitions, it is clear that every strongly monotone mapping is monotone. In addition, Lipschitz continuous mappings are hemicontinuous.
Let S : H ⇒ H be a set-valued operator. The graph of S, denoted by gr(S), is defined by

gr(S) = {(u, v) ∈ H × H : v ∈ Su}.

The set-valued operator S is called a monotone operator if ⟨u − y, v − z⟩ ≥ 0 for all (u, v), (y, z) ∈ gr(S), and a maximal monotone operator if the graph of S is not a proper subset of the graph of any other monotone operator. For a maximal monotone operator S and λ > 0, the resolvent of S is defined by

J_{λS} = (I + λS)^{−1}.

It is well known that J_{λS} is firmly nonexpansive (in particular, nonexpansive).
For each u ∈ H, there exists a unique nearest element of C, denoted by P_C u and called the metric projection of H onto C at u. That is,

‖u − P_C u‖ ≤ ‖u − v‖ for all v ∈ C.

The indicator function of C, denoted by i_C, is defined by

i_C(u) = 0 if u ∈ C, and i_C(u) = +∞ otherwise.

Recall that the subdifferential ∂f of a proper convex function f at u ∈ H is defined by

∂f(u) = {w ∈ H : f(v) ≥ f(u) + ⟨w, v − u⟩ for all v ∈ H}.

The normal cone of C at the point u ∈ C, denoted by N_C(u), is defined by

N_C(u) = {w ∈ H : ⟨w, v − u⟩ ≤ 0 for all v ∈ C}.

We know that ∂i_C is a maximal monotone operator, and we have ∂i_C = N_C. Furthermore, for each λ > 0,

J_{λ∂i_C} = (I + λ∂i_C)^{−1} = P_C.   (5)

The following important lemmata are useful in our convergence analysis.
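As a quick numerical check of these facts, the snippet below takes C to be the box [−1, 1]^n, for which P_C is coordinatewise clipping (and, by (5), also the resolvent J_{λ∂i_C} for every λ > 0), and verifies the firm nonexpansiveness of the projection on random test points; the set and points are illustrative.

```python
import numpy as np

# Projection onto the box C = [-1, 1]^n; by (5) this equals the resolvent
# J_{lam * partial i_C} for every lam > 0.
P_C = lambda u: np.clip(u, -1.0, 1.0)

rng = np.random.default_rng(1)
u, v = rng.standard_normal(4), rng.standard_normal(4)

# Firm nonexpansiveness: ||P u - P v||^2 <= <P u - P v, u - v>.
lhs = np.linalg.norm(P_C(u) - P_C(v)) ** 2
rhs = float(np.dot(P_C(u) - P_C(v), u - v))
print(lhs <= rhs + 1e-12)   # True
```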

Lemma 1 ([21]). Let H be a real Hilbert space. Let S : H ⇒ H be a maximal monotone operator and let T : H → H be a monotone and Lipschitz continuous mapping. Then, the mapping M = S + T is a maximal monotone mapping.

Lemma 2 ([22]). Let H be a real Hilbert space. Let K be a nonempty, closed, and convex subset of H, and let F : H → H be a hemicontinuous and monotone operator. Then, ū ∈ K is a solution to the variational inequality

⟨F ū, v − ū⟩ ≥ 0 for all v ∈ K

if and only if ū is a solution to the following problem: find ū ∈ K such that

⟨F v, v − ū⟩ ≥ 0 for all v ∈ K.

Lemma 3 ([23]). Let H be a real Hilbert space. Suppose that F : H → H is κ-Lipschitzian and β-strongly monotone over a closed and convex subset K ⊂ H. Then, the variational inequality

⟨F x*, y − x*⟩ ≥ 0 for all y ∈ K

has a unique solution x* ∈ K.

Lemma 4 ([24]). Let {Ψ_n} be a sequence of non-negative real numbers, {a_n} be a sequence of real numbers in (0, 1) satisfying the condition ∑_{n=1}^∞ a_n = ∞, and {b_n} be a sequence of real numbers. Assume that

Ψ_{n+1} ≤ (1 − a_n)Ψ_n + a_n b_n for all n ∈ ℕ.

If lim sup_{k→∞} b_{n_k} ≤ 0 for every subsequence {Ψ_{n_k}} of {Ψ_n} satisfying lim inf_{k→∞}(Ψ_{n_k+1} − Ψ_{n_k}) ≥ 0, then lim_{n→∞} Ψ_n = 0.

Regularized Modified Forward-Backward Splitting Method
Let S : H ⇒ H be a maximal monotone operator, T_i : H → H be a monotone and L_i-Lipschitz continuous operator for i ∈ [I] := {1, 2, . . ., I}, and a_i ∈ (0, 1) be such that ∑_{i∈[I]} a_i = 1. We denote the solution set of problem (2) by Ω, that is,

Ω = {x ∈ H : 0 ∈ (∑_{i∈[I]} a_i T_i + S)x},

and assume that Ω ≠ ∅. Let F : H → H be a γ-strongly monotone and L-Lipschitz continuous operator. We are seeking a solution x* ∈ Ω such that

⟨F x*, u − x*⟩ ≥ 0 for all u ∈ Ω.   (6)

In this connection, we consider the following regularized modified variational inclusion problem (RMVIP): find u ∈ H such that

0 ∈ (∑_{i∈[I]} a_i T_i + S)u + τF u,   (7)

where τ > 0 is a regularization parameter. Observe that by Lemma 1, ∑_{i∈[I]} a_i T_i + S is a maximal monotone operator. In addition, (∑_{i∈[I]} a_i T_i + S) + τF is strongly monotone because F is strongly monotone. Therefore, for each τ > 0, problem (7) possesses a unique solution, which we denote by u_τ. The following result concerning the solution net {u_τ}_{τ∈(0,1)} can be deduced from Lemma 2 of [25] (see also Proposition 3.1 of [7]), but we give the proof for completeness.
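The strong monotonicity claim is a one-line computation; writing A := ∑_{i∈[I]} a_i T_i + S (and interpreting the set-valued part through arbitrary selections from gr(S)), it reads in LaTeX form:

```latex
\begin{aligned}
\langle (A+\tau F)u - (A+\tau F)v,\; u-v\rangle
  &= \underbrace{\langle Au - Av,\; u-v\rangle}_{\ge 0\ \text{(monotonicity of $A$)}}
     + \tau\,\langle Fu - Fv,\; u-v\rangle \\
  &\ge \tau\gamma\,\|u-v\|^{2}.
\end{aligned}
```

Hence A + τF is τγ-strongly monotone, which is exactly what guarantees the unique solvability of (7).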
Proposition 1. For each τ ∈ (0, 1), let u_τ denote the unique solution of (7). Then:
(a) the net {u_τ}_{τ∈(0,1)} is bounded;
(b) there exists M > 0 such that ‖u_τ − u_{τ′}‖ ≤ (M/γ)·|τ − τ′|/τ for all τ, τ′ ∈ (0, 1);
(c) lim_{τ→0⁺} u_τ exists and belongs to Ω; in fact, u_τ → x*, the solution of (6).
Proof. (a) Let u_τ be a solution to (7). Then we have
Using (8) and (9), we obtain
Since T_i is monotone, it follows from (10) that
Using (11) and the γ-strong monotonicity of F, we find that
It now follows from (12) that
Using the monotonicity of (∑_{i∈[I]} a_i T_i + S) and (13), we find that
Using (14), we see that
Therefore, it follows from (15) that
Using (16), we obtain
where
The boundedness of the net {u_τ}_{τ∈(0,1)} implies that there exists a subsequence {u_{τ_n}} of {u_τ}, with τ_n → 0⁺ as n → ∞, that converges weakly to some ū ∈ H. Since the operator (∑_{i∈[I]} a_i T_i + S) is maximal monotone and hence sequentially closed in the weak–strong topology on H × H, letting n → ∞, we infer that 0 ∈ (∑_{i∈[I]} a_i T_i + S)ū. Therefore, ū ∈ Ω.
It now follows from (17) and the monotonicity of T_i that
Using (18), we conclude that
Moreover, using (19) and the γ-strong monotonicity of F, we see that
Therefore, using (20), we have
Passing to the limit in (21) as n → ∞, we find that
and using Lemma 2, we arrive at the conclusion that ū solves (6). The uniqueness of the solution to ⟨F x*, u − x*⟩ ≥ 0 for all u ∈ Ω, guaranteed by Lemma 3, implies that ū = x*. Thus, passing to the limit as n → ∞ in (20), we conclude that u_{τ_n} → x* as n → ∞.
Following an approach similar to the above analysis, we denote the solution set of problem (4) by Ω₂ and consider the corresponding regularized common variational inclusion problem: find u ∈ H such that 0 ∈ (T_i + S_i)u + τF u for all i ∈ [I].
Algorithm 1 (RMFBSM).
Iterative steps: Given u_n, calculate u_{n+1} as follows:
Step 1: Compute y_n^i for each i ∈ [I].
Step 2: Update u_{n+1} and the step size λ_{n+1}.
Set n := n + 1 and go back to Step 1.

The sequence of step sizes in Algorithm 1 is convergent, as shown in the next lemma.

Lemma 5. The sequence {λ_n} generated by Algorithm 1 is bounded, and λ_n ∈ [min{μ/L_{i_n}, λ_1}, λ_1 + P]. Moreover, there exists λ ∈ [min{μ/L_{i_n}, λ_1}, λ_1 + P] such that lim_{n→∞} λ_n = λ, where P = ∑_{n=1}^∞ p_n.

Proof. The proof follows from Lemma 2.1 of [26].

Remark 2. If we take I = 1 and d_n = 0 in Algorithm 1 of Hieu et al. [25], then the two algorithms are the same. However, the problems we intend to solve are more general than the problem (VIP) studied by Hieu et al.
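Since the displayed update formulas of Algorithm 1 are not reproduced above, the following NumPy sketch shows only a plausible regularized Tseng iteration of the kind described: a Tseng step applied to the regularized operator T + τ_nF, followed by a self-adaptive step-size update that requires no Lipschitz constant. It is written for a single pair of operators (I = 1) with F = I; the parameter sequences tau_n and p_n, the data, and the thresholds are illustrative assumptions, not the authors' exact scheme.

```python
import numpy as np

# Hypothetical sketch of a regularized Tseng iteration (not the authors'
# exact Algorithm 1): solve 0 in (T + S)u with T monotone and Lipschitz,
# S maximal monotone with explicit resolvent, and regularizer F = identity.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
b = rng.standard_normal(20)
T = lambda u: A.T @ (A @ u - b)                      # monotone, Lipschitz
J_S = lambda u, lam: np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)
F = lambda u: u                                      # 1-strongly monotone, 1-Lipschitz

u, lam, mu = np.zeros(10), 1.0, 0.5
for n in range(1, 2001):
    tau_n, p_n = 1.0 / (n + 1), 1.0 / n**2           # assumed parameter choices
    Tu = T(u) + tau_n * F(u)                         # regularized forward operator
    y = J_S(u - lam * Tu, lam)                       # backward (resolvent) step
    Ty = T(y) + tau_n * F(y)
    u_next = y + lam * (Tu - Ty)                     # Tseng correction step
    # Self-adaptive step size: grows by at most p_n, shrinks when the local
    # Lipschitz estimate demands it; no global constant is ever needed.
    den = np.linalg.norm(Tu - Ty)
    lam = min(lam + p_n, mu * np.linalg.norm(u - y) / den) if den > 1e-12 else lam + p_n
    u = u_next
```

The step-size rule mirrors the interval in Lemma 5: λ_n can never exceed λ_1 + ∑ p_n and stays bounded away from zero.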

Lemma 6. Let {u_n} be the sequence generated by Algorithm 1. Then, there exists n₀ ∈ ℕ such that the following estimate holds for all n ≥ n₀, where u_{τ_n} solves the RMVIP (7) with τ replaced by τ_n.
Proof. Using the definition of y_n^i, we obtain
from which it follows that
Employing (24) and (25), in particular for i = i_n, we obtain
In addition,
Therefore, it follows from (26) and (27) that
Using the fact that F is γ-strongly monotone, we infer from (28) that
Note that
and
Substituting (30) and (31) in (29) and multiplying throughout by 2, we obtain
Using (24) and the Peter–Paul inequality, as well as (32), for any ε₁ > 0, we have
Choose ε₁ = L/γ. Observe that from Lemma 5 and Assumptions 1 (A1) and (A3),
Therefore, there exists n₀ ∈ ℕ such that 1 − μ²λ_n²/λ_{n+1}² > 1 − μ for all n ≥ n₀. Thus, it follows from (33) that
as asserted.
Theorem 1. The sequences generated by Algorithm 1 converge strongly to x* ∈ Ω₂.
Proof. Invoking Proposition 1(b) and the Peter–Paul inequality, we see that there exists ε₂ > 0 such that
Next, we define the sequence {β_n} by
Then, from Lemma 6 and (35), it follows that for all i ∈ [I] and n ≥ n₀,
Let ε₂ = 0.5β_n. Then, from (36), it follows that
We see that δ_n ∈ (0, 1) and
Then, from (37), we obtain
It is not difficult to see that the sequence {b_n} is bounded. To complete the proof, we invoke Lemma 4 by showing that lim sup_{k→∞} b_k ≤ 0. From (39), (24), and (40), we infer that lim_{k→∞} ‖u_{n_k+1} − u_{n_k}‖ = 0. Indeed, it immediately follows from conditions (A1) and (A2) that lim sup_{k→∞} b_k = 0. Therefore, using Lemma 4, we obtain that lim_{n→∞} Θ_n = 0. Consequently, using Proposition 1(c) and (40), we conclude that lim_{n→∞} u_n = lim_{n→∞} y_n^i = x*.

We next present some consequences of the above result to confirm that our Algorithm 1 indeed provides a unified framework for solving the inclusion problems (2)–(4).

Corollary 1. Let S : H ⇒ H be a maximal monotone operator and T_i : H → H be monotone Lipschitz continuous operators with constants L_i for i ∈ [I]. Let F : H → H be a γ-strongly monotone and Lipschitz continuous operator with constant L. Assume that Ω₁ := {x ∈ H : 0 ∈ (T_i + S)x for all i ∈ [I]} ≠ ∅. Then, the sequences generated by Algorithm 2, under Assumption 1, converge strongly to x* ∈ Ω₁, satisfying ⟨F x*, u* − x*⟩ ≥ 0 for all u* ∈ Ω₁.

Proof. Note that Algorithm 2 is derived from Algorithm 1 by taking S_i = S for i ∈ [I]. Thus, the proof follows from the proof of Theorem 1.

Corollary 2. Let S : H ⇒ H be a maximal monotone operator and T_j : H → H be monotone Lipschitz continuous operators with constants L_j for j ∈ [J]. Let F : H → H be a γ-strongly monotone and Lipschitz continuous operator with constant L, and let a_j ∈ (0, 1) be such that ∑_{j∈[J]} a_j = 1. Assume that Ω := {x ∈ H : 0 ∈ (∑_{j∈[J]} a_j T_j + S)x} ≠ ∅. Then, the sequences generated by Algorithm 3, under Assumption 1, converge strongly to x* ∈ Ω, satisfying ⟨F x*, u* − x*⟩ ≥ 0 for all u* ∈ Ω.
Iterative steps: Given u_n, calculate u_{n+1} as follows:
Step 1: Compute y_n.
Step 2: Compute u_{n+1} and update the step size.
Set n := n + 1 and go back to Step 1.
Proof. Note that Algorithm 3 is derived from Algorithm 1 by taking I = 1 and setting T = ∑_{j∈[J]} a_j T_j. Note that the new operator T is monotone and Lipschitz continuous. Thus, the proof follows from the proof of Theorem 1.
Remark 3. On the other hand, we note that the structure of some steps in Algorithm 1 differs from that of Algorithm 1 of [18]. For instance, the authors of [18] needed to solve an optimization problem in order to compute u_{n+1}. Our iterative scheme provides an alternative and bypasses such difficulties. However, we do not consider accelerated schemes in the present study because such results can easily be obtained from the results in our recent work [27].

Applications
In this section, we consider two applications of our main results in the case where I = 1.

Split Feasibility Problems
Let C and Q be nonempty, closed, and convex subsets of the real Hilbert spaces H₁ and H₂, respectively, and let A : H₁ → H₂ be a bounded linear operator, the adjoint of which is denoted by A* : H₂ → H₁. We now recall the split feasibility problem (SFP):

find u ∈ C such that Au ∈ Q.   (42)

We denote the set of solutions of (42) by S and assume that S ≠ ∅. The concept of the SFP was introduced by Censor and Elfving [28] in 1994 in the framework of Euclidean spaces. It has been successfully applied to model inverse problems in medical image reconstruction, phase retrieval, gene regulatory network inference, and intensity-modulated radiation therapy; see, for example, [29,30]. From its conception to date, the SFP has been widely studied by several authors, who have also proposed various iterative schemes for solving it. Interested readers should consult [31–38] and references therein for recent studies and generalizations of this problem. One can verify that the solution set of the SFP (42) coincides with the solution set of the following constrained minimization problem (see, for example, [36]):

min_{u∈C} (1/2)‖Au − P_Q Au‖².   (43)

However, it is to be observed that the minimization problem (43) is, in general, ill-posed and therefore calls for regularization. We consider the following Tikhonov regularization [38]:

min_{u∈C} (1/2)‖Au − P_Q Au‖² + (κ/2)‖u‖²,   (44)

where κ > 0 is a regularization parameter. Equivalently, (44) can be written as the following unconstrained minimization problem:

min_{u∈H₁} f_κ(u) + i_C(u), where f_κ(u) := (1/2)‖Au − P_Q Au‖² + (κ/2)‖u‖².   (45)

Note that the function f_κ is differentiable and its gradient is ∇f_κ(u) = A*(Au − P_Q Au) + κu. Problem (45) is equivalent to the following inclusion problem:

find u ∈ H₁ such that 0 ∈ ∇f_κ(u) + ∂i_C(u).   (46)

It is not difficult to see that ∇f_κ is monotone and (‖A‖² + κ)-Lipschitz continuous. Therefore, since ∂i_C is a maximal monotone operator, (46) assures us that we can apply our result to solving the SFP (42). Hence our next result.
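A small NumPy illustration of this reduction, with C the nonnegative orthant and Q a Euclidean ball (both chosen purely for illustration): ∇f_κ is evaluated as A*(Au − P_Q Au) + κu, J_{λS} = P_C by (5), and a plain Tseng iteration with a fixed step stands in for Algorithm 4, whose exact steps are not reproduced here.

```python
import numpy as np

# SFP sketch: C = nonnegative orthant in R^n, Q = ball of radius r in R^m.
rng = np.random.default_rng(0)
m, n, r, kappa = 15, 10, 1.0, 1e-3
A = rng.standard_normal((m, n))

P_C = lambda u: np.maximum(u, 0.0)
P_Q = lambda w: w if np.linalg.norm(w) <= r else r * w / np.linalg.norm(w)
grad = lambda u: A.T @ (A @ u - P_Q(A @ u)) + kappa * u   # grad f_kappa

L = np.linalg.norm(A.T @ A, 2) + kappa   # Lipschitz constant of grad f_kappa
lam = 0.9 / L                            # fixed Tseng step in (0, 1/L)

u = rng.standard_normal(n)
for _ in range(2000):
    y = P_C(u - lam * grad(u))           # forward-backward step, J_{lam S} = P_C
    u = y + lam * (grad(u) - grad(y))    # Tseng correction
print(np.linalg.norm(A @ u - P_Q(A @ u)))   # feasibility residual, ~0 at a solution
```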
Theorem 2. Let C and Q be nonempty, closed, and convex subsets of the real Hilbert spaces H₁ and H₂, respectively. Suppose that A : H₁ → H₂ is a bounded linear operator and let A* : H₂ → H₁ be its adjoint operator. Assume that S := {u ∈ H₁ : u ∈ C and Au ∈ Q} ≠ ∅ and that Assumption 1 holds. Then, the sequence {u_n} generated by Algorithm 4 converges strongly to a point u* ∈ S, satisfying ⟨F u*, y − u*⟩ ≥ 0 for all y ∈ S.
Proof. Let S = ∂i_C and T_i = T = ∇f_κ for all i ∈ [I] in Algorithm 1. Note that by (5), we have J_{λ_n S} = P_C. Therefore, the proof can be obtained by following the steps of the proof of Theorem 1.

Elastic Net Penalty Problem
We consider the following linear regression model used in statistical learning:

y = ū₁A₁ + ū₂A₂ + · · · + ū_N A_N + ε,   (47)

where y is the response predicted by the N predictors A₁, A₂, . . ., A_N and ε is a random error term. We assume that the model is sparse, with a limited number of nonzero coefficients. The model can be written in matrix form as follows:

y = Aū + ε,   (48)

where y ∈ ℝ^M, A ∈ ℝ^{M×N} is the predictor matrix, and ū = (ū₁, ū₂, . . ., ū_N) ∈ ℝ^N. In the statistics community, one solves the model by recasting (48) as the following penalized optimization problem:

min_{u∈ℝ^N} { (1/2)‖y − Au‖² + pen_σ(u) },   (49)

where pen_σ(u), σ ≥ 0, is a penalty function of u. Penalized regression and variable selection are two key topics in linear regression analysis. They have recently attracted the attention of many authors, who have proposed and analyzed various penalty functions (see [39] and references therein). A very popular penalized regression model for variable selection is LASSO (least absolute shrinkage and selection operator), which was proposed by Tibshirani [40]. It is an ℓ₁-norm regularized least squares model given by

min_{u∈ℝ^N} { (1/2)‖y − Au‖² + σ₁‖u‖₁ },   (50)

where σ₁ ≥ 0 is a nonnegative regularization parameter. The ℓ₁ penalty enables LASSO to perform both continuous shrinkage and automatic variable selection simultaneously [41]. However, its conditions become invalid when it is applied to a group of highly correlated variables [42].
In addition to LASSO, other approaches for variable selection with penalty functions more general than the ℓ₁ penalty have been proposed (see [43] and references therein for more details on other penalty functions and applications to SVM).
In this subsection, we focus on the penalty function proposed by Zou and Hastie [41], called the elastic net penalty. The elastic net is a linear combination of the ℓ₁ and ℓ₂ penalties, defined by

pen_{σ₁,σ₂}(u) = σ₁‖u‖₁ + σ₂‖u‖².

Therefore, we consider the following elastic net penalty problem:

min_{u∈ℝ^N} { f(u) + g(u) }, where f(u) := (1/2)‖Au − y‖² + σ₂‖u‖² and g(u) := σ₁‖u‖₁.   (51)

Then, ∇f(u) = Aᵀ(Au − y) + 2σ₂u, where Aᵀ stands for the transpose of A. Thus, ∇f is monotone and Lipschitz continuous with constant L + 2σ₂, L being the largest eigenvalue of the matrix AᵀA. Note that g is a proper lower semicontinuous convex function and ∂g is maximal monotone. The resolvent of the maximal monotone operator ∂g is given componentwise by (see [20])

(J_{λ∂g}(u))_k = sign(u_k) max{|u_k| − λσ₁, 0}, k = 1, . . ., N.

By the optimality condition, the penalty problem (51) is equivalent to the following inclusion problem:

0 ∈ ∇f(u) + ∂g(u).   (52)

We apply our Algorithm 1 to solve the elastic net penalty problem (51), taking S = ∂g and T_i = T = ∇f for each i ∈ [I], and then compare the effectiveness of our method with that of some existing methods in the literature. To this end, let ū ∈ ℝ^N be a sparse randomly generated N × 1 vector and let A ∈ ℝ^{M×N} and ε ∈ ℝ^M be randomly generated matrices, the entries of which are normally distributed with mean 0 and variance 1. In our experiments, we choose different values for ū, u₁, A, and ε, as follows: ε = randn(120, 1), ū = sprandn(512, 1, 1/64), u₁ = sprandn(512, 1, 1/64).
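Concretely, the two building blocks Algorithm 1 needs for (51) are ∇f and the soft-thresholding resolvent of ∂g. The sketch below implements them in NumPy and runs a plain fixed-step Tseng iteration as a stand-in for the full method; the problem sizes mirror the experiment above, but the data, seed, and step choice are illustrative.

```python
import numpy as np

# Elastic net pieces: f(u) = 0.5*||Au - y||^2 + sigma2*||u||^2, g = sigma1*||.||_1.
rng = np.random.default_rng(0)
M, N, sigma1, sigma2 = 120, 512, 0.1, 0.1
A = rng.standard_normal((M, N))
u_true = np.zeros(N)
u_true[rng.choice(N, 8, replace=False)] = rng.standard_normal(8)   # sparse ground truth
y = A @ u_true + rng.standard_normal(M)

grad_f = lambda u: A.T @ (A @ u - y) + 2 * sigma2 * u
soft = lambda u, t: np.sign(u) * np.maximum(np.abs(u) - t, 0.0)    # J_{lam dg} at t = lam*sigma1

L = np.linalg.norm(A.T @ A, 2) + 2 * sigma2   # Lipschitz constant of grad f
lam = 0.9 / L                                 # fixed Tseng step in (0, 1/L)

u = np.zeros(N)
for _ in range(1000):
    z = soft(u - lam * grad_f(u), lam * sigma1)   # backward step on g
    u = z + lam * (grad_f(u) - grad_f(z))         # Tseng correction
```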
We compare the performance of our Algorithm 1 (RMFBSM) with the MFBSM and the VTM. We choose α_n = 1/(n+2) and f(u) = u/2 for the VTM. The stopping criterion used for our computations is e_n = ‖u_n − J_{λ_n S}(I − λ_n T)u_n‖ < 10⁻⁸. Observe that e_n = 0 implies that u_n is a solution to the inclusion problem of finding u ∈ H such that 0 ∈ (S + T)u. We plot the graphs of the errors against the number of iterations in each case. The figures and numerical results are shown in Figure 2 and Table 2, respectively.
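For readers working outside MATLAB, here is a rough NumPy/SciPy analogue of the data generation above (sprandn corresponds to scipy.sparse.random with normally distributed nonzeros; seeds and names are illustrative):

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
eps = rng.standard_normal((120, 1))              # randn(120, 1)
u_bar = sparse.random(512, 1, density=1 / 64,    # sprandn(512, 1, 1/64)
                      random_state=0, data_rvs=rng.standard_normal)
u1 = sparse.random(512, 1, density=1 / 64,
                   random_state=1, data_rvs=rng.standard_normal)
```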


Figure 1. Top left: Case A; top right: Case B; bottom left: Case C; bottom right: Case D.