A Dynamically Adjusted Subspace Gradient Method and Its Application in Image Restoration

In this paper, a new subspace gradient method is proposed in which the search direction is determined by solving an approximate quadratic model, where a simple symmetric matrix is used to estimate the Hessian matrix in a three-dimensional subspace. The obtained algorithm is able to adjust the search direction automatically according to the conditions satisfied at each iteration. Under some mild assumptions and using a non-monotone generalized line search, we establish the global convergence of the algorithm for general functions and further prove R-linear convergence for uniformly convex functions. Numerical results on both standard test functions and image restoration problems show that the proposed algorithm is efficient.


Introduction
The Conjugate Gradient (CG) method is dedicated to solving the unconstrained optimization problem

min_{x ∈ R^n} f(x),   (1)

where f : R^n → R is smooth and the gradient of f(x) at x_k is denoted by g_k. Its simple form and low storage requirements make the CG method a powerful tool for dealing with problem (1). Starting from an initial point x_0, the method generates an iterative sequence {x_k} of the form

x_{k+1} = x_k + α_k d_k,   (2)

that is, x_k moves one step of length α_k along the search direction d_k and reaches the (k+1)-th iterate x_{k+1}. The direction d_k is usually defined as

d_0 = −g_0,   d_k = −g_k + β_k d_{k−1},  k ≥ 1,   (3)

where β_k is the CG parameter. Different choices of β_k correspond to different CG methods, such as Polak and Ribiere (PRP) [1], Hestenes and Stiefel (HS) [2], Liu and Storey (LS) [3], Fletcher and Reeves (FR) [4], Dai and Yuan (DY) [5], and the conjugate descent (CD) method [6]. More relevant research on the progress of the CG method can be found in the literature [7–10].
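To fix ideas, the following minimal Python sketch implements the basic iteration (2)–(3) with the Fletcher–Reeves choice of β_k and a simple backtracking Armijo rule in place of a more sophisticated line search; the test function, tolerances, and function names are illustrative assumptions and not part of the paper.

```python
import numpy as np

def cg_fr(f, grad, x0, tol=1e-6, max_iter=500):
    """Minimal CG sketch of (2)-(3): x_{k+1} = x_k + alpha_k d_k,
    d_k = -g_k + beta_k d_{k-1}, with the Fletcher-Reeves beta."""
    x = x0.copy()
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        if g.dot(d) >= 0:          # safeguard: restart if d is not a descent direction
            d = -g
        alpha, fx = 1.0, f(x)      # simple backtracking Armijo rule (illustrative only)
        while f(x + alpha * d) > fx + 1e-4 * alpha * g.dot(d) and alpha > 1e-12:
            alpha *= 0.5
        x_new = x + alpha * d
        g_new = grad(x_new)
        beta = g_new.dot(g_new) / g.dot(g)   # Fletcher-Reeves parameter
        d = -g_new + beta * d
        x, g = x_new, g_new
    return x

# usage on a toy convex quadratic
A = np.diag([1.0, 10.0, 100.0])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
print(cg_fr(f, grad, np.ones(3)))
```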
The step size α_k can be obtained by different rules. In this paper, we focus on the following generalized line search, which has been shown to be very efficient for CG methods in [11]:

f(x_k + α_k d_k) ≤ C_k + δ α_k g_k^T d_k,   g(x_k + α_k d_k)^T d_k ≥ σ g_k^T d_k,   (4)

where 0 < δ < σ < 1 and the reference value C_k is defined by

C_0 = f(x_0),  Q_0 = 1,  Q_{k+1} = τ_k Q_k + 1,  C_{k+1} = (τ_k Q_k C_k + f(x_{k+1})) / Q_{k+1},   (5)

with τ_k ∈ [τ_min, τ_max] ⊆ [0, 1]. From Equation (5), we can see that C_k is a convex combination of the function values f(x_0), …, f(x_k). The generalized line search is non-monotone, which facilitates the establishment of the global convergence of the algorithm under milder conditions.
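To make the bookkeeping behind C_k concrete, here is a small Python sketch of the convex-combination update written in (5); the weight τ_k, the starting values, and the toy sequence of function values are illustrative assumptions.

```python
def update_reference(C_k, Q_k, f_new, tau_k=0.7):
    """One update of the nonmonotone reference value: C_{k+1} is a convex
    combination of C_k (itself a combination of past f values) and f(x_{k+1})."""
    Q_next = tau_k * Q_k + 1.0
    C_next = (tau_k * Q_k * C_k + f_new) / Q_next
    return C_next, Q_next

# C_0 = f(x_0), Q_0 = 1; feed in the accepted function values one by one
C, Q = 5.0, 1.0
for f_val in [4.0, 3.5, 3.6, 3.0]:
    C, Q = update_reference(C, Q, f_val)
    print(round(C, 4), round(Q, 4))
```

With τ_k = 0 the reference value reduces to the latest function value and the rule becomes the usual monotone Armijo-type condition; larger weights make the line search more tolerant of temporary increases in f.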
Subspace techniques play an extraordinary role in solving large-scale unconstrained optimization problems. As the scale of the optimization problems to be dealt with continues to expand, subspace techniques have attracted increasing attention from researchers. Combining subspace minimization with the CG method, Yuan and Stoer [12] creatively proposed the SMCG method, in which an approximate model of f(x) is minimized on the subspace Ω_{k+1} = Span{g_{k+1}, s_k}, and the search direction takes the form

d_{k+1} = µ_k g_{k+1} + ν_k s_k,   (6)

where µ_k and ν_k are parameters and s_k = x_{k+1} − x_k. Obviously, the SMCG method is a further development of the CG method, and it has had a profound influence on the subsequent vigorous development of subspace techniques. Based on Yuan's ideas, Andrei [13] developed a new SMCG method that further expands the search direction to a three-dimensional subspace and uses an acceleration strategy. Inspired by Andrei, Yang et al. [14] applied the subspace minimization technique to another special three-term subspace and came up with a new SMCG method. On the same subspace, Li et al. [10] conducted a more in-depth study of Yang's results, analyzed the more complex three-parameter situation, and set different conditions to dynamically select the search direction for different dimensions of the subspace. Subspace techniques have even wider applications. Dai [15] proposed a new method called BBCG by fusing them with the Barzilai–Borwein method [16] and compared the performance of several BBCG variants through numerical experiments; the BBCG3 method was found to perform best. Many scholars have also tried to integrate the idea of subspace minimization into the trust region method; for related research, readers can refer to [17]. More research on the use of subspace techniques to construct different methods is still in progress [18–22].

The outline of this article is as follows: in Section 2, we give preliminary information. In Section 3, the search direction is discussed first, and then the resulting algorithm is presented. Based on the above work, under some mild assumptions, the global convergence of the algorithm for general functions is proved; more importantly, the result of R-linear convergence for uniformly convex functions is also established. Some numerical results for solving unconstrained optimization problems and image restoration problems are shown in Section 4. The conclusion and discussion are presented in Section 5.

Preliminary
The main work of this section is as follows: in the subspace Ω_{k+1} = Span{−g_{k+1}, s_k, g_k}, the discussion is divided into three cases according to the dimension of Ω_{k+1}; then, combined with the subspace minimization technique, four forms of d_k are determined, and the conditions for dynamically selecting each direction are given.
In this paper, the direction at x_{k+1} is expected to minimize a quadratic approximation of the objective function on the subspace Ω_{k+1}:

min_{d ∈ Ω_{k+1}}  g_{k+1}^T d + (1/2) d^T B_{k+1} d,   (7)

where B_{k+1} is regarded as an approximation of the Hessian matrix and is positive definite. B_{k+1} is assumed to satisfy the modified secant equation [23] B_{k+1} s_k = y_k^*, where y_k^* is a modification of y_k = g_{k+1} − g_k.

Direction Selection
According to the above discussion, the subspace may have three different dimensions; based on that, we analyze the selection of the search direction for each case below.
Case I: dim(Ω_{k+1}) = 3. In this case, the direction can be expressed as

d_{k+1} = a_k g_{k+1} + b_k s_k + c_k g_k,   (9)

where a_k, b_k, c_k are parameters to be determined. Substituting (9) into (7) yields a three-dimensional quadratic subproblem (10), in which ρ_k = g_k^T B_{k+1} g_k, ρ_{k+1} = g_{k+1}^T B_{k+1} g_{k+1}, and w_k = g_{k+1}^T B_{k+1} g_k. Inspired by the BBCG method [11], these quantities are estimated as in (11), where ξ_k is an adaptive parameter whose definition remains the same throughout the whole paper. Setting ξ_0 = 1.5, we not only find that 1.2 ≤ ξ_k ≤ 1.75, but also observe that its numerical performance is better than that of a constant. The coefficient matrix in (10) is denoted by D_k. Assuming for now that D_k is positive definite, the unique solution of (10) can be calculated as in (12): positive definiteness implies that the determinant ∆_k > 0, so the solution is well defined. Substituting the corresponding variable values into Equation (12) and considering the formula in Equation (13), we compute ρ_{k+1} as in (15). In order to make the algorithm perform better, in a manner similar to [7,24], we impose the conditions (16) and (17), where ζ_1, ζ_2 are positive constants and ρ_0 ∈ (0, 1). The positive definiteness of D_k is established in Lemma 1 below.
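Before stating Lemma 1, the following Python sketch illustrates the generic computational step implied by this construction: check that a small symmetric matrix is positive definite and, if so, solve the corresponding 3×3 system for the coefficients (a_k, b_k, c_k). The entries of D_k and of the right-hand side in (10) are not reproduced from the paper; the matrix, vectors, and function name below are illustrative assumptions, and the fallback to −g_{k+1} is only one possible safeguard.

```python
import numpy as np

def subspace_direction(D, rhs, basis, g):
    """Solve the small subproblem min_t 0.5 * t^T D t - rhs^T t (a generic stand-in
    for (10)) and assemble d = t[0]*basis[0] + t[1]*basis[1] + t[2]*basis[2].
    Falls back to -g if D is not numerically positive definite."""
    try:
        np.linalg.cholesky(D)              # positive definiteness check
    except np.linalg.LinAlgError:
        return -g                          # safeguard direction
    t = np.linalg.solve(D, rhs)            # coefficients (a_k, b_k, c_k)
    return sum(ti * bi for ti, bi in zip(t, basis))

# illustrative call: the basis plays the role of {g_{k+1}, s_k, g_k}
g_next = np.array([1.0, 0.0])
s = np.array([0.1, 0.2])
g_prev = np.array([0.9, 0.1])
D = np.array([[2.0, 0.3, 0.1],
              [0.3, 1.5, 0.2],
              [0.1, 0.2, 1.0]])
rhs = np.array([-1.0, -0.2, -0.5])
print(subspace_direction(D, rhs, [g_next, s, g_prev], g_next))
```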

Lemma 1.
If ρ k+1 is calculated by Equation (15), then the matrix D k is positive definite.
Proof. Using mathematical induction and ρ_0 ∈ (0, 1), we can get ρ_{k+1} ≥ ξ_k N > 0, from which the positive definiteness of D_k follows.

Case II: dim(Ω_{k+1}) = 2. In this case, the direction can be expressed as

d_{k+1} = a_k g_{k+1} + b_k s_k,   (19)

where a_k, b_k are parameters. Similarly, substituting Equation (19) into the approximation function (7), we find that the resulting subproblem (20) has a unique solution (21). In the same way that w_k is evaluated in (11), we set ρ_{k+1} in terms of ξ_k; apparently, ∆_k > 0. Furthermore, for the better performance of the algorithm, we require the relevant variables to satisfy the condition in Equation (17).
As is well known, the DY and HS methods have some good properties; for example, the finite termination property of HS is helpful for improving the convergence rate. In view of the above considerations, we put forward the following idea: when the corresponding conditions are met, we take the direction given by Equation (24), where ζ_3 ∈ [0, 1). In summary, in the case of a two-dimensional subspace, when condition (17) is satisfied, d_k is given by Equations (19) and (21); when the inequalities in Equations (22) and (23) hold, d_k is calculated by Equation (24).
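For reference, the classical DY and HS parameters on which Equation (24) draws can be computed as in the Python sketch below; the switching rule encoded in conditions (22)–(23) is specific to the paper and is not reproduced here, so the final choice between the two values is left open.

```python
import numpy as np

def beta_dy(g_new, d, y):
    """Dai-Yuan parameter: ||g_{k+1}||^2 / (d_k^T y_k)."""
    return g_new.dot(g_new) / d.dot(y)

def beta_hs(g_new, d, y):
    """Hestenes-Stiefel parameter: g_{k+1}^T y_k / (d_k^T y_k)."""
    return g_new.dot(y) / d.dot(y)

# illustrative values with y_k = g_{k+1} - g_k
g_old = np.array([1.0, -2.0])
g_new = np.array([0.5, -1.0])
d = np.array([-1.0, 2.0])
y = g_new - g_old
print(beta_dy(g_new, d, y), beta_hs(g_new, d, y))
```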

Description of DSCG Algorithm
In this section, we first introduce an acceleration strategy (Algorithm 1) [25] which has been shown to be quite efficient for the CG method. Then, we present our dynamically adjusted subspace conjugate gradient algorithm (DSCG, Algorithm 2) and prove that the direction satisfies sufficient descent.

Algorithm 1: Acceleration Strategy.
Step 1: Compute z = x_k + α_k d_k, g_z = ∇f(z), and y_z = g_k − g_z;
Step 2: Compute a_k = α_k g_k^T d_k and b_k = −α_k y_z^T d_k;
Step 3: If b_k > 0, compute η_k = −a_k / b_k and update the variables as x_{k+1} = x_k + η_k α_k d_k.

Algorithm 2: DSCG.
Step 2: When ||g_k|| ≤ ε, stop; otherwise, go to Step 3.
Step 4: Compute the direction d k .
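Returning to the acceleration strategy of Algorithm 1, the following Python sketch follows the three steps listed above; accepting the plain trial point z when b_k ≤ 0 is an assumption based on the usual form of this acceleration scheme [25], and the closing example is only a usage illustration.

```python
import numpy as np

def accelerated_step(x, alpha, d, grad):
    """Acceleration strategy (sketch of Algorithm 1): possibly rescale the step alpha*d."""
    z = x + alpha * d
    g_x, g_z = grad(x), grad(z)
    y_z = g_x - g_z                       # Step 1
    a = alpha * g_x.dot(d)                # Step 2
    b = -alpha * y_z.dot(d)
    if b > 0:                             # Step 3
        eta = -a / b
        return x + eta * alpha * d        # accelerated update
    return z                              # assumed fallback: keep the ordinary step

# toy usage on f(x) = 0.5 * ||x||^2, whose gradient is x
grad = lambda x: x
print(accelerated_step(np.array([1.0, 1.0]), 0.5, np.array([-1.0, -1.0]), grad))
```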

Convergence Analysis
In this subsection, we focus on the convergence properties of the proposed algorithm (DSCG). The sufficient descent condition is crucial for a gradient descent algorithm. In order to establish it for the DSCG method, we first introduce the following lemma.

Lemma 2. If d_{k+1} is generated by Equations (9) and (12) or by Equations (19) and (20), then the corresponding descent estimate used in the proof of Lemma 3 holds.
Proof. If d_{k+1} is generated by Equations (19) and (20), the required estimate follows from the positive definiteness of the corresponding coefficient matrix, with H_min denoting its smallest eigenvalue. When d_{k+1} is given by Equations (9) and (12), the same argument applies with D_k.

Lemma 3. Let d_{k+1} be generated by the DSCG algorithm. Then, there exists a constant c_3 > 0 such that

g_{k+1}^T d_{k+1} ≤ −c_3 ||g_{k+1}||^2.

Proof. Based on the form of the direction, we analyze three cases.
Case I: If d_{k+1} = −g_{k+1}, let c_3 = 1/2, and the claim holds.
Case II: When the direction is binomial, two situations must be considered. We first discuss the case where d_{k+1} is given by Equations (19) and (21); here, combining conditions (11) and (17) with Lemma 2, the required result can be obtained. When d_k is determined by Equation (24), the result holds for β_k = β_k^{DY}; similarly, using (23), the same result can be obtained for β_k = β_k^{HS}.
Case III: When the direction is computed by Equations (9) and (12), considering Lemma 2, we first prove that ρ_{k+1} has an upper bound.
The upper bound on ρ_{k+1} follows from Equations (11), (14), (15) and the fact that χ > 0. By using the conditions in Equations (16) and (17), the corresponding estimate is obtained. Finally, using the conclusion of Lemma 2, the desired inequality is concluded. Summarizing all the above cases, we take c_3 as the smallest of the constants obtained above, and thus the proof is complete.
In the remainder of this subsection, the global convergence of the algorithm for general functions is proved; more importantly, the result of R-linear convergence for uniformly convex functions is also established. We first introduce two necessary assumptions.
Assumption 1. The function f : R^n → R is continuously differentiable and bounded from below on R^n.

Assumption 2. The gradient g(x) is Lipschitz continuous with a constant L > 0; i.e.,

||g(x) − g(y)|| ≤ L ||x − y||,  ∀ x, y ∈ R^n,

which implies that ||y_k|| ≤ L ||s_k||.

Remarkably, Assumption 1 is milder than the usual assumption that the level set D = {x ∈ R^n : f(x) ≤ f(x_0)} is bounded.

Lemma 4.
Suppose that α_k is generated by the generalized line search (4) and that Assumption 2 holds. Then

α_k ≥ (σ − 1) g_k^T d_k / (L ||d_k||^2).   (36)

Proof. By the second line search condition in (4), we get (g_{k+1} − g_k)^T d_k ≥ (σ − 1) g_k^T d_k, while Assumption 2 gives (g_{k+1} − g_k)^T d_k ≤ L α_k ||d_k||^2. Since σ < 1 and g_k^T d_k < 0, (36) holds immediately.

Lemma 5.
If α k fulfills the generalized line search conditions (4), (5) and Assumption 1 holds, it follows that f k ≤ C k , ∀k.

Lemma 6.
Let d_{k+1} be generated by the DSCG algorithm. Then, there is a constant c_4 > 0 such that

||d_{k+1}|| ≤ c_4 ||g_{k+1}||.   (40)

Proof. Similarly, we analyze three cases.
Case I: If d_{k+1} = −g_{k+1}, let c_4 = 1, and the claim holds.
Case II: If d_{k+1} is given by Equation (24), the required bound follows from Assumption 2 and condition (22); when β_k = β_k^{DY}, using the same method, we can get (41). Now, if d_k is calculated by Equations (19) and (21) and conditions (11) and (17) hold, then, applying the above results combined with the Cauchy inequality and the triangle inequality, the bound follows as well.
Case III: When d_k is computed by Equations (9) and (12), let us similarly first discuss a lower bound for ∆_k. Based on Equations (11), (13), and (15), such a bound is obtained. Defining χ_1 = ρ_k (s_k^T y_k^*), from Equations (14) and (16) we have χ = χ_1 n_k ≥ χ_1 ρ_0 > 0; therefore, ∆_k is bounded away from zero. The coefficients of the direction can then be estimated in terms of e_k = ρ_k ||s_k|| + (s_k^T y_k^*) ||g_k||, i_k = ||s_k|| ||g_k||, and j_k = ρ_k ||s_k||^2 + (s_k^T y_k^*) ||g_k||^2. From Equations (15), (17), and (18), these quantities can be bounded, and it can be deduced further that ||d_{k+1}|| is bounded by a constant multiple of ||g_{k+1}||.
In conclusion, taking c_4 as the largest of the constants obtained above, (40) holds.

Theorem 1.
Assume that Assumptions 1 and 2 hold and that the sequence {x_k} is generated by the DSCG algorithm. Then

lim_{k→∞} ||g_k|| = 0.   (49)

Proof. According to the generalized line search conditions (4) and (5), combined with Lemmas 3, 4, and 6, the estimate (51) follows. According to Assumption 1 and Lemma 5, C_{k+1} has a lower bound; summing (51) over k then shows that the corresponding series is convergent. Thus, Equation (49) holds.

Theorem 2.
Suppose that Assumptions 1 and 2 hold, f is a uniformly convex function with unique minimizer x*, and the sequence {x_k} is generated by the DSCG algorithm. Assume further that there exists â > 0 such that α_k ≤ â for all k, and that τ_max < 1. Then, there is a constant θ ∈ (0, 1) such that

f(x_k) − f(x*) ≤ θ^k ( f(x_0) − f(x*) ).   (54)

Proof. From the proof of Lemma 5, we know that C_k > C_{k+1} > f(x*), which implies that the ratio r defined in (55) satisfies r ∈ [0, 1]. Let us first analyze the case r = 1. Then there is a subsequence {x_{k_j}} along which the corresponding ratio tends to 1. From Equation (2.15) of [26], we know that the associated quantity r_1 satisfies 0 < r_1 ≤ 1. Through the expression of C_{k+1} in (5), and combining the above three formulas, the corresponding limit relation is obtained. Based on Equation (3.4) of [26], the uniformly convex function f has the property stated there. Since α_k ≤ â and g is Lipschitz continuous, the step lengths are bounded, and condition (51) yields (59) and (62). From Equation (59), combined with condition (62), we obtain a relation that conflicts with Equation (59); hence r ≠ 1, i.e., r < 1. Thus, there is an integer k_0 > 0 such that the ratio is bounded away from 1 for all k ≥ k_0. Define θ = max{(1 + r)/2, r_1}; from condition (55), obviously θ ∈ (0, 1). It then follows from condition (69) that C_{k+1} − f(x*) decreases at least linearly with factor θ, and Lemma 5 (f_k ≤ C_k) together with C_0 = f_0 implies that (54) holds.

Numerical Results
In this section, we report the numerical performance of the DSCG algorithm from two aspects. Firstly, the algorithm is compared with the TTS [13] and CG_DESCENT [27] algorithms on standard unconstrained problems; secondly, the algorithm is applied to the image restoration problem, and the numerical results are observed. All codes were run on a PC with a 2.20 GHz CPU, 4.00 GB of RAM, and the Windows 10 operating system.
The initial stepsize is selected by the strategy of [27], where ||·||_∞ represents the infinity norm. TTS and CG_DESCENT use the parameters given in their codes. We apply the performance profiles of Dolan and Moré [28] to evaluate the effectiveness of the three algorithms and discuss the profiles with respect to CPU time, NFG, and NI in detail.
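As a reminder of how such profiles are constructed, the following Python sketch computes Dolan–Moré profile values from a matrix of per-problem costs (CPU time, NFG, or NI); the data, the function name, and the chosen τ values are illustrative, not the paper's actual results.

```python
import numpy as np

def performance_profile(costs, taus):
    """costs[p, s] = cost of solver s on problem p (np.inf marks a failure).
    Returns rho[s, i] = fraction of problems with performance ratio <= taus[i]."""
    best = costs.min(axis=1, keepdims=True)          # best cost per problem
    ratios = costs / best                            # performance ratios r_{p,s}
    return np.array([[np.mean(ratios[:, s] <= t) for t in taus]
                     for s in range(costs.shape[1])])

# illustrative data: 4 problems, 2 solvers; the second solver fails on one problem
costs = np.array([[1.0, 1.2],
                  [2.0, 1.5],
                  [0.5, np.inf],
                  [3.0, 3.0]])
print(performance_profile(costs, taus=[1.0, 2.0, 4.0]))
```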
The meanings of the symbols used below are as follows: NI denotes the number of iterations, NFG the total number of function and gradient evaluations, and CPU the CPU time. When the problem dimension is 9000, the CG_DESCENT method solves only 64 of the problems, while the other two methods successfully complete all of them. In a performance profile, the method represented by the top curve solves the largest fraction of the test problems within a given performance ratio.
As shown in Figure 1, it is clear that the DSCG method is superior to the other algorithms in terms of CPU time. It corresponds to the top curve and solves 47.49% of the test problems in the shortest time. In contrast, TTS is the fastest on 40.64% of the test problems, and CG_DESCENT is the fastest on only 9.59%.
Now, let us focus on Figure 2. By comparison, it is found that DSCG requires fewer function and gradient evaluations than the other algorithms, which helps to simplify the computation and improve efficiency. It solves 58.9% of the test problems with the fewest function and gradient evaluations, while TTS does so for 28.77% of the test problems; the corresponding proportion for CG_DESCENT is 13.24%.
In addition, Figure 3 shows the performance of each algorithm in terms of the number of iterations. It can be seen from the figure that the DSCG algorithm performs outstandingly, solving 64.38% of the problems with the minimum number of iterations, while TTS and CG_DESCENT require the fewest iterations on 52.05% and 7.76% of the problems, respectively.
The three figures for CPU, NFG, and NI convey similar information. It can be concluded that, on the given test set, DSCG performs very well, with numerical results superior to those of TTS and CG_DESCENT.

Image Restoration Problem
The proposed DSCG method is also applied to the image restoration problem in this subsection. For more specialized work in the field of image processing, please see [29,30]. In two scenarios with different noise levels, the degraded image is restored so that the picture becomes clear and recognizable. This task has a wide range of applications in many fields of production and life, is of important practical significance, and is also a difficult subject in the field of optimization. Its basic model is b = Ax + ς, where x ∈ R^n is the original image, A ∈ R^{m×n} is the blur matrix, ς ∈ R^m represents noise, and b ∈ R^m is the observed degraded image. The unknown image x is usually recovered by solving this system directly, but because the imaging system is susceptible to noise and loss of information, it is difficult to obtain a satisfactory solution in this way. In order to overcome this shortcoming, the regularized least squares model

min_x (1/2) ||Ax − b||^2 + λ ||Υx||_1

is usually introduced, where Υ is a linear operator, ||·||_1 represents the ℓ1 norm, and λ is the regularization parameter used to balance the data term and the regularization term.
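To illustrate how a gradient-based method such as DSCG can be applied to this model, the Python sketch below evaluates a smoothed version of the regularized objective and its gradient; the choice Υ = I, the square-root smoothing of the ℓ1 term, and all parameter values are assumptions made purely for illustration and are not the settings used in the paper.

```python
import numpy as np

def restoration_objective(x, A, b, lam=0.01, mu=1e-3):
    """Smoothed l1-regularized least squares:
    0.5 * ||A x - b||^2 + lam * sum(sqrt(x_i^2 + mu^2)),
    where the smooth term stands in for ||x||_1 so that gradients exist everywhere."""
    r = A @ x - b
    smooth = np.sqrt(x**2 + mu**2)
    value = 0.5 * r.dot(r) + lam * smooth.sum()
    gradient = A.T @ r + lam * (x / smooth)
    return value, gradient

# toy "blur" operator and noisy observation (illustrative only)
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
x_true = np.zeros(10)
x_true[[2, 7]] = [1.0, -0.5]
b = A @ x_true + 0.01 * rng.standard_normal(20)
value, gradient = restoration_objective(np.zeros(10), A, b)
print(value, np.linalg.norm(gradient))
```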
Table 2 presents the CPU time spent by each algorithm in the repair process. Many works have focused on the image restoration problem, and more detailed references can be found in [31–34]. In the figures, we display, from left to right, the original image to be repaired and the restoration results of DSCG, TTS, and CG_DESCENT.
Summarizing the information contained in the figures and tables in this section, we draw two conclusions: (i) the compared algorithms are capable of repairing the pictures within a reasonable time frame; (ii) under noise levels of 20% and 50%, DSCG is shown to be a promising algorithm.

Conclusions and Discussion
In this paper, an algorithm with a dynamically adjusted search direction was proposed, in which one of four forms of the direction is selected according to which conditions are satisfied. We discussed the selection of directions in a special three-term subspace using a modified secant equation, subspace minimization techniques, and an acceleration strategy. The algorithm has a good property: each search direction satisfies the sufficient descent condition. Using the non-monotone generalized line search, we obtained remarkable results under some mild assumptions: we not only proved the global convergence of the algorithm for general functions but also further proved R-linear convergence for uniformly convex functions. Finally, we applied the algorithm to image restoration; it shows good numerical performance on both the unconstrained test problems and the image restoration problems, which fully demonstrates its efficiency.