A q-Gradient Descent Algorithm with Quasi-Fejér Convergence for Unconstrained Optimization Problems

Abstract: We present an algorithm for solving unconstrained optimization problems based on the q-gradient vector. The main idea used in the algorithm construction is the approximation of the classical gradient by a q-gradient vector. For a convex objective function, the quasi-Fejér convergence of the algorithm is proved. The proposed method does not require the boundedness assumption on any level set. Further, numerical experiments are reported to show the performance of the proposed method.


Introduction
The descent direction plays a central role in the development of optimization algorithms. The classical gradient descent method was first proposed by Cauchy [1] in 1847. The optimization problem is a significant mathematical model in a wide class of disciplines [2]. In applications such as image processing [3], data analysis [4], and machine learning [5], in which one needs to quickly provide an approximate solution, several gradient-based algorithms [6][7][8][9] have been proposed based on the iterative technique of the gradient descent method [2,10]. Quantum calculus, or q-calculus, is the study of calculus without limits. It began with the work of Jackson in the early twentieth century [11], but similar kinds of calculus had already been developed by Euler and Jacobi in the eighteenth and nineteenth centuries, respectively. Recently, it has attracted considerable interest due to the demand for mathematics that models quantum computing. Besides appearing as a connection between mathematics and physics, q-calculus has many applications in different mathematical areas such as operator theory [12], combinatorics [13], orthogonal polynomials [14], basic hypergeometric functions [15], and other sciences such as quantum theory [16,17], mechanics [18], and fractional calculus [19][20][21][22][23][24][25]. For more recent studies on the analysis and applications of fractional calculus, refer to [26][27][28][29][30][31].
The q-Taylor formula for functions of several variables, together with mean value theorems in q-calculus, was first used to develop a new method for solving systems of equations [32]. The advantage of q-calculus is illustrated by an example in [32], where a scheme involving q-derivatives finds the solution of a nonlinear system whose first equation is |x₁² − 4| + e^{7x₂ − 36} = 2, while the classical Newton-Kantorovich method fails to do so; for q = 0.9, the iterations converge to the exact solution. The concept of the q-analog of the gradient was first introduced in [33] for solving systems of equations. It has some advantages with respect to the classical method when the functions are not differentiable. Further, the same concept of the q-gradient was introduced in the steepest descent method [34] to optimize single-objective functions [35]. The parameter q was generated by a Gaussian distribution whose standard deviation σ decreases along the iterative process from a starting standard deviation σ₀ by a reduction factor β. The step length was generated using the golden section search method [34]. However, the convergence properties of the steepest descent method with inexact line searches have been studied under several strategies for the choice of the step length α_k [36][37][38]. Recently, several modified unconstrained optimization algorithms using the q-gradient have been proposed to solve unconstrained optimization problems [10,19,39-41].
In this paper, we propose a q-gradient line search scheme that provides a q-descent direction at every kth iteration. For this, a sequence q^(k) [39] is taken to generate the values of q, and the backtracking technique is utilized to find the step length without requiring bounded level sets or a Lipschitz condition on the gradient of the function. We also provide a theoretical convergence proof for the case where the step length is fixed, without any hypothesis on the level sets of the objective function. The advantage of using the q-gradient is shown by comparing our method with the method given in [36] based on the number of iterations and function evaluations. The paper is organized as follows. In the next section, some notations and definitions for q-calculus and other prerequisites are provided, which are used throughout the paper. In Section 3, the q-gradient descent algorithm is given, and its convergence analysis is provided in Section 4. Numerical experiments are reported in Section 5, which is followed by a section of concluding remarks.

Essential Preliminaries
We assume that R₊ stands for the nonnegative real line, q ∈ (0, 1) is a real number, and the q-integer [n]_q is given as

[n]_q = (1 − q^n)/(1 − q), for all n ∈ N.

The q-analog of (1 + x)^n is

(1 + x)^n_q = (1 + x)(1 + qx) · · · (1 + q^{n−1} x).

The q-derivative of x^n with respect to x is

D_q x^n = [n]_q x^{n−1}.

The q-derivative of a function ψ : R → R is given by

D_q ψ(x) = (ψ(qx) − ψ(x))/((q − 1) x), for x ≠ 0.

In the special case x = 0, we set D_q ψ(0) = ψ′(0), provided the derivative exists. The higher-order q-derivatives of ψ are defined recursively by

D_q⁰ ψ = ψ, D_q^n ψ = D_q(D_q^{n−1} ψ), n = 1, 2, . . . .

The q-derivative of a function is a linear operator [42]: for any constants c₁ and c₂,

D_q(c₁ ψ(x) + c₂ φ(x)) = c₁ D_q ψ(x) + c₂ D_q φ(x).

Let ψ(x) be a continuous function on [a, b], where a, b ∈ R. Then, there exist q̂ ∈ (0, 1) and x̂ ∈ (a, b) [43] such that the q-mean value theorem holds for all q ∈ (q̂, 1) ∪ (1, q̂^{−1}). The q-partial derivative of a function ψ : Rⁿ → R at x ∈ Rⁿ with respect to x_i is defined as (see [35]):

D_{q,x_i} ψ(x) = (ψ(x₁, . . . , x_{i−1}, q x_i, x_{i+1}, . . . , x_n) − ψ(x))/((q − 1) x_i), for x_i ≠ 0.

A function is called q-differentiable at a point if its q-partial derivatives exist there, and continuously q-differentiable if they are also continuous at that point. We now choose the parameter q as a vector; that is, q = (q₁, . . . , q_i, . . . , q_n)ᵀ ∈ Rⁿ, and the q-gradient of ψ is the vector of q-partial derivatives:

∇_q ψ(x) = (D_{q₁,x₁} ψ(x), . . . , D_{q_n,x_n} ψ(x))ᵀ.
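To illustrate the one-dimensional q-derivative above, the following Python sketch (the function name and the fallback at x = 0 are our choices, not from the paper) evaluates it and checks the identity D_q x^n = [n]_q x^{n−1}:

```python
def q_derivative(psi, x, q=0.9):
    """Jackson q-derivative: (psi(q*x) - psi(x)) / ((q - 1) * x) for x != 0."""
    if x == 0:
        # At x = 0 the q-derivative reduces to the classical derivative;
        # approximate it here by a symmetric difference quotient.
        h = 1e-8
        return (psi(h) - psi(-h)) / (2 * h)
    return (psi(q * x) - psi(x)) / ((q - 1) * x)

# For psi(x) = x^3, the q-derivative is [3]_q x^2 = (1 + q + q^2) x^2.
q = 0.9
approx = q_derivative(lambda t: t ** 3, 2.0, q)
exact = (1 + q + q ** 2) * 2.0 ** 2
```

As q → 1, [n]_q → n, and the q-derivative recovers the classical derivative.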

Proposition 1.
If ψ(x) = a₀ + xᵀa, where a₀ ∈ R and a ∈ Rⁿ, then for any x, q ∈ Rⁿ,

∇_q ψ(x) = a = ∇ψ(x);

that is, for an affine function, the q-gradient coincides with the classical gradient.

Example 1. Consider a function ψ : R² → R such that ψ(x) = 1/(x₁ x₂). Then, the q-gradient is given as

∇_q ψ(x) = (−1/(q₁ x₁² x₂), −1/(q₂ x₁ x₂²))ᵀ.

Definition 1 (q-Integral [42]). The q-analog of the integral on [0, b] is given by

∫₀^b ψ(x) d_q x = (1 − q) b ∑_{j=0}^∞ q^j ψ(q^j b).

In the special case q → 1, it reduces to the classical integral ∫₀^b ψ(x) dx. This is valid only when the series on the right-hand side converges [44].

Definition 2 (q-Newton-Leibniz formula [42]). A function Ψ is a q-anti-derivative of ψ(x) if D_q Ψ(x) = ψ(x). In that manner, the q-Newton-Leibniz formula is

∫_a^b ψ(x) d_q x = Ψ(b) − Ψ(a).

Definition 3 (Quasi-Fejér Convergence [36]). A sequence {x^(k)} is quasi-Fejér convergent to a set U ⊆ Rⁿ if, for every u ∈ U, there exists a non-negative, summable sequence {ε_k} ⊆ R, i.e., ε_k ≥ 0 and ∑_{k=0}^∞ ε_k < ∞, such that

‖x^(k+1) − u‖² ≤ ‖x^(k) − u‖² + ε_k, for all k.
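The Jackson-type q-integral of Definition 1 can be approximated by truncating its series; a short Python sketch (the truncation length is our choice) checks it against the closed form ∫₀^b x^n d_q x = b^{n+1}/[n+1]_q:

```python
def q_integral(psi, b, q=0.9, terms=500):
    """Jackson q-integral on [0, b]: (1 - q) * b * sum_{j>=0} q^j psi(q^j b),
    truncated after `terms` terms of the geometrically decaying series."""
    return (1 - q) * b * sum(q ** j * psi(q ** j * b) for j in range(terms))

# For psi(x) = x^2 with b = 1, the exact value is 1 / [3]_q = 1 / (1 + q + q^2).
val = q_integral(lambda x: x * x, 1.0, 0.9)
```

For q close to 1 the series decays slowly, so more terms are needed before the truncation error becomes negligible.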
In the next section, the q-gradient descent algorithm is presented for solving unconstrained optimization problems.

A q-Gradient Descent Algorithm
We consider the unconstrained optimization problem:

min ψ(x), x ∈ Rⁿ, (9)

where ψ : Rⁿ → R is a continuously q-differentiable convex function. Note that minimizing ψ(x) is equivalent to maximizing −ψ(x). We choose a starting point x^(0) ∈ Rⁿ. The general iterative scheme to solve (9) using the q-gradient is of the following form:

x^(k+1) = x^(k) + α_k d^(k)_{q^(k)},

where x^(k+1) is the new iterative point, x^(k) is the previous iterative point, and d^(k)_{q^(k)} is a q-descent direction given as:

d^(k)_{q^(k)} = −∇_{q^(k)} ψ(x^(k)),

and α_k ∈ R is the step length, which can be computed by two main line-search strategies: exact line search and inexact line search. The exact optimal step length generally cannot be found in practical computation, and it is expensive to generate such a value of α_k. Therefore, the technique most frequently used in practice is an inexact line search. When inexact line searches are performed, α_k is either assigned a predetermined value or obtained through some finite iterative method. Note that the existence of a solution to the minimization problem (9) is implicitly assumed. A standard condition for the existence of a solution [2], stated in the context of q-calculus, is coercivity: ψ(x) → +∞ as ‖x‖ → ∞. The following result presents the first-order necessary condition in the light of q-calculus.

Theorem 1.
Let Ω be a subset of Rⁿ and ψ be a first-order q-differentiable real-valued function on Ω. If x* is a local minimizer of ψ over Ω, then for any feasible q-direction d at x*,

dᵀ ∇_{q^(k)} ψ(x*) ≥ 0.

Proof. Let x(α) = x* + αd with α ≥ 0, and define the composite function f(α) = ψ(x(α)). Applying q-Taylor's theorem to f around α = 0, and noting that x* = x(0), so f(0) = ψ(x*) and D_q f(0) = dᵀ ∇_{q^(k)} ψ(x*), we can write

ψ(x* + αd) = ψ(x*) + α dᵀ ∇_{q^(k)} ψ(x*) + o(α), (11)

where α ≥ 0. Since x* is a local minimizer of ψ over Ω, for sufficiently small α > 0, we can write

ψ(x* + αd) ≥ ψ(x*). (12)

From (11) and (12), dᵀ ∇_{q^(k)} ψ(x*) ≥ 0. This completes the proof.
We present the following result for the interior case in the context of q-calculus.

Corollary 1.
Let Ω be a subset of R n and ψ be a first-order q-differentiable real valued function on Ω. If x * is a local minimizer of ψ over Ω and if x * is an interior point of Ω, then ∇ q (k) ψ(x * ) = 0.
Proof. Since x* is a local minimizer of ψ over Ω, then by Theorem 1, for any feasible q-direction d, we have

dᵀ ∇_{q^(k)} ψ(x*) ≥ 0. (13)

Since x* is an interior point of Ω, every q-direction is a feasible direction; in particular, applying (13) to −d gives

dᵀ ∇_{q^(k)} ψ(x*) ≤ 0. (14)

From (13) and (14), we obtain dᵀ ∇_{q^(k)} ψ(x*) = 0 for every d, and hence ∇_{q^(k)} ψ(x*) = 0. This completes the proof.
Before writing an algorithm for the q-gradient descent method, we need the following assumptions. Assumption 1. We consider the following two assumptions:

1.
Let ψ : R n → R be convex and continuously q-differentiable.

2.
The q-gradient of ψ is Lipschitz continuous with constant L > 0; that is,

‖∇_q ψ(x) − ∇_q ψ(y)‖ ≤ L ‖x − y‖, for all x, y ∈ Rⁿ.
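The Lipschitz-type condition in part 2 of Assumption 1 can be probed empirically. The sketch below is our construction (a fixed quadratic test function, fixed q, and a sampling region bounded away from the axes are all assumptions): it samples random pairs and compares the ratio ‖∇_q ψ(x) − ∇_q ψ(y)‖ / ‖x − y‖ with the spectral norm of the Hessian:

```python
import numpy as np

rng = np.random.default_rng(0)
Q = np.array([[4.0, 1.0], [1.0, 3.0]])       # symmetric positive definite
psi = lambda v: 0.5 * v @ Q @ v
q = np.array([0.9, 0.9])

def q_grad(x):
    """q-gradient of the quadratic via q-partial difference quotients."""
    g = np.empty(2)
    for i in range(2):
        y = x.copy()
        y[i] = q[i] * x[i]
        g[i] = (psi(y) - psi(x)) / ((q[i] - 1.0) * x[i])
    return g

# empirical Lipschitz ratios over random pairs bounded away from the axes
ratios = []
for _ in range(500):
    x = rng.uniform(0.5, 2.0, size=2)
    y = rng.uniform(0.5, 2.0, size=2)
    ratios.append(np.linalg.norm(q_grad(x) - q_grad(y)) / np.linalg.norm(x - y))

L_bound = np.linalg.norm(Q, 2)               # spectral norm of the Hessian
```

For a quadratic, the q-gradient map is itself linear, so the observed ratios stay below a fixed constant, consistent with the assumption.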
Note that (3) implies that φ′(0) ≥ 0; thus, from (2), we obtain 0 ≤ φ′(0) < 1, and from (1), φ is convex and continuously q-differentiable. Therefore, φ′ is non-decreasing. The statement of the theorem given in [45] can be presented in the light of q-calculus as given below (Theorem 2): there exist a neighborhood U(x•) of x• and at least one function ū : U(x•) → R satisfying (16) for any x ∈ U(x•).

If, in addition, the q-partial derivative with respect to u is nonzero at (x•, u•), then the function ū is the only one that satisfies (16) and is continuous at (x•, u•).
Note that the proof of the q-analog of the above implicit function theorem is beyond the scope of the present research. For developing the algorithm, we need the following proposition, whose proof is given in the light of [36].

Proposition 2.
1.
For all x ∈ G, there exists a unique ū(x) > 0 such that F(x, ū(x)) = 0 and F(x, u) ≤ 0 if and only if 0 ≤ u ≤ ū(x).

Proof.
We first prove (1). Fix x ∈ G, u ∈ R₊, and define the function F(x, ·) in the context of q-calculus as in (19). From (1) of Assumption 2, F(x, ·) is convex and continuously q-differentiable, and substituting u = 0 in (19) yields (20). Applying the q-derivative with respect to u to (19) and substituting u = 0 on the right-hand side, we obtain (21). In addition, (24) holds, where ψ* is the minimum function value of ψ. From (20) and (23), we conclude that F(x, ·) is negative on some interval to the right of zero, and from (24) and (1) and (2) of Assumption 2, we obtain that F(x, u) is positive for all sufficiently large u. From Theorem 2, it follows that there exists ū(x) > 0 such that F(x, ū(x)) = 0. Using this value in (19), we obtain (17). Since F(x, ·) is convex, the uniqueness of ū(x) follows: a convex function of a real variable can take a given value different from its minimum value at most at two different points, while, from (20) and (23), the minimum point of F(x, ·) is not zero. Thus, (1) of this proposition is proved.
We present the following Algorithm 1.

1.
In Algorithm 1, the modified backtracking technique finds α_k using only one inequality instead of the two inequalities required in [46].

2.
We can find α_k by another technique: we take positive numbers δ₁ and δ₂, so that the step length α_k can be computed from them together with the Lipschitz constant L > 0.
Note that we start our Algorithm 1 by taking s₀ such that δ₁ < s₀ < δ₂, where δ₁ and δ₂ are two positive numbers. For this proposed algorithm to be well defined, it must be established that the following inequality used in Algorithm 1, namely (30), is satisfied after some finite number of steps. Note that every accumulation point of {x^(k)} is a minimizer of ψ. Since {ψ(x^(k))} is non-increasing, we have ψ(x^(k)) ≥ ψ(x*) for all k with fixed x*. The content of Theorem 5 will be argued later. The analysis of the backtracking used in Algorithm 1 to compute α_k, following [36], is shown below.

Proposition 3. The backtracking procedure based on Equations (29) and (30) stops after a finite number of iterations with a step length of at least min{δ₁, · · · }.

1.
From Equations (29) and (30), and since s₀ > δ₁, the claimed lower bound on α_k follows in this case.

2.
There exists a unique t ∈ N with t ≥ 1 such that s_t ≤ ū(x^(k)) < s_{t−1}. Then, from Equations (29) and (30), which are used in Algorithm 1, the above inequality establishes the desired bound. We now claim that α_k = s_t. From Equation (34), we obtain s_t ≤ ū(x^(k)), and if we assume s_{t−1} > ū(x^(k)), then for t = 1 we obtain Case 2, so this assumption is true. Using Proposition 2 and α_k = s_t, the inequality (30) is satisfied, but it is not satisfied for α_k = s_{t−1}. Note that (31) follows from (34), and in fact, we have s_t ≤ s₀ < δ₂.
The proof is complete.
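The backtracking rule analyzed in Proposition 3 can be sketched in a few lines. This is a simplified single-inequality variant (the parameter names s0, beta, and sigma are ours; the paper's constants δ₁ and δ₂ are not modeled):

```python
import numpy as np

def backtracking_step(psi, x, g, s0=1.0, beta=0.5, sigma=1e-4):
    """Shrink the trial step s0 by the factor beta until the single
    sufficient-decrease inequality
        psi(x - s g) <= psi(x) - sigma * s * ||g||^2
    holds, in the spirit of the one-inequality backtracking of Algorithm 1."""
    s = s0
    while psi(x - s * g) > psi(x) - sigma * s * g.dot(g):
        s *= beta
    return s

x = np.array([2.0, 1.0])
psi = lambda v: v.dot(v)
g = 2.0 * x                                   # (q -> 1) gradient of ||x||^2
s = backtracking_step(psi, x, g)
```

Because the trial steps form the geometric sequence s₀βᵗ, the loop stops after finitely many reductions for any descent direction, which is the content of Proposition 3.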
The following proposition states that the q-gradient descent method moves in orthogonal steps; its proof is very similar to that of [2] (Proposition 6.1).

Proposition 4. If {x^(k)}_{k=0}^∞ is a q-gradient descent sequence for a given function ψ : Rⁿ → R, then, for every k, the vector x^(k+2) − x^(k+1) is orthogonal to the vector x^(k+1) − x^(k).

Proof. The iterative formula of the q-gradient descent is:

x^(k+1) = x^(k) − α_k ∇_{q^(k)} ψ(x^(k)). (35)

For k + 1, we obtain

x^(k+2) = x^(k+1) − α_{k+1} ∇_{q^(k+1)} ψ(x^(k+1)). (36)

From (35) and (36), we obtain

⟨x^(k+1) − x^(k), x^(k+2) − x^(k+1)⟩ = α_k α_{k+1} ⟨∇_{q^(k)} ψ(x^(k)), ∇_{q^(k+1)} ψ(x^(k+1))⟩. (37)

We need to show that this inner product vanishes. Since α_k is chosen in Algorithm 1 to minimize φ_k(α) = ψ(x^(k) − α ∇_{q^(k)} ψ(x^(k))), the first-order necessary condition gives

φ_k′(α_k) = −⟨∇_{q^(k+1)} ψ(x^(k+1)), ∇_{q^(k)} ψ(x^(k))⟩ = 0.

We obtain the desired result after substituting this value in (37).
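Proposition 4 can be observed numerically. In the sketch below (our example, not the paper's), we run steepest descent with an exact line-search step on a quadratic, using the classical gradient as the q → (1, . . . , 1)ᵀ limit of the q-gradient, and check that consecutive steps are orthogonal:

```python
import numpy as np

Q = np.array([[4.0, 1.0], [1.0, 3.0]])     # psi(x) = 0.5 x^T Q x
grad = lambda x: Q @ x                     # classical gradient (q -> 1 limit)

x = np.array([2.0, -1.0])
points = [x]
for _ in range(3):
    g = grad(x)
    alpha = g.dot(g) / g.dot(Q @ g)        # exact line-search step for a quadratic
    x = x - alpha * g
    points.append(x)

steps = [points[i + 1] - points[i] for i in range(3)]
inner = steps[0].dot(steps[1])             # consecutive steps: inner product ~ 0
```

The inner products vanish to machine precision, in agreement with the zig-zag behavior the proposition describes.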
The above proposition implies that ∇_{q^(k)} ψ(x^(k)) is parallel to the tangent plane of the level set ψ(x) = ψ(x^(k+1)) at x^(k+1) when q^(k) → (1, . . . , 1)ᵀ as k → ∞. Note that each approximate point generated by the q-gradient descent algorithm decreases the corresponding objective function ψ.
Proposition 5. If {x^(k)} is the q-gradient descent sequence for ψ : Rⁿ → R and ∇_{q^(k)} ψ(x^(k)) ≠ 0, then ψ(x^(k+1)) < ψ(x^(k)).

Proof. We know that x^(k+1) = x^(k) − α_k ∇_{q^(k)} ψ(x^(k)). Define φ_k(α) = ψ(x^(k) − α ∇_{q^(k)} ψ(x^(k))); then φ_k′(0) = −‖∇_{q^(k)} ψ(x^(k))‖² < 0. Hence, there exists ᾱ > 0 such that φ_k(0) > φ_k(α) for each α ∈ (0, ᾱ]. That is,

ψ(x^(k)) > ψ(x^(k) − α ∇_{q^(k)} ψ(x^(k))), for each α ∈ (0, ᾱ].

Thus, ψ(x^(k+1)) < ψ(x^(k)). This completes the proof.

Convergence Analysis
In this section, we present the convergence analysis of the proposed method with inexact line searches and the concept of quasi-Fejér convergence.

Theorem 3 ([36]). If a sequence {x^(k)} is quasi-Fejér convergent to a nonempty set U ⊆ Rⁿ, then {x^(k)} is bounded. If, furthermore, an accumulation point x of {x^(k)} belongs to U, then lim_{k→∞} x^(k) = x.

Proof. The proof of the above theorem can be seen in [36].

Theorem 4. Let {x^(k)} be generated by Algorithm 1. Then:
1. There exists γ > 0 such that ψ(x^(k+1)) ≤ ψ(x^(k)) − γ α_k ‖∇_{q^(k)} ψ(x^(k))‖² for all k;
2. {ψ(x^(k))} is nonincreasing and convergent;
3. ∑_{k=0}^∞ α_k ‖∇_{q^(k)} ψ(x^(k))‖² < ∞.

Proof.

1.
For Algorithm 1, we compute α_k using inequalities (27) and (29). Since α_k > 0, the corresponding bound also holds. By the definition of ξ, there exists θ > 0 such that, for a general step length α ∈ (0, θ), the sufficient-decrease inequality (38) holds. For every k, we consider two cases for choosing the step length. In the first case, by Proposition 3, and since it follows from (1) and (2) of Assumption 2 that φ is increasing, we have φ(θ) ≤ φ(α_k). Thus, from (38), and from Equations (39) and (40) together with the above inequality, we obtain the claim, and (1) of this theorem is proved. If we use Equations (29) and (30) to compute the step length, then the q-Newton-Leibniz formula [42] expresses ψ(x^(k+1)) − ψ(x^(k)) as a q-integral along the step. From (2) of Assumption 1 and a special case of the q-integral [44], we obtain the bound (41). From (28), we have a lower bound on α_k; then, from (41), we obtain the sufficient-decrease inequality with the corresponding constant γ. Thus, (1) is proved.
We also need to present the following proposition to prove the convergence of Algorithm 1.

Proposition 6. Let T = {x ∈ Rⁿ : ψ(x) ≤ ψ(x^(k)) for all k}. Every minimizer of ψ is placed in T, making T nonempty. Then, the sequence {x^(k)} generated by Algorithm 1 is quasi-Fejér convergent to T and converges to a point x* ∈ T, with any α_k > 0.
Proof. Given that z ∈ T, we have

ψ(z) ≤ ψ(x^(k)), for all k. (42)

A function ψ will be called q-convex if it satisfies the following inequality in the light of q-calculus:

ψ(z) ≥ ψ(x) + (z − x)ᵀ ∇_q ψ(x).

With the above inequality and (42), we obtain (z − x^(k))ᵀ ∇_{q^(k)} ψ(x^(k)) ≤ 0, and hence

‖x^(k+1) − z‖² ≤ ‖x^(k) − z‖² + α_k² ‖∇_{q^(k)} ψ(x^(k))‖²,

and, from (3) of Theorem 4 and the bound α_k < δ₂, the terms ε_k = α_k² ‖∇_{q^(k)} ψ(x^(k))‖² are summable. We then have that {x^(k)} is quasi-Fejér convergent to T ⊆ Rⁿ, and, from Theorem 3, {x^(k)} is bounded. Further, {x^(k)} has an accumulation point x, which is in T. Thus, lim_{k→∞} x^(k) = x.
Theorem 5. The sequence {x^(k)} generated by Algorithm 1 converges, in the sense of quasi-Fejér convergence, to a minimizer of the function ψ : Rⁿ → R.
Proof. From Proposition 6, lim_{k→∞} x^(k) = x* ∈ T, where T is the set of accumulation points along which the objective function decreases at every iteration. However, we need to prove that x* ∈ T*, where T* is the set of minimizers of the objective function. Suppose, to the contrary, that x* ∉ T*. Then, from the convexity of ψ, ψ(x*) is strictly larger than the minimum value of ψ. From Proposition 2, ū(x*) > 0, and x^(k) converges to x*. Thus, there exists k₀ such that, for all k ≥ k₀, inequality (43) holds. Let the constant in (44) be defined accordingly. Then, for any k ≥ k₀, (45) holds. From Equations (43) and (45), we obtain a positive lower bound on α_k ‖∇_{q^(k)} ψ(x^(k))‖² for all k ≥ k₀. This contradicts (3) of Theorem 4. Thus, it is proved that x* ∈ T*. If we choose α_k using (27) and (28), then α_k² ≥ δ₁², and, from (3) of Theorem 4, the continuity of ∇_{q^(k)} ψ(·), and δ₁ > 0, we conclude that ∇_{q^(k)} ψ(x^(k)) → 0; the q-gradient used here is a good approximation of ∇ψ(x^(k)) [2]. Further, see the proof of Proposition 2.2 in [47] for an affine function, where the classical gradient and the q-gradient of ψ are the same. For other functions, Example 3.2 and Remark 3.3 in [47] can again be consulted. Hence, the accumulation point x* = lim_{k→∞} x^(k) is a minimizer of the function.

Experimental Results
We compared the numerical performance of our algorithm with the methodology used in [36]. The stopping criterion was based on the norm of the q-gradient, with tolerance ε = 10⁻⁶, to terminate both algorithms. Numerical results were compared based on the number of iterations and the number of function evaluations. Each run was stopped either when it satisfied the stopping criterion or when the iteration count reached 500. All problems were taken from [48], and the computer codes were written in the R language.
The numerical experiments were performed on an Intel Core i5-3210M CPU at 2.5 GHz with 4 GB of RAM and a 64-bit operating system (Intel, Santa Clara, CA, USA) to solve the unconstrained minimization problems. Example 2 ([49]). Consider the function ψ : R³ → R given in [49]. We apply the q-gradient descent algorithm with a starting point x^(0) for different values of τ = 2, 5, 10, 20, 50. We compared our method with the methodology used in [36], and the numerical results are described in Table 1.
It is worth mentioning that the method given in this paper requires the fewest iterations, while the minimizer x* and the minimum function value ψ(x*) = ψ* are almost the same for both methods.
With a starting point x^(0), Algorithm 1 shows improvement over the algorithm given in [36]. The numerical results are shown in Tables 2 and 3, with the following abbreviations used in the columns: 'gn' is the norm of the gradient, 'nf' is the number of function evaluations, and the step length is computed using the backtracking technique. We observe that our proposed algorithm converges to the solution point in 28 iterations, while the algorithm used in [36] converges to the same solution point in 40 iterations. The graphs for both methods, in terms of the number of function evaluations versus the logarithm of the function value, are provided in Figures 1 and 2. A three-dimensional pictorial representation of Example 3 is shown in Figure 3. Dolan and Moré [51] presented an appropriate statistical technique for demonstrating performance profiles. The performance ratio is presented as

ρ_{p,s} = r_{p,s} / min{r_{p,s} : s ∈ S},

where r_{p,s} refers to the number of iterations or function evaluations that solver s spends on problem p, and n_p refers to the number of problems in the model test. The cumulative distribution function is given as

P_s(τ) = (1/n_p) |{p : ρ_{p,s} ≤ τ}|,

where P_s(τ) is the fraction of problems for which the performance ratio ρ_{p,s} is within a factor τ ∈ R of the best. That is, for a subset of the methods being analyzed, we plot the fraction P_s(τ) of problems for which any given method is within a factor τ of the best. We use this tool to show the performance of Algorithm 1. First, we solved 28 test problems with different starting points and recorded the number of iterations and the number of function evaluations in Table 4. Figures 4 and 5 show that the q-gradient descent method solves about 86% and 79% of the 28 test problems [48] with the fewest iterations and function evaluations, respectively. We can conclude that the proposed method is superior.
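The Dolan-Moré profile described above reduces to a few lines of code. A minimal Python sketch follows (the cost matrix is made-up illustrative data, not the paper's results):

```python
import numpy as np

def performance_profile(costs, taus):
    """costs[p][s]: e.g. iteration count of solver s on problem p.
    Returns, for each tau, the fraction of problems on which each solver's
    performance ratio rho = cost / best-cost is within the factor tau."""
    costs = np.asarray(costs, dtype=float)
    rho = costs / costs.min(axis=1, keepdims=True)
    return [[float(np.mean(rho[:, s] <= tau)) for s in range(costs.shape[1])]
            for tau in taus]

# two solvers on three problems (illustrative numbers only)
costs = [[10, 20], [30, 15], [12, 12]]
profile = performance_profile(costs, taus=[1.0, 2.0])
```

At τ = 1 the profile reports the fraction of problems on which each solver is the best, and P_s(τ) → 1 as τ grows for any solver that solves every problem.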