Improving the Performance of Optimization Algorithms Using the Adaptive Fixed-Time Scheme and Reset Scheme

Abstract: Optimization algorithms now play an important role in many fields, and the question of how to design high-efficiency algorithms has gained increasing attention, for which it has been shown that advanced control theories could be helpful. In this paper, the fixed-time scheme and reset scheme are introduced to design high-efficiency gradient descent methods for unconstrained convex optimization problems. First, a general reset framework for existing accelerated gradient descent methods is given based on the systematic representation, with which both the convergence speed and the stability are significantly improved. Then, the design of a novel adaptive fixed-time gradient descent, which has fewer tuning parameters and maintains better robustness to initial conditions, is presented. However, its discrete form introduces undesirable overshoot and easily leads to instability, and the reset scheme is therefore applied to overcome these drawbacks. The linear convergence and better stability of the proposed algorithms are theoretically proven, and several dedicated simulation examples are finally given to validate their effectiveness.


Introduction
With the rapid development of big data and artificial intelligence, machine learning and deep learning have played a vital role in many fields, where the original problem can often be transformed into an optimization problem [1][2][3][4]. Gradient descent (GD), an unconstrained convex optimization algorithm, is a popular method for solving such optimization problems [5,6]. As the complexity of these problems increases rapidly, the question of how to design high-efficiency optimization algorithms has gained increasing attention. To improve the performance of conventional GD, many variants, such as GDs with additional momentum [7,8] and robust GDs [9-11], have been considered. Recently, it was found that control theories could contribute substantially to the analysis and design of high-efficiency GDs, and many important results have been reported. However, although many high-efficiency GDs have been proposed, only asymptotic convergence can be achieved, and the closed-loop stability can be worsened. In this paper, the reset scheme and fixed-time scheme from control theory are applied to improve both the stability and the convergence rate of existing GDs.
Many research results have implied that control theory can help the analysis and design of optimization algorithms. For instance, the convergence property of numerical algorithms was proven by using passivity theory in [12]. Considering the continuous-time form of momentum GD (MGD), the acceleration mechanism was interpreted through the system response of a linear second-order system in [7]. Recently, the authors in [13,14] formulated the Nesterov accelerated GD (NGD) and MGD as second-order continuous-time systems and analyzed the convergence property with the famous Lyapunov theorem. Furthermore, by defining the Bregman Lagrangian, a large class of AGDs was generated in [15]. Additionally, several discretization strategies were provided to derive the discrete-time forms. For more details about analyzing and designing optimization algorithms using control theory, one may refer to [16][17][18].
Though many high-efficiency GDs based on control theory have been proposed, only asymptotic convergence can be achieved. In order to further accelerate the convergence speed and achieve non-asymptotic convergence, finite-time and fixed-time convergence have been considered in optimization algorithms. For instance, a finite-time convergent GD was designed, motivated by the finite-time control in [19], which is similar to the design of finite-time reaching laws in sliding mode control. Additionally, by using the Hessian matrix, fixed-time convergence was achieved. Furthermore, novel fixed-time stable gradient flows were designed, motivated by the fixed-time convergence theorems in [20,21], and the results were extended to constrained optimization in [22]. In [23,24], novel fractional FTGDs were proposed based on fractional-order system theory. The fixed-time scheme has also been widely applied to multi-agent systems and distributed optimization [25,26]. Though existing fixed-time GDs can reach the minimum point in a fixed time, they have too many tuning parameters, and their discrete forms can easily lead to instability.
Besides the aforementioned issue, existing high-efficiency GDs may exhibit an undesirable overshoot when accelerating the convergence speed [27]. The restarting scheme has been found to be an excellent strategy for attenuating the overshoot. For instance, an adaptive restarting scheme was given for time-varying NGD in [28], where the time-varying parameters are re-initialized when the restarting condition holds. In [29], a cone-based restarting scheme was proposed to attenuate the overshoot for simplified NGD, and the results were then extended to the non-smooth and non-strongly convex case in [30]. Though the restarting scheme was introduced many years ago, a general design framework has not been established. In system control, the reset scheme is an efficient strategy for attenuating the overshoot and has been widely applied in practice. The main idea is to reset part of the control input to zero when the output reaches the reference point [31,32]. As shown later, AGDs can be formulated as second-order feedback systems, and the reset scheme can then be applied naturally to improve the convergence performance.
Motivated by the aforementioned reasons, the fixed-time scheme and reset scheme are introduced to design high-efficiency algorithms for unconstrained optimization problems. Firstly, by viewing the momentum item in AGDs as the control input, the reset scheme can be applied directly, and a general design framework for reset AGDs can then be obtained. Secondly, a novel FTGD with fewer tuning parameters is designed, motivated by the results in [33]. To further improve the stability of the discrete FTGD, the reset scheme is applied, and it is found that the reset scheme significantly improves the performance and stability of the discrete FTGD. Several dedicated numerical examples are given to verify all the results. The main contributions are summarized as follows.

•
The reset scheme is utilized to improve the performance of AGDs, and a general design framework is also given, with which both the convergence performance and the stability of the optimization algorithms are significantly improved.

•
A novel fixed-time optimization algorithm with an adaptive learning rate is proposed. This algorithm has fewer tuning parameters and is more robust to initial conditions compared with the existing results in [22].

•
The reset scheme is applied for discrete FTGD, with which the convergence speed and stability of the discrete FTGD are both significantly improved.
The remainder of the paper is organized as follows: Section 2 formulates the optimization problem and the systematic representation for AGDs. The basic principle of reset control and a general design framework for reset AGDs are given in Section 3. A novel adaptive FTGD and reset FTGD are presented in Section 4. A conclusive discussion of the proposed algorithms is provided in Section 5. Some illustrative examples are shown in Section 6 to validate the effectiveness of the proposed algorithms. Section 7 concludes the paper.

Systematic Representation for AGDs
In this paper, the following unconstrained convex optimization problem is considered:

min_{x ∈ R^n} f(x), (1)

where f(x) has one global minimum point x*. Before moving on, some basic definitions are listed in the following [34,35].
Definition 1. A convex function f(x) is said to be l-smooth if its gradient exists and there exists a scalar l > 0 such that ‖∇f(x) − ∇f(y)‖ ≤ l‖x − y‖, ∀x, y.
Definition 2. A convex function f(x) is said to be µ-strongly convex if its gradient exists and there exists a scalar µ > 0 such that f(y) ≥ f(x) + ∇ᵀf(x)(y − x) + (µ/2)‖y − x‖², ∀x, y.

Definition 3. A convex function f(x) is said to be κ-gradient-dominated if its gradient exists and there exists a scalar κ > 0 such that ‖∇f(x)‖² ≥ 2κ( f(x) − f* ), ∀x, where f* = f(x*) is the minimum value of function f(x).
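The three definitions can be checked numerically on a simple diagonal quadratic, for which l, µ, and κ are known in closed form. The following Python sketch is illustrative only (the coefficients in C are arbitrary) and verifies the standard forms of the three inequalities:

```python
# Diagonal quadratic f(x) = sum_i c_i * x_i^2 / 2 with gradient (c_i * x_i).
# For this f, one may take l = max(c), mu = min(c), and the
# gradient-dominance constant kappa = mu (here f* = 0, attained at x* = 0).
C = [1.0, 3.0, 10.0]          # illustrative curvatures
L_SMOOTH = max(C)
MU = min(C)

def f(x):
    return sum(c * v * v / 2.0 for c, v in zip(C, x))

def grad(x):
    return [c * v for c, v in zip(C, x)]

def norm(v):
    return sum(t * t for t in v) ** 0.5

def check_definitions(x, y, tol=1e-9):
    gx, gy = grad(x), grad(y)
    # Definition 1 (l-smooth): ||grad f(x) - grad f(y)|| <= l ||x - y||
    smooth = (norm([a - b for a, b in zip(gx, gy)])
              <= L_SMOOTH * norm([a - b for a, b in zip(x, y)]) + tol)
    # Definition 2 (mu-strongly convex):
    # f(y) >= f(x) + grad f(x)^T (y - x) + (mu/2) ||y - x||^2
    inner = sum(g * (b - a) for g, a, b in zip(gx, x, y))
    dist2 = norm([b - a for a, b in zip(x, y)]) ** 2
    strong = f(y) >= f(x) + inner + 0.5 * MU * dist2 - tol
    # Definition 3 (kappa-gradient-dominated): ||grad f(x)||^2 >= 2*kappa*(f(x) - f*)
    dominated = norm(gx) ** 2 >= 2.0 * MU * f(x) - tol
    return smooth and strong and dominated
```

For this quadratic, all three inequalities hold at every pair of points, which is easy to confirm by sampling random pairs.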
As shown in many existing results, optimization algorithms can be viewed as feedback systems, and many advanced control theories can be applied to design high-efficiency algorithms. In the following, the systematic representation for AGDs is given, which is helpful for understanding the reset scheme in optimization algorithms. Commonly used types of AGD include MGD and NGD [36]. MGD can be formulated as

y_{k+1} = λ y_k − η ∇f(x_k), x_{k+1} = x_k + y_{k+1}, (3)

where η > 0 is the learning rate and 0 < λ < 1 is the decaying parameter. Performing the Z-transform on both sides of (3) yields

(z − λ)Y(z) = −η ∇F(z), (z − 1)X(z) = z Y(z), (4)

where −∇F(z), the Z-transform of −∇f(x_k), is treated as the control input and X(z) as the output, and one has the following transfer function

G_M(z) = η z / ((z − 1)(z − λ)). (5)

Similarly, simplified NGD can be formulated as

x_{k+1} = x_k + λ(x_k − x_{k−1}) − η ∇f(x_k + λ(x_k − x_{k−1})), (6)

and the following transfer function can be similarly derived

G_N(z) = η((1 + λ)z − λ) / ((z − 1)(z − λ)), (7)

while for the conventional GD the corresponding transfer function is

G(z) = η / (z − 1). (8)

Remark 1. The mentioned GD, MGD, and NGD can all be formulated as feedback systems with different transfer functions. As is known, a second-order system generally has a faster response speed than a first-order system, which explains the accelerating mechanism of MGD and NGD in system theory. Moreover, compared with MGD, the transfer function of NGD has a zero at λ/(1 + λ), which contributes to the generally better convergence performance of NGD. However, a second-order system also exhibits an undesirable overshoot, which worsens the convergence performance around the minimum point.
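The point of Remark 1, a faster second-order response at the price of overshoot, can be reproduced with a few lines of Python. The update rules below are the standard GD and MGD recursions (a sketch, not the paper's exact implementation): on f(x) = x²/2, the GD iterates decay monotonically (first-order response), while the MGD iterates oscillate around the minimum (second-order response with complex poles), i.e., they overshoot.

```python
# Assumed standard forms:
#   GD : x_{k+1} = x_k - eta * grad f(x_k)
#   MGD: y_{k+1} = lam * y_k - eta * grad f(x_k);  x_{k+1} = x_k + y_{k+1}
# Test function: f(x) = x^2 / 2, so grad f(x) = x.

def run_gd(x0, eta, steps):
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - eta * xs[-1])      # plain gradient step
    return xs

def run_mgd(x0, eta, lam, steps):
    xs, y = [x0], 0.0
    for _ in range(steps):
        y = lam * y - eta * xs[-1]            # momentum accumulation
        xs.append(xs[-1] + y)
    return xs

def sign_changes(xs):
    """Number of zero crossings, i.e., overshoots past the minimum x* = 0."""
    return sum(1 for a, b in zip(xs, xs[1:]) if a * b < 0)
```

With η = 0.05 and λ = 0.9, both methods converge, but only the MGD trajectory crosses the minimum repeatedly.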
By using the transfer-function description, many equivalent variants can be derived for MGD and NGD, which is known as state-space realization. Taking NGD (7) as an example, introduce an intermediate variable Y(z) = −η ∇F(z)/(z − λ), so that X(z) = ((1 + λ)z − λ) Y(z)/(z − 1). Performing the inverse Z-transform on both sides yields

y_{k+1} = λ y_k − η ∇f(x_k), x_{k+1} = x_k + (1 + λ) y_{k+1} − λ y_k. (11)

According to system theory, NGD (6) and (11) are equivalent under the zero initial condition. The systematic representation for AGDs is helpful for implementing the reset scheme, and it will be shown in the following that form (11) is more suitable for the reset scheme.

Brief Introduction for Reset Control
In system control, reset control is an efficient strategy for attenuating the overshoot by simply setting the control input to zero when an overshoot is detected. Consider the linear feedback system with the open-loop transfer function (5), where the part z/(z − 1) can be viewed as the controlled system and the part η/(z − λ) can be viewed as the controller. For a discrete-time system, the overshoot can be detected by checking the sign of (x_{k+1} − r)(x_k − r), where r is the reference signal. The reset scheme sets the control input to zero when an overshoot is detected, i.e., when (x_{k+1} − r)(x_k − r) < 0. The control diagram is shown in Figure 1. As shown in Figure 2, the second-order system has a fast response speed, but it results in an undesirable overshoot around the reference signal. By using the reset scheme, the overshoot is totally eliminated, and the convergence performance is significantly improved (parameter settings: λ = 0.9, η = 0.1, r = 1).

In optimization problems, the reference signal, i.e., x*, is always unknown, and the reset condition (x_{k+1} − r)(x_k − r) < 0 cannot be applied anymore. Furthermore, the mentioned one-dimensional reset condition cannot be directly extended to the high-dimensional case. The overshoot for an optimization problem can instead be defined based on the function value.

Definition 4. For a convex optimization problem, the convergence procedure is said to have an overshoot if its function value is not constantly decreasing.
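The behavior in Figure 2 can be sketched as follows. The plant z/(z − 1) and the controller η/(z − λ) are realized as difference equations, and the controller state is simply zeroed at the crossing (a classical reset-control variant; the algorithms later in the paper instead re-conduct the iteration). Parameter values follow the text (λ = 0.9, η = 0.1, r = 1); the loop structure itself is an assumption of this sketch.

```python
def track(reset, r=1.0, eta=0.1, lam=0.9, steps=400):
    """Closed loop: controller u_{k+1} = lam*u_k + eta*(r - x_k)
    feeding the integrator plant x_{k+1} = x_k + u_{k+1}."""
    x, u, traj = 0.0, 0.0, [0.0]
    for _ in range(steps):
        u = lam * u + eta * (r - x)
        x_next = x + u
        if reset and (x_next - r) * (x - r) < 0:
            u = 0.0          # zero the controller state at the crossing
        x = x_next
        traj.append(x)
    return traj
```

Running both variants shows a large overshoot without reset and an almost monotone approach to the reference with it, in line with Figure 2.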
According to the aforementioned definition, the reset condition can be directly given as f(x_{k+1}) > f(x_k), which is commonly used in restarted GDs [29]. Furthermore, the following reset condition can be used for convex optimization:

∇ᵀf(x_{k+1})(x_{k+1} − x_k) > 0. (12)

The next problem to be considered is which variable needs to be reset when the reset condition holds. In [28], the authors re-initialized the time-varying parameters in NGD, while the authors in [29] replaced the AGD with conventional GD. In this paper, the problem is reconsidered from the perspective of reset control, and a general design framework for reset AGDs is given.

Reset MGD
In MGD (3), y_k is treated as the momentum item, which is the weighted sum of all the previous gradients and helps accelerate the convergence speed. However, such a momentum item easily leads to an undesirable oscillation around the minimum point, which is known as the overshoot in system theory. From the perspective of system control, y_k can be viewed as the control input, and one can reset y_k to zero when the overshoot is detected. Reset MGD is described as Algorithm 1. The reset condition is always checked for the next step, and if the condition is satisfied, then the iteration is re-conducted by setting y_k = 0. Interestingly, the authors in [29] provided an NGD with gradient-mapping restart, which is exactly the same as the proposed reset MGD. Therefore, the following lemma holds according to the results in [29].
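Algorithm 1 itself is not reproduced in this excerpt, so the following Python sketch assumes the standard MGD recursion and implements the reset rule as described: when condition (12) holds at the tentative iterate, the momentum is zeroed and the step is re-conducted, which turns that iteration into a plain GD step.

```python
def reset_mgd(grad_f, x0, eta, lam, steps):
    """Sketch of reset MGD: y_{k+1} = lam*y_k - eta*grad f(x_k),
    x_{k+1} = x_k + y_{k+1}, re-done with y_k = 0 whenever
    grad f(x_{k+1})^T (x_{k+1} - x_k) > 0 (reset condition (12))."""
    x = list(x0)
    y = [0.0] * len(x)
    history = [x[:]]
    for _ in range(steps):
        g = grad_f(x)
        y_new = [lam * yi - eta * gi for yi, gi in zip(y, g)]
        x_new = [xi + yi for xi, yi in zip(x, y_new)]
        step = [a - b for a, b in zip(x_new, x)]
        if sum(gi * si for gi, si in zip(grad_f(x_new), step)) > 0:
            # reset y_k = 0 and re-conduct: a plain GD step
            y_new = [-eta * gi for gi in g]
            x_new = [xi + yi for xi, yi in zip(x, y_new)]
        x, y = x_new, y_new
        history.append(x[:])
    return history
```

On the quadratic f(x) = Σᵢ i·xᵢ² used later in Example 2, the function value is monotonically non-increasing along the reset MGD iterates, in line with Definition 4.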
Lemma 1. For an l-smooth and µ-strongly convex function f(x), the convergence speed of reset MGD in Algorithm 1 is linear with 0 < η < 2/(l + µ).
The proposed reset scheme cannot be directly used for NGD (6) since y_k there is not the integral value of the control input and cannot simply be reset to zero when the reset condition holds. However, the reset scheme can be applied to its equivalent form (11), where the part η/(z − λ) can be viewed as the controller, the remaining part can be viewed as the controlled system, and y_k is the integral value of the control input. Reset NGD can then be designed as Algorithm 2, where y_k is reset to zero when the reset condition holds.
Lemma 2. For an l-smooth and µ-strongly convex function f(x), the convergence speed of reset NGD in Algorithm 2 is linear with 0 < (1 + λ)η < 2/(l + µ).

Proof. The reset condition (12) implies that the function value is constantly decreasing until the condition holds. At the step at which the reset condition holds, reset NGD is reduced to the conventional GD with the learning rate (1 + λ)η, and the function value is guaranteed to decrease for a learning rate 0 < (1 + λ)η < 2/(l + µ) according to the existing results [34]. Then, similar to the results for restarted NGD in [29], the linear convergence speed can be proven.
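A companion sketch of reset NGD, under an assumed state-space realization of NGD chosen to be consistent with the statement that a reset step reduces to conventional GD with learning rate (1 + λ)η; the exact form of Algorithm 2 is not reproduced in this excerpt.

```python
def reset_ngd(grad_f, x0, eta, lam, steps):
    """Sketch of reset NGD under the assumed realization
    y_{k+1} = lam*y_k - eta*grad f(x_k),
    x_{k+1} = x_k + (1+lam)*y_{k+1} - lam*y_k.
    When condition (12) holds, y_k is reset to zero and the iteration
    re-conducted, i.e., a GD step with rate (1+lam)*eta."""
    x = list(x0)
    y = [0.0] * len(x)
    history = [x[:]]
    for _ in range(steps):
        g = grad_f(x)
        y_new = [lam * yi - eta * gi for yi, gi in zip(y, g)]
        x_new = [xi + (1 + lam) * yn - lam * yi
                 for xi, yn, yi in zip(x, y_new, y)]
        step = [a - b for a, b in zip(x_new, x)]
        if sum(gi * si for gi, si in zip(grad_f(x_new), step)) > 0:
            y_new = [-eta * gi for gi in g]
            x_new = [xi + (1 + lam) * yn for xi, yn in zip(x, y_new)]
        x, y = x_new, y_new
        history.append(x[:])
    return history
```

On the same quadratic as before, the function value decreases monotonically and converges linearly, matching Lemma 2 qualitatively.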

•
If the learning rate is set sufficiently small, ∇ T f (x k+1 )(x k+1 − x k ) > 0 may never happen, and reset AGDs will then reduce to the conventional AGDs, which indicates the same convergence speed as the conventional AGDs.

•
For conventional AGDs, parameter tuning is difficult since either an excessive or an insufficient learning rate results in inadequate performance. However, for reset AGDs, the overshoot introduced by the momentum item is significantly attenuated, and one can tune the learning rate in the same way as for the conventional GD, which simplifies parameter tuning.

•
The proposed design framework can also be applied to many other GDs/AGDs that can be formulated as higher-order feedback systems.

FTGD with an Adaptive Learning Rate
A fixed-time convergent algorithm reaches the minimum point within a time bound that is independent of the initial condition and stays at the minimum point thereafter. Different from the results in [19,22], a novel adaptive fixed-time GD is designed in this section, and its discrete form is also given; the algorithm has fewer tuning parameters and maintains better robustness to initial conditions. Moreover, since the FTGD can be viewed as a special second-order system, the reset scheme can be applied to improve its performance. Lastly, reset FTGD is discussed.
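The exact form of FTGD (13) is not reproduced in this excerpt; the flavor of fixed-time convergence can nevertheless be illustrated with a classical scalar flow in the style of the fixed-time theorems in [20,21] (a hypothetical example, not the paper's algorithm). For dx/dt = −(|x|^{1/2} + |x|^{3/2}) sign(x), the settling time is 2(arctan √x₀ − arctan √ε), which stays below π for every initial condition x₀.

```python
import math

def settle_time(x0, dt=1e-4, tol=1e-3, t_max=5.0):
    """Forward-Euler simulation of dx/dt = -(|x|**0.5 + |x|**1.5)*sign(x).
    Returns the first time with |x| <= tol. The continuous-time bound is
    2*(atan(sqrt(x0)) - atan(sqrt(tol))) < pi for every x0 >= 0."""
    x, t = float(x0), 0.0
    while abs(x) > tol and t < t_max:
        s = math.copysign(1.0, x)
        x -= dt * s * (abs(x) ** 0.5 + abs(x) ** 1.5)
        t += dt
    return t
```

Starting 1000 times farther away barely changes the settling time, which is the signature of fixed-time (as opposed to merely finite-time) convergence.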
Theorem 1. For a µ-gradient-dominated convex function f(x), FTGD (13) reaches the minimum point in a fixed time T_r ≤ π/ω, where ω := (1/2)√(8ηµ²/α − λ²) is a positive real number under the condition λ² < 8ηµ²/α.

Proof. Take the energy function as V_f = f(x) − f*, and evaluate its derivative along (13). By introducing suitable functions P(t) ≥ 0 and Q(t) ≥ 0, a second-order differential relation (14) for V_f is obtained. The corresponding Laplace transform of (14) is (15), where V_f(s), θ(s), P(s) and Q(s) are the Laplace transforms of V_f(t), θ(t), P(t) and Q(t), respectively. Solving (15) expresses V_f(s) in terms of (s + λ)V_f(0), P(s) and Q(s), which is denoted as (16). Since V̇_f ≤ 0 and V_f(t) ≥ 0, we only need to prove that V_f(t) reaches zero in a fixed time. To simplify the expression, define ω := (1/2)√(8ηµ²/α − λ²), which is a positive real number according to the condition λ² < 8ηµ²/α. Performing the inverse Laplace transform on both sides of (16) results in an expression for V_f(t) containing the term e^{−(λ/2)t} g(t) V_f(0), with g(t) := cos(ωt) + (λ/ω) sin(ωt), together with convolution terms involving P(t) and Q(t). The first positive zero t_0 of g(t) must be smaller than π/ω. Then, g(t) ≥ 0 and sin(ωt) ≥ 0 hold for any 0 < t ≤ t_0 ≤ π/ω. Combined with the fact that P(t) ≥ 0 and Q(t) ≥ 0, the inequalities −e^{−(λ/2)t} g(t) * P(t) ≤ 0 and −e^{−(λ/2)t} sin(ωt) * Q(t) ≤ 0 hold, where * denotes convolution. The following inequality can then be derived:

V_f(t) ≤ e^{−(λ/2)t} g(t) V_f(0), 0 < t ≤ t_0.

Since g(t) = cos(ωt) + (λ/ω) sin(ωt) must have a zero within half a cycle, V_f(t) must reach zero within π/ω. Combined with the fact that V̇_f(t) ≤ 0 and V_f(t) ≥ 0, it is known that V_f(t) reaches and stays at zero in a finite time. Moreover, the convergence time is no longer than π/ω, which is determined by the frequency ω independently of the initial condition, and this indicates fixed-time convergence.
Remark 2. Some comments on Theorem 1 are given as follows

•
Since the convergence time is bounded by π/ω with ω = (1/2)√(8ηµ²/α − λ²), one can set α = 1.0 and tune λ and η to achieve a desirable convergence time in practical usage. Moreover, the proposed FTGD has fewer tuning parameters compared with the results in [22].

•
The parameter λ in algorithm (13) is used to attenuate the value of θ after reaching the minimum point. It is quite useful when realizing algorithm (13) in its discretized form. When λ = 0, the attenuating item e^{−(λ/2)t} disappears from the proof of Theorem 1, and the conclusion of fixed-time convergence still holds.

•
To avoid singularity, an additional positive scalar can be introduced for practical usage, and algorithm (13) can be modified into algorithm (18), where δ > 0 is a small scalar. With such a replacement, algorithm (18) guarantees fixed-time convergence to a bounded region of the minimum point characterized by ‖∇f(x)‖^α ≤ δ.

Euler Discretization of FTGD
In order to apply FTGD in practice, we discuss the discrete form of FTGD (13) in this subsection. By using the forward Euler discretization with step size γ > 0, the discrete form of FTGD (13) with α = 1 can be derived as (20). To simplify the following discussion, we ignore the parameter relationship between (13) and (20) and reformulate (20) as (21), where 0 < λ < 1 and η > 0.
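Since (20) and (21) are not reproduced in this excerpt, the following generic sketch only illustrates the step-size issue that motivates the stability analysis below: the forward-Euler discretization of a gradient flow dx/dt = −∇f(x) is stable on an l-smooth quadratic only when the effective step satisfies γ·l < 2.

```python
def euler_gradient_flow(c, gamma, x0=1.0, steps=200):
    """Forward-Euler discretization of dx/dt = -grad f(x) for f(x) = c*x^2/2:
    x_{k+1} = (1 - gamma*c) * x_k, stable iff |1 - gamma*c| < 1, i.e. gamma*c < 2."""
    x = x0
    for _ in range(steps):
        x -= gamma * c * x      # grad f(x) = c*x
    return x
```

With curvature c = 10, the scheme converges for γ = 0.19 (γc = 1.9) and diverges for γ = 0.21 (γc = 2.1), showing how discretization alone can destroy the stability of a perfectly stable continuous-time flow.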
Theorem 2. For an l-smooth convex function f(x), algorithm (21) is asymptotically convergent if ρ < (1 − λ²)/l.

Proof. Consider a Lyapunov function V_k defined from the iterates of (21). Evaluating V_{k+1} − V_k along (21) shows that if ρ < (1 − λ²)/l, then V_{k+1} − V_k ≤ 0, and algorithm (21) is asymptotically convergent according to the Lyapunov theorem.

Reset FTGD
Similar to the aforementioned AGDs, FTGD (21) accelerates the convergence speed since its learning rate is always larger than that of the conventional GD. However, an undesirable overshoot appears around the minimum point. The reset scheme can then be applied to attenuate the overshoot and improve the convergence performance. Different from the aforementioned AGDs, FTGD (21) cannot be directly viewed as a second-order system, but θ_k can be treated as a special control input and set to zero when the reset condition holds. Reset FTGD can then be formulated as Algorithm 3.

Theorem 3. For an l-smooth and µ-strongly convex function f(x), the convergence speed of reset FTGD in Algorithm 3 is linear with 0 < η < 2/(l + µ).
Proof. Suppose the reset operation happens at steps k_1, k_2, k_3, .... If linear convergence is proven between any two successive resets, then the proof is completed. In the following, suppose k_1 ≤ k < k_2 with start point x_{k_1} and θ_{k_1} = 0. Additionally, the condition ∇ᵀf(x_{k+1})(x_{k+1} − x_k) ≤ 0 holds since there is no reset during this period. Since f(x) is µ-strongly convex, it can be proven that [35]

‖∇f(x)‖² ≥ 2µ( f(x) − f* ). (23)

Moreover, it is straightforward to bound the change x_{k+1} − x_k from the update of Algorithm 3, and on this basis a per-step inequality holds. Furthermore, since f(x) is µ-strongly convex and ∇ᵀf(x_{k+1})(x_{k+1} − x_k) ≤ 0, one obtains the descent inequality (26). Summing up both sides of (26) from k_1 to k_2 yields a telescoping bound, from which a geometric decrease over the interval is concluded. Since k_2 − k_1 + 1 must be a finite number (otherwise, the linear convergence speed follows from the existing analyses), the mean convergence speed for k_1 ≤ i ≤ k_2 − 1 can be defined accordingly, which indicates a linear convergence. Similar analyses can be applied to any other two successive reset steps. Additionally, the reset condition (12) implies that the function value decreases constantly until the condition holds. At the step at which the reset condition holds, reset FTGD is reduced to the conventional GD, and the function value decreases for the learning rate 0 < η < 2/(l + µ), which indicates the stability of the reset FTGD.
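The stabilizing effect of the reset can be demonstrated with a deliberately simplified stand-in for Algorithm 3 (whose exact update is not reproduced in this excerpt): a normalized-gradient method whose effective learning rate η/‖∇f(x)‖ blows up near the minimum, producing a persistent oscillation, and a reset rule based on condition (12) that falls back to a plain GD step. All parameter values are illustrative.

```python
def normalized_gd(x0, eta=0.5, eta_gd=0.1, steps=200, use_reset=True):
    """1-D sketch on f(x) = x^2/2 (grad f(x) = x). Every step has fixed
    length eta; with use_reset, a step violating
    grad f(x_{k+1})*(x_{k+1} - x_k) <= 0 is re-done as a plain GD step."""
    x = x0
    traj = [x]
    for _ in range(steps):
        g = x                                     # grad f(x)
        step = -eta * (1.0 if g > 0 else -1.0)    # -eta * g/|g|
        x_new = x + step
        if use_reset and x_new * step > 0:        # reset condition (12)
            x_new = x - eta_gd * g                # fall back to plain GD
        x = x_new
        traj.append(x)
    return traj
```

Without the reset, the iterates bounce around the minimum with amplitude of order η forever; with the reset, the oscillation is detected and the method converges, mirroring the improved stability claimed for reset FTGD.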

Remark 3. •
During the proof process for the linear convergence, the l-smooth property is not required; only the µ-strongly convex property is used.Moreover, condition (23) is exactly the gradient-dominated property.

•
As is known, for an l-smooth and µ-strongly convex function, the convergence speed of the conventional GD can be proven to be linear. The result shown in Theorem 3 is the worst case; generally, reset FTGD converges to the minimum point much faster than the conventional GD since the adaptive learning rate is always greater than η.

•
Compared with the results in Theorem 2, the stable region of the learning rate for reset FTGD is larger than FTGD without the reset scheme, which improves the stability of FTGD.

•
The proposed reset scheme can be applied for other existing optimization algorithms with an adaptive learning rate, such as Adagrad and Adadelta algorithms.

Discussions

•
The reset scheme and fixed-time scheme from system control have been introduced to design high-efficiency GDs for unconstrained convex optimization problems. On the one hand, a general design framework for reset AGDs is given for the first time by using the systematic representation. On the other hand, a novel FTGD with an adaptive learning rate is designed, which has a simpler structure and fewer tuning parameters.

•
The proposed algorithms could improve the performance of existing GDs in both convergence rate and stability, where the reset scheme helps attenuate the undesirable overshoot and improve the stability of AGDs, and the fixed-time scheme helps to achieve the non-asymptotic convergence and reach the optimal point in a fixed time.

•
The proposed algorithms can be effectively applied to practical problems such as machine learning and deep learning. Some instructions for parameter tuning are also given for better practical implementation.

Illustrative Examples
In this section, we validate the fixed-time convergence of FTGD (13) and compare the convergence performance of the reset AGDs and reset FTGD. The simulator is MATLAB R2018b, and the simulation step is fixed at 1 × 10⁻⁵.

Example 1. In this example, the fixed-time convergence of FTGD (13) is validated and compared with the FTGD in [22]. It is observed that:

•
For different initial conditions, fixed-time convergence is achieved by both FTGDs, and the convergence time is smaller than the estimated upper bound (1 s).

•
Compared with the FTGD in [22], the proposed FTGD (13) not only has fewer tuning parameters but also maintains better robustness to initial conditions: as shown in Figure 3a, the convergence time is almost the same for different initial conditions.

•
As declared before, exact fixed-time convergence to the minimum point cannot be achieved since FTGD (13) is singular when ∇f(x) = 0, and algorithm (18) can then be applied.

Figure 3. Convergence results for FTGD (13) and the FTGD in [22]: (a) convergence results for FTGD (13); (b) convergence results for the FTGD in [22].
Example 2. In this example, we will compare the convergence results for different reset AGDs.
Consider the quadratic convex function f(x) = Σ_{i=1}^n a_i x_i², a_i > 0, and take a_i = i for simplicity. Firstly, we compare the convergence results for the reset AGDs. When simulating, set n = 10 and λ = 0.8, and the initial conditions are randomly assigned. The results are shown in Figure 4. It is found that for a small learning rate (η = 0.005 and η = 0.01), reset MGD and reset NGD perform similarly, while for a large learning rate (η = 0.05), reset NGD performs much better. However, the stability of reset NGD is worse, as shown in Figure 4d, where reset NGD has already diverged.
Next, we will compare the performance for reset FTGD with different parameter settings.The results are shown in Figure 5.

•
Unlike the conventional AGDs, where λ has to lie between 0 and 1, stability can still be guaranteed for λ ≥ 1. Moreover, for different learning rates η, a larger λ always performs better; thus, λ = 1 is a good choice for practical usage.

•
For a large learning rate (η = 0.01 and η = 0.05), reset FTGD with different λ performs similarly since the reset condition is triggered almost constantly; reset FTGD then essentially reduces to the conventional GD.
Example 3. In this example, we consider a special log-sum-exponential function f(x) = log(Σ_{i=1}^n e^{a_i xᵀx}), where a_i > 0 is randomly assigned in the simulation. When simulating, λ for reset MGD and reset NGD is set to 0.9, while λ for reset FTGD is set to 1.0. The results are shown in Figure 6, and it is observed that: • For different learning rates, reset FTGD always converges the fastest, while reset MGD and reset NGD perform similarly.

•
For a small learning rate (η = 0.001), the fixed-time convergence of the discrete FTGD can be observed in Figure 6a (sharp decay around k = 100), which is similar to the result shown in Figure 3b. This is well understood since the discrete FTGD (21) behaves similarly to its corresponding continuous-time FTGD (13) when the learning rate is sufficiently small.

•
Monotone convergence cannot be guaranteed in this example since the target function is neither l-smooth nor strongly convex, and f(x) is not guaranteed to decrease at the step when the reset condition holds.
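For reproducing Example 3, the gradient of the log-sum-exponential objective is the only nontrivial ingredient: writing s = xᵀx, one has ∇f(x) = 2(Σᵢ aᵢ e^{aᵢ s} / Σᵢ e^{aᵢ s}) x, i.e., a softmax-weighted average of the aᵢ times 2x. A numerically stable sketch is given below (the aᵢ values used in the check are illustrative, not the randomly assigned ones of the simulation):

```python
import math

def f(x, a):
    """f(x) = log(sum_i exp(a_i * x^T x)), computed with a max-shift."""
    s = sum(v * v for v in x)
    m = max(ai * s for ai in a)                    # shift for numerical stability
    return m + math.log(sum(math.exp(ai * s - m) for ai in a))

def grad_f(x, a):
    """grad f(x) = 2 * (softmax-weighted mean of a_i) * x."""
    s = sum(v * v for v in x)
    m = max(ai * s for ai in a)
    w = [math.exp(ai * s - m) for ai in a]         # unnormalized softmax weights
    coeff = sum(ai * wi for ai, wi in zip(a, w)) / sum(w)
    return [2.0 * coeff * v for v in x]
```

A central finite-difference check confirms the closed-form gradient, which can then be fed to any of the reset algorithms above.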

Figure 1. Reset control diagram for a second-order system.

Figure 2. System responses with and without the reset scheme.