On Application of the Ray-Shooting Method for LQR via Static-Output-Feedback

In this article we suggest a randomized algorithm for the LQR (Linear Quadratic Regulator) optimal-control problem via static-output-feedback. The suggested algorithm is based on the recently introduced randomized optimization method called the Ray-Shooting Method that efficiently solves the global minimization problem of continuous functions over compact non-convex unconnected regions. The algorithm presented here is a randomized algorithm with a proof of convergence in probability. Its practical implementation has good performance in terms of the quality of controllers obtained and the percentage of success.


Introduction
Let a continuous-time system be given by

ẋ(t) = Ax(t) + Bu(t), y(t) = Cx(t), (1)

where A ∈ R^{p×p}, B ∈ R^{p×q}, C ∈ R^{r×p} and x, u, y are the state, the input and the measurement, respectively. We assume that (A, B) and (A^T, C^T) are controllable (see [1] for a new reduction from the case where (A, B) and (A^T, C^T) are only stabilizable). Let

J(x_0, u) = ∫_0^∞ (x(t)^T Q x(t) + u(t)^T R u(t)) dt (2)

denote the cost functional, where Q > 0 and R ≥ 0. Assuming that x_0 = x(0) is given, the LQR problem is to attenuate the disturbance x_0 using minimal control cost, i.e., to design a regulation input u(t) that minimizes J(x_0, u), subject to the constraints given by (1). Let u = −Ky be the static-output-feedback (SOF) with the closed-loop matrix A_c(K) := A − BKC. Let C^− denote the open left half-plane, let α > 0 and let C_α denote the set of all z ∈ C with ℜ(z) ≤ −α, where ℜ(z) is the real part of z. Let S^{q×r} denote the set of all matrices K ∈ R^{q×r} such that A_c is stable, i.e., σ(A_c) ⊂ C^− (where σ(A_c) is the spectrum of A_c). By S_α^{q×r} we denote the set of all matrices K ∈ R^{q×r} such that σ(A_c) ⊂ C_α. In this case, we say that A_c is α-stable. Below, we will occasionally write S_α instead of S_α^{q×r} when the size of the related matrices is clear from the context.
Let K ∈ S_α^{q×r} be given. Substitution of u = −Ky = −KCx into (2) gives

J(x_0, K) = ∫_0^∞ x(t)^T (Q + C^T K^T RKC) x(t) dt. (3)

Since Q + C^T K^T RKC > 0 and since A_c(K) is stable, it follows that the Lyapunov equation

A_c(K)^T P + P A_c(K) + Q + C^T K^T RKC = 0 (4)

has a unique solution P > 0, given by

P = mat(−(I_p ⊗ A_c(K)^T + A_c(K)^T ⊗ I_p)^{−1} vec(Q + C^T K^T RKC)), (5)

where vec puts all the columns of the given matrix in a single column and mat is the inverse of vec.
Let us denote the solution (5) by P(K). Substitution of (4) into (3), noting that ẋ(t) = A_c(K)x(t) with A_c(K) stable, leads to J(x_0, K) = x_0^T P(K) x_0. Thus, we look for K ∈ S_α^{q×r} that minimizes the functional J(x_0, K) = x_0^T P(K) x_0. When x_0 is unknown, we seek K ∈ S_α^{q×r} for which

σ_max(K) := max(σ(P(K))) (6)

is minimal. In this case, we get a robust LQR via SOF, in the sense that it minimizes J(x_0, K) for the worst possible (unknown) x_0. Note that x_0^T P(K) x_0 ≤ σ_max(K)‖x_0‖^2 and that there exists x_0 ≠ 0 for which equality holds. Therefore J(x_0, K)/‖x_0‖^2 ≤ σ_max(K), where equality holds in the worst case.
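For concreteness, the closed-form solution (5) can be evaluated with a few lines of numerical code. The following sketch (ours, not the article's; it assumes NumPy, and the function name and the toy scalar plant are illustrative) solves the Lyapunov equation (4) via the Kronecker/vec formulation and then evaluates J(x_0, K) and σ_max(K):

```python
import numpy as np

def sof_lyapunov_cost(A, B, C, K, Q, R):
    """Solve A_c^T P + P A_c = -(Q + C^T K^T R K C) for P via the
    vec/Kronecker formulation, assuming A_c = A - B K C is stable."""
    p = A.shape[0]
    Ac = A - B @ K @ C
    W = Q + C.T @ K.T @ R @ K @ C
    # vec(A_c^T P + P A_c) = (I ⊗ A_c^T + A_c^T ⊗ I) vec(P)
    M = np.kron(np.eye(p), Ac.T) + np.kron(Ac.T, np.eye(p))
    vecP = np.linalg.solve(M, -W.reshape(-1, order='F'))  # vec stacks columns
    return vecP.reshape((p, p), order='F')

# Toy scalar plant: A = -1, B = C = 1, Q = R = 1, K = 1, so A_c = -2 and
# the Lyapunov equation -2P - 2P = -(1 + 1) gives P = 0.5.
A = np.array([[-1.0]]); B = np.array([[1.0]]); C = np.array([[1.0]])
K = np.array([[1.0]]); Q = np.eye(1); R = np.eye(1)
P = sof_lyapunov_cost(A, B, C, K, Q, R)
x0 = np.array([2.0])
J = float(x0 @ P @ x0)                                # J(x0, K) = x0^T P(K) x0
sigma_max = float(np.max(np.linalg.eigvalsh((P + P.T) / 2)))
```

For large p, the p^2 × p^2 linear solve costs O(p^6), which is exactly the dominant term in the complexity analysis of Section 2.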
Note that the functionals J(x_0, K) and σ_max(K) are generally not convex, since their domain of definition S^{q×r} (and therefore S_α^{q×r}) is generally non-convex. Necessary conditions for optimality were given as three quadratic matrix equations in [2-5]. Necessary and sufficient conditions for optimality, based on linear matrix inequalities (LMI), were given in [6-8]. However, algorithms based on these formulations are generally not guaranteed to converge, seemingly because of the non-convexity of the coupled matrix equations or inequalities, and when they do converge, it is to a local optimum only.
The application of SOFs in LQRs is appealing for several reasons: they are reliable and cheap, and their implementation is simple and direct. Moreover, the long-term memory of dynamic feedbacks is useless for systems subject to random disturbances, fast dynamic loadings or impulses, and the application of state-feedbacks is not always possible, due to the unavailability of full-state measurements (see [9], for example). On the other hand, in practical applications the entries of the needed SOFs are bounded, and since the problem of SOFs with interval-constrained entries is NP-hard (see [10,11]), one cannot expect the existence of a deterministic polynomial-time algorithm for this problem. Randomized algorithms are thus a natural approach. Probabilistic and randomized methods for the constrained SOF problem and for robust stabilization via SOFs (among other hard problems) are discussed in [12-15]. The Ray-Shooting Method was recently introduced in [16], where it was utilized to derive the Ray-Shooting (RS) randomized algorithm for the minimal-gain SOF problem with regional pole-assignment, where the region can be non-convex and unconnected. For a survey of the SOF problem see [17], and for a recent survey of the robust SOF problem see [18].
The contribution of this research is as follows:
1. The suggested algorithm is based on the Ray-Shooting Method (see [16]), which, as opposed to smooth optimization methods, has the potential of finding a global optimum of continuous functions over compact non-convex and unconnected regions.
2. The suggested algorithm has a proof of convergence (in probability) and explicit complexity.
3. Experience with the algorithm shows good quality of the obtained controllers, a high percentage of success and good run-time for real-life systems. Thus, the suggested practical algorithm efficiently solves the problem of LQR via SOF.
4. The algorithm does not need to solve any Riccati equations and thus can be applied to large systems.
5. The suggested algorithm is one of the few that deal with LQR via SOF, and it can handle discrete-time systems under the same formulation.
The remainder of the article is organized as follows: In Section 2, we introduce the practical randomized algorithm for the problem of LQR via SOF. In Section 3, we give the results of the algorithm for some real-life systems and compare its performance with that of a well-known algorithm that has a proof of convergence to a local minimum (under some reasonable assumptions). Finally, in Section 4, we conclude with some remarks.

The Practical Algorithm for the Problem of LQR via SOF
Assume that K^(0) ∈ int(S_α) was found by the RS algorithm (see [16]) or by any other method (see [19-21]). Let h > 0 and let U^(0) be a unit vector with respect to the Frobenius norm on R^{q×r}, i.e., ‖U^(0)‖_F = 1, and let L be the hyperplane defined by L = K^(0) + hU^(0) + V, where ⟨V, U^(0)⟩ = 0. Here B(0, ε) denotes the closed ball centered at 0 with radius ε (0 < ε ≤ 1/2), with respect to the Frobenius norm on R^{q×r}. Let D^(0) = CH(K^(0), R_∞(ε)) denote the convex hull of the vertex K^(0) with the base R_∞(ε). Let S_α^(0) = S_α ∩ D^(0) and note that S_α^(0) is compact (but generally not convex). We wish to minimize the continuous function σ_max(K) (or the continuous function J(x_0, K), when x_0 is known) over the compact set S_α ∩ B(K^(0), h). Let K* denote a point of S_α ∩ B(K^(0), h) where the minimum of σ_max(K) is attained. Obviously, K* ∈ D^(0) for some direction U^(0) as above.
The suggested algorithm in Algorithm 1 works as follows: We start with a point K (0) ∈ int (S α ), found by the RS algorithm.
Assuming that K* ∈ D^(0), the inner loop (j = 1, ..., n) uses the Ray-Shooting Method in order to find an approximation of the global minimum of the function σ_max(K) over S_α^(0), the portion of S_α bounded in the cone D^(0). The proof of convergence in probability of the inner loop and its complexity (under the above-mentioned assumption) can be found in [16] (see also [22]). In the inner loop, we choose a search direction by choosing a point F in R_∞(ε), the base of the cone D^(0). Next, in the innermost loop (k = 1, ..., s), we scan the ray K(t) := (1 − t)K^(0) + tF and record the best controller on it. Repeating this a sufficient number of times (as given in (7) and in the discussion right after it), we reach K* (or an ε-neighborhood of it) with high probability, under the assumption that K* ∈ D^(0). The outer loop (i = 1, ..., m) is used as a substitute for restarting the RS algorithm again and again, by taking K^(best) as the new vertex of the search cone instead of K^(0) and by choosing a different direction U^(0). The choice of a different direction serves as a backup for the case where the above-mentioned assumption did not hold in the previous iterations (see Remark 1 below). The replacement of K^(0) by K^(best) can be considered a heuristic step, made instead of running the RS algorithm many times in order to generate "the best starting point", which is relevant only if we actually evaluate σ_max(K) at each such point and take the point with the best value as the starting point. Since we evaluate σ_max(K) in the main algorithm in any case, we can avoid the repeated execution of the RS algorithm. The outer loop is similar to what is done in the Hide-and-Seek algorithm (see [23,24]). The convergence in probability of the Hide-and-Seek algorithm is proved in [25].
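The two inner loops described above can be sketched in a few lines. The following illustration is ours, not the article's pseudocode: it assumes NumPy, and it samples the base point of the cone by a plain random perturbation of K^(0) + hU^(0) rather than by the exact construction of R_∞(ε) from [16].

```python
import numpy as np

def ray_shooting_step(K0, U0, sigma, is_alpha_stable, h, r_inf,
                      n=100, s=100, rng=None):
    """One pass of the two inner loops (a sketch): pick n points F near the
    base of the search cone, scan each ray K(t) = (1 - t) K0 + t F with s
    steps, and keep the best alpha-stable controller found."""
    rng = rng if rng is not None else np.random.default_rng(0)
    K_best, val_best = K0, sigma(K0)
    for _ in range(n):                        # inner loop: choose a direction
        V = rng.uniform(-1.0, 1.0, size=K0.shape)
        V *= r_inf * rng.uniform() / max(np.linalg.norm(V), 1e-12)
        F = K0 + h * U0 + V                   # a point near the cone's base
        for k in range(1, s + 1):             # innermost loop: scan the ray
            t = k / s
            K = (1.0 - t) * K0 + t * F
            if is_alpha_stable(K):
                val = sigma(K)
                if val < val_best:
                    K_best, val_best = K, val
    return K_best, val_best

# Toy usage: plant xdot = x + u, y = x, Q = R = 1, alpha = 0.5; here
# A_c = 1 - K, so P(K) = (1 + K^2) / (2(K - 1)) and S_alpha = {K >= 1.5}.
sigma = lambda K: (1.0 + K[0, 0] ** 2) / (2.0 * (K[0, 0] - 1.0)) \
    if K[0, 0] > 1.0 else np.inf
stable = lambda K: 1.0 - K[0, 0] <= -0.5
K0 = np.array([[2.0]]); U0 = np.array([[1.0]])
K_best, v_best = ray_shooting_step(K0, U0, sigma, stable, h=5.0, r_inf=1.0)
```

On this toy problem the global minimum of σ_max is attained at K = 1 + √2, and the scanned rays get within the step resolution of it.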
The complexity of the suggested practical algorithm, measured as the number of its arithmetic operations, is as follows:

• Computing the matrix P(K(t)) as in (5) takes O((p^2)^3) = O(p^6), since the dominant operation is the inversion of the p^2 × p^2 matrix there.

• Checking K(t) ∈ S_α by checking the α-stability of A_c(K(t)) (as well as computing σ_max(K(t))) takes O(p^3) for computing the characteristic polynomial of A_c(K(t)) (of P(K(t)), respectively) and O(p log_2^2(p)(log_2^2(p) + log_2^2(b))) for computing approximations λ̃_j of all the eigenvalues λ_j, j = 1, ..., p, of A_c(K(t)) (of P(K(t)), respectively), with accuracy |λ̃_j − λ_j| < 2^{−b}, by the algorithm of V. Y. Pan (see [26]). We end up with O(p^3) for these operations.

• Computing a uniformly distributed q × r matrix takes O(max(q, r)^3) operations.

We therefore have a total complexity of O(mns(max(q, r)^3 + p^6)).
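In floating-point practice, the α-stability test in the second bullet reduces to one eigenvalue computation. A minimal sketch (ours, assuming NumPy; the standard O(p^3) QR iteration stands in for the asymptotically faster eigenvalue algorithm of Pan):

```python
import numpy as np

def is_alpha_stable(A, B, C, K, alpha):
    """Membership test K ∈ S_alpha: every eigenvalue of the closed loop
    A_c(K) = A - BKC must have real part <= -alpha (an O(p^3) check)."""
    Ac = A - B @ K @ C
    return bool(np.max(np.linalg.eigvals(Ac).real) <= -alpha)

# Toy check: this A has eigenvalues -1 and -2, so K = 0 is
# 0.5-stable but not 1.5-stable.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.zeros((2, 1)); C = np.zeros((1, 2)); K = np.zeros((1, 1))
```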
Let a closed ε-neighborhood of K* in D^(0) be defined by S_α^(0)(ε) = S_α^(0) ∩ B(K*, ε). Let the idealized algorithm be the algorithm that samples the search space D^(0) until hitting S_α^(0)(ε), where the sampling is according to a general p.d.f. g and a related generator G. For 0 < β < 1, the number of iterations needed to guarantee a probability of at least 1 − β of hitting S_α^(0)(ε) is

N = |ln(β)| (M_g / m_g) Vo(D^(0)) / Vo(S_α^(0)(ε)), (7)

where Vo denotes the volume of the related set and M_g, m_g are the essential supremum and essential infimum of g over D^(0), respectively (see [16]). Similarly to what is done in [16], one can show that the last is O(|ln(β)| h (M_g/m_g) (r_∞/r)^{qr}), where r is the radius of a ball that forms the basis of a cone with height ε and vertex K*, whose volume equals Vo(S_α^(0)(ε)). This results in an exponential number of iterations, but if we restrict the input of the algorithm to systems with q, r satisfying q ≤ q_0, r ≤ r_0, where q_0, r_0 are fixed, then the number of iterations would be O(|ln(β)| h (M_g/m_g) (r_∞/r)^{q_0 r_0}), i.e., polynomial in r_∞/r, which can be considered the true size of the problem (for fixed p, q, r). In this sense we can say that the algorithm is efficient. The total number of arithmetic operations of the idealized algorithm that guarantees a probability of at least 1 − β of hitting S_α^(0)(ε) is therefore O(|ln(β)| h (r_∞/r)^{q_0 r_0} (max(q, r)^3 + p^6)), since sampling a point according to the uniform distribution g (for which m_g = M_g = 1) with the related generator G takes O(max(q, r)^3).
For the sake of the comparison presented in the next section, we give here, as Algorithm 2, the algorithm of D. Moerder and A. Calise (see [5]), adjusted to our formulation of the problem, which we call the MC Algorithm. To the best of our knowledge, this is the best algorithm for LQR via SOF published so far.

Algorithm 1:
The practical randomized algorithm for the LQR via static-output-feedback (SOF) problem.

Experiments
In the following experiments, we applied Algorithm 1 and Algorithm 2 to systems taken from the libraries [27-29]. We took only the systems with controllable (A, B), (A^T, C^T) pairs for which the RS algorithm succeeded in finding SOFs (see [16], Table 8, p. 231). In order to initialize the MC Algorithm, we also used the RS algorithm to find a starting α-stabilizing static-output-feedback, for the known optimal value of α. In all the experiments with the suggested algorithm we used m = 100, n = 100, s = 100, h = 100, r_∞ = 100, ε = 10^{−16}, and for the MC Algorithm we used m = 10,000, s = 100 (in order to get the same number of 10^6 overall iterations and the same number s = 100 of iterations for the local search). In every case, we took Q = I_p, R = I_q. The Stability Margin column of Table 1 relates to α > 0 for which the real part of any eigenvalue of the closed-loop is less than or equal to −α. The values of α in Table 1 relate to the largest α for which the RS algorithm succeeded in finding K^(0). The reason is that if λ is an eigenvalue of A_c(K) with corresponding eigenvector v, then (4) implies

ℜ(λ) = −v*(Q + C^T K^T RKC)v / (2v*P(K)v) ≤ −v*Qv / (2v*P(K)v).

It follows that minimizing σ_max(K) results in a larger abscissa. Thus, it is worth searching for a starting point K^(0) that maximizes the abscissa α. This can be done efficiently by running a binary search on the abscissa and using the RS algorithm as an oracle. Note that the RS CPU time appearing in the third column of Table 1 relates to running the RS algorithm for the known optimal value of the abscissa. The RS algorithm is sufficiently fast for this purpose, but other methods, such as the HIFOO algorithm (see [19]), can be applied for this purpose as well.
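The binary search on the abscissa can be sketched as follows. This is a simplification of ours: the RS algorithm is abstracted as an `oracle(alpha)` callable that returns an α-stabilizing SOF, or `None` on failure.

```python
def max_abscissa(oracle, alpha_lo, alpha_hi, tol=1e-3):
    """Binary search for (approximately) the largest alpha at which the
    oracle still finds an alpha-stabilizing SOF, assuming success is
    monotone in alpha: success at alpha implies success below alpha."""
    K_best = oracle(alpha_lo)
    if K_best is None:
        return None, None                  # not even alpha_lo-stabilizable
    while alpha_hi - alpha_lo > tol:
        mid = 0.5 * (alpha_lo + alpha_hi)
        K = oracle(mid)
        if K is not None:
            alpha_lo, K_best = mid, K      # mid works: search higher
        else:
            alpha_hi = mid                 # mid fails: search lower
    return alpha_lo, K_best

# Mock oracle that succeeds exactly for alpha <= 2.0:
alpha, K = max_abscissa(lambda a: "K" if a <= 2.0 else None, 0.0, 4.0)
```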
Let σ_max(F) denote the functional (6) for the system (A, B, I_p), where A − BF is stable, i.e., F ∈ S^{q×p}. Let P(F) denote the unique solution of (4) for the system (A, B, I_p) with F as above. Let σ_max(K) denote the functional (6) for the system (A, B, C) with K ∈ S^{q×r} and related Lyapunov matrix P = P(K). Now, if A − BKC is stable for some K, then A − BF is stable for F = KC (but there might exist an F such that A − BF is stable but which cannot be written as KC for any q × r matrix K). Therefore

σ_max(F*) ≤ σ_max(KC) = σ_max(K) for every K ∈ S^{q×r},

where F* is an optimal LQR state-feedback controller for the system (A, B, I_p). We conclude that σ_max(F*) ≤ σ_max(K^(best)). Thus, σ_max(F*) is a lower bound for σ_max(K^(best)) and can serve as a good estimator for it (as is evident from Table 1 in many cases), in order to stop the algorithm earlier, since σ_max(F*) can be calculated in advance.
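The lower bound σ_max(F*) is the largest eigenvalue of the stabilizing solution P* of the algebraic Riccati equation for (A, B, I_p). The following sketch of computing it is ours, using the textbook Hamiltonian-subspace construction with NumPy; it assumes the stable invariant subspace has dimension p and is well conditioned.

```python
import numpy as np

def lqr_lower_bound(A, B, Q, R):
    """Largest eigenvalue of the stabilizing ARE solution P*, i.e. the
    lower bound sigma_max(F*) for sigma_max(K_best)."""
    p = A.shape[0]
    # Hamiltonian matrix of the continuous-time ARE
    H = np.block([[A, -B @ np.linalg.solve(R, B.T)],
                  [-Q, -A.T]])
    w, V = np.linalg.eig(H)
    Vs = V[:, w.real < 0]            # basis of the stable invariant subspace
    X1, X2 = Vs[:p, :], Vs[p:, :]
    P_star = np.real(X2 @ np.linalg.inv(X1))
    return float(np.max(np.linalg.eigvalsh((P_star + P_star.T) / 2)))

# Scalar check: A = -1, B = Q = R = 1 gives the ARE P^2 + 2P - 1 = 0,
# whose positive root is P* = sqrt(2) - 1.
lb = lqr_lower_bound(np.array([[-1.0]]), np.array([[1.0]]),
                     np.eye(1), np.eye(1))
```

In practice one would call a library Riccati solver; the point is only that σ_max(F*) is computable once, in advance, and can be used to stop the algorithm early.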
The fourth column of Table 1 represents σ_max(K^(0)). The fifth column stands for σ_max(K^(best)), where the number in parentheses is the relative improvement over σ_max(K^(0)), in percent. The sixth column is for σ_max(F*), and the seventh column is for the CPU time of the suggested algorithm, given in seconds. The eighth column stands for σ_max(K^(best)) found by the MC Algorithm, and finally, the ninth column stands for the CPU time of the MC Algorithm.
The MC Algorithm seems to perform better locally, while the suggested algorithm seems to perform better globally. Thus, in practice, the best approach would be to apply the suggested algorithm in order to find a close neighborhood of a global minimum, and then to apply the MC Algorithm to the result, for the local optimization.

Concluding Remarks
For a discrete-time system x_{k+1} = Ax_k + Bu_k, y_k = Cx_k, and the cost functional

J(x_0, u) = Σ_{k=0}^∞ (x_k^T Q x_k + u_k^T R u_k), (10)

since Q + C^T K^T RKC > 0 and since A_c(K) is stable, it follows that the Stein equation

A_c(K)^T P A_c(K) − P + Q + C^T K^T RKC = 0 (12)

has a unique solution P > 0, given by

P = mat((I_{p^2} − A_c(K)^T ⊗ A_c(K)^T)^{−1} vec(Q + C^T K^T RKC)).

Substitution of (12) into (11), noting that x_k = A_c(K)^k x_0 with A_c(K) stable, leads to

J(x_0, K) = Σ_{k=0}^∞ x_0^T (A_c(K)^T)^k (P − A_c(K)^T P A_c(K)) A_c(K)^k x_0 = x_0^T P(K) x_0.
Thus, we look for K ∈ S_α^{q×r} that minimizes the functional J(x_0, K) = x_0^T P(K) x_0, and when x_0 is unknown, we seek K ∈ S_α^{q×r} for which σ_max(K) := max(σ(P(K))) is minimal. Similarly to the continuous-time case, we have J(x_0, K)/‖x_0‖^2 ≤ σ_max(K), with equality in the worst case. Finally, if λ is an eigenvalue of A_c(K) and v is a corresponding eigenvector, then (12) yields

1 − |λ|^2 = v*(Q + C^T K^T RKC)v / (v*P(K)v) ≥ v*Qv / (v*P(K)v).

Therefore |λ|^2 ≤ 1 − v*Qv/(v*P(K)v), and thus minimizing σ_max(K) results in a larger stability margin. Now, the suggested algorithm can be readily applied to discrete-time systems. As for the MC Algorithm, we are not aware of any discrete-time analogue of it.
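As in the continuous-time case, the Stein solution admits a direct vec/Kronecker evaluation. A minimal sketch (ours, assuming NumPy; the function name is illustrative):

```python
import numpy as np

def stein_solution(Ac, W):
    """Solve Ac^T P Ac - P = -W for P, assuming Ac is stable in the
    discrete-time sense (all eigenvalues inside the open unit disk)."""
    p = Ac.shape[0]
    # vec(Ac^T P Ac) = (Ac^T ⊗ Ac^T) vec(P), with vec stacking columns
    M = np.eye(p * p) - np.kron(Ac.T, Ac.T)
    vecP = np.linalg.solve(M, W.reshape(-1, order='F'))
    return vecP.reshape((p, p), order='F')

# Scalar check: Ac = 0.5, W = 1 gives P(1 - 0.25) = 1, i.e. P = 4/3.
P = stein_solution(np.array([[0.5]]), np.eye(1))
```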
We conclude that the Ray-Shooting Method is a powerful tool, since it practically solves the problem of LQR via SOF for real-life systems. The suggested algorithm has good performance and is proved to converge in probability. The suggested method can similarly handle the problem of LQR via SOF for discrete-time systems. Obviously, this enlarges the scope and usability of the Ray-Shooting Method, and we expect more results in this direction.

Algorithm 2:
The MC Algorithm.
Let u_k = −Ky_k be the SOF, and let A_c(K) := A − BKC be the closed-loop matrix. Let D denote the open unit disk, let 0 < α < 1 and let D_α denote the set of all z ∈ D with |z| ≤ 1 − α (where |z| is the absolute value of z). Let S^{q×r} denote the set of all matrices K ∈ R^{q×r} such that σ(A_c) ⊂ D (i.e., stable in the discrete-time sense), and let S_α^{q×r} denote the set of all matrices K ∈ R^{q×r} such that σ(A_c) ⊂ D_α. In this case, we say that A_c is α-stable. Let K ∈ S_α^{q×r} be given. Substitution of u_k = −Ky_k = −KCx_k into (10) yields

J(x_0, K) = Σ_{k=0}^∞ x_k^T (Q + C^T K^T RKC) x_k. (11)

Table 1 .
Results of the suggested randomized algorithm for LQR via SOF compared with the results of the MC Algorithm.