On the Locally Polynomial Complexity of the Projection-Gradient Method for Solving Piecewise Quadratic Optimisation Problems

This paper proposes a method for solving optimisation problems involving piecewise quadratic functions. The method provides a solution in a finite number of iterations, and its computational complexity is locally polynomial in the problem dimension, i.e., polynomial whenever the initial point belongs to a sufficiently small neighbourhood of the solution set. The proposed method can be applied to solve large systems of linear inequalities.


Introduction
Let us consider the following optimisation problem:

ϕ(x) := (1/2)·‖(A·x − b)_+‖² → min, x ∈ R^n, (1)

where c_+ := max{c, 0} (applied componentwise), A is an m × n matrix, A = (a_ij), x ∈ R^n, x = {x_i}, b ∈ R^m, b = {b_j}, i = 1, . . . , n, j = 1, . . . , m, and ‖·‖ is the Euclidean norm of R^n. In this paper, a method for solving the problem in (1) is proposed; moreover, the number of iterations (equivalent to the computational complexity) required by the proposed method with respect to m and n is locally polynomial, and in the worst-case scenario, it has a geometric convergence rate.
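As a concrete illustration (our own, not part of the original formulation), the objective and its gradient can be evaluated directly; the sketch below assumes the piecewise quadratic form ϕ(x) = (1/2)·‖(A·x − b)_+‖², whose gradient is A^T·(A·x − b)_+, and the names `phi` and `phi_grad` are ours:

```python
import numpy as np

def phi(A, b, x):
    """Piecewise quadratic objective: 0.5 * ||(A x - b)_+||^2."""
    r = np.maximum(A @ x - b, 0.0)   # componentwise positive part (A x - b)_+
    return 0.5 * r @ r

def phi_grad(A, b, x):
    """Gradient A^T (A x - b)_+ (phi is continuously differentiable)."""
    return A.T @ np.maximum(A @ x - b, 0.0)
```

Only the constraints violated at x contribute to the value and to the gradient, which is what makes the function piecewise quadratic rather than quadratic.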
Let us define the set of solutions of (1) as follows:

X* := Arg min_{x ∈ R^n} ϕ(x). (2)

If some point sufficiently close to the set X* of solutions to (1) is known, then it is possible to find a solution of (1) within a polynomial number of computational iterations; thus, the computational complexity is of the order of O(m³ · n³). Many methods for solving (1) have been proposed (cf. Karmanov [1], Golikov and Evtushenko [2], Evtushenko and Golikov [3], Tretyakov [4], Tretyakov and Tyrtyshnikov [5] and Han [6]). All of these methods have reasonable computational complexity but, as mentioned above, to date, no strongly polynomial-time algorithm for solving (1) has been proposed. In studies by Tretyakov and Tyrtyshnikov [7] and Mangasarian [8], linear programming problems were solved by reducing them to the unconstrained minimisation of strongly convex piecewise quadratic functions. A solution is obtained within a finite polynomial number of iterations if the starting point of the algorithm belongs to a sufficiently small neighbourhood of the unique solution to the problem. Unfortunately, the authors imposed severe limitations on the functions to be minimised: they must be strongly convex, the eigenvalues of the Hessian matrices must satisfy specific conditions, etc.
These results impose significant limitations on the class of problems that can be solved: it is required that (1) have a unique solution, etc. The solution method described by Tretyakov and Tyrtyshnikov [7] is based on exploiting information about the problem being solved by analysing a sufficiently small neighbourhood of an arbitrary solution of (1). Analogous methods were proposed by Facchinei et al. [9] for the identification of the active constraints in a sufficiently small neighbourhood of the solution to the problem. In papers by Tretyakov and Tyrtyshnikov [5] and Wright [10], locally polynomial methods for solving quadratic programming problems based on similar ideas were presented. Tretyakov [4] proposed the gradient projection method for solving (1); this method finds a solution of (1) in a finite number of iterations and combines iterative and direct (e.g., Gaussian) methods.
This paper proposes a computational method for solving (1). When the starting point of the proposed method is sufficiently close to the set X* of solutions to (1), its computational complexity is locally polynomial, i.e., of the order of O(m³ · n³).
We point out that solving a system of linear inequalities

A·x ≤ b, i.e., A·x − b ≤ 0_m, (3)

where 0_m is the m-dimensional vector of zeroes, can be reduced to solving the problem (1). This means that the number of computations required to establish a solution (if a given system of linear inequalities has one) is locally polynomial. Let us denote

X := {x ∈ R^n : A·x ≤ b}. (4)

It is obvious that the set X might be empty in general, but our method, presented in this paper, either detects this situation in a locally polynomial number of computations or provides a solution to the system (3). The proposed method could be applied when solving large systems of linear inequalities, which appear in many practical, industrial applications, e.g., the simplex method (Pan [11]), Karmarkar's method (Wright [12]), Chubanov's method (Roos [13]), and the Fourier-Motzkin elimination method (Khachiyan [14], Šimeček, Fritsch, Langr and Lórencz [15]).
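To make the reduction concrete, here is a hedged sketch (our own illustration; the function name, tolerance, and iteration budget are assumptions) that attacks the system (3) by driving ϕ(x) = (1/2)·‖(A·x − b)_+‖² towards its minimum with a plain gradient loop and then reads off feasibility:

```python
import numpy as np

def solve_inequalities(A, b, iters=5000):
    """Reduce A x <= b to problem (1): minimise phi(x) = 0.5*||(A x - b)_+||^2.
    Returns (x, feasible); feasible is False when min phi > 0, i.e., X is empty."""
    L = np.linalg.eigvalsh(A.T @ A).max()        # Lipschitz constant of phi'
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x -= (1.0 / L) * (A.T @ np.maximum(A @ x - b, 0.0))
    residual = np.maximum(A @ x - b, 0.0)        # violation (A x - b)_+
    return x, bool(0.5 * residual @ residual < 1e-12)
```

If the minimum value of ϕ is zero, the returned point satisfies the inequalities; a positive minimum certifies that X is empty, as stated above.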

Definitions and Theoretical Results
Let

ϕ(x) := (1/2)·Σ_{i=1}^{m} ((a_i^T·x − b_i)_+)². (5)

Theorem 1. The function ϕ(x) is convex and has a nonempty set of minimum points

X* = Arg min_{x ∈ R^n} ϕ(x) ≠ ∅. (6)

Proof. Theorem 1 follows immediately from the well-known properties of quadratic-type convex functions (see, e.g., [16]).
It is obvious that the elements x* ∈ X*, cf. (6), satisfy

ϕ′(x*) = Σ_{i=1}^{m} (a_i^T·x* − b_i)_+ · a_i = 0_n, (7)

where a_i^T is the ith row of matrix A.
Therefore, in the general case, our goal is to solve the following equation:

ϕ′(x) = A^T·(A·x − b)_+ = 0_n. (8)

In the sequel, x* stands for an arbitrary element of X* (a minimum point of ϕ). If the minimum value of ϕ is equal to zero, then X = X*, and if the minimum value of ϕ is positive, then X = ∅. Let us denote

f_i(x) := a_i^T·x − b_i, i ∈ D := {1, . . . , m},
J_0(x) := {i ∈ D : f_i(x) = 0}, J_+(x) := {i ∈ D : f_i(x) > 0}, J_−(x) := {i ∈ D : f_i(x) < 0}, (9)

where f_i(x) is introduced to simplify the definitions of the sets J_0(x) and J_+(x). According to (7) and the above notation, x* should satisfy the formula

Σ_{i ∈ J_+(x*)} f_i(x*)·a_i = 0_n. (10)

The formula (10) is equivalent to a condition that should be satisfied at the point x*:

Σ_{i ∈ J_0(x*) ∪ J_+(x*)} f_i(x*)·a_i = 0_n. (11)

In (11), it is taken into account that f_i(x*) = 0 for all i ∈ J_0(x*). This, in turn, means that, in the general case, we should solve the following equations:

f_i(x) = 0, i ∈ J_0(x*), (12)

Σ_{i ∈ J_+(x*)} f_i(x)·a_i = 0_n. (13)

Without loss of generality, we may renumber the constraints accordingly. The main idea exploited in this paper is based on the following lemma.

Lemma 1. Let x* be a solution to the problem (1). Then, there exists ε > 0 such that, for any x̄ ∈ U_ε(x*), f_i(x̄) > 0 for all i ∈ J_+(x*) and f_i(x̄) < 0 for all i ∈ J_−(x*).

By virtue of the above lemma, in a sufficiently small neighbourhood of some fixed point x* ∈ X*, for every x̄ ∈ U_ε(x*), the following hold: J_+(x*) ⊆ J_+(x̄), J_−(x*) ⊆ J_−(x̄) and J_0(x̄) ⊆ J_0(x*). Now, our goal is to correctly define the sets J_0(x*) and J_+(x*) based on the information gained at the point x̄ ∈ U_ε(x*). Let A(x̄) and b(x̄) represent the matrix and vector obtained from A and b, respectively; the rows of A(x̄) and the coefficients of b(x̄) correspond to the index set defined by J_0(x̄) ∪ J_+(x̄). In this case, Equations (12)–(13) may be rewritten as a system of linear equations

A(x̄)·x = b(x̄). (14)

Let Ā(x̄) denote the matrix composed of a maximal set of linearly independent rows of the system (14), and let b̄(x̄) denote the corresponding vector of constant terms in (14). The equations in (14) may be reformulated in the following way:

Ā(x̄)·x = b̄(x̄). (15)

Let us observe that, at the point x*, the system (15) holds. This, in turn, means that the remaining rows a_j of A(x̄) are linear combinations of the rows of Ā(x̄). If the rank of a matrix B of size r × n is equal to r, then the pseudoinverse matrix (operator) B^+ may be defined as B^+ := B^T·(B·B^T)^{−1}.
We denote by B^T_∥ := B^T·(B·B^T)^{−1}·B = B^+·B the n × n matrix of the orthogonal projection onto the subspace spanned by the rows of the matrix B, and by B^T_⊥ := I − B^T_∥ the matrix of the projection onto the orthogonal complement of this subspace, where I is the identity matrix of size n × n. Let the point z(x̄) be the projection of the point x̄ on the set

M(x̄) := {x ∈ R^n : Ā(x̄)·x = b̄(x̄)}. (18)

Let us observe that x* ∈ M(x̄) if x̄ ∈ U_ε(x*) and ε is sufficiently small.
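These operators translate directly into code; the following sketch (our illustration, assuming, as stated above, that B has full row rank) forms B^+, the two projectors, and the projection of a point onto an affine set {y : B·y = c}:

```python
import numpy as np

def pinv_full_row_rank(B):
    """B^+ = B^T (B B^T)^{-1} for a full-row-rank r x n matrix B."""
    return B.T @ np.linalg.inv(B @ B.T)

def row_space_projectors(B):
    """P_par projects onto the row space of B, P_perp onto its orthogonal
    complement; I below is the n x n identity matrix."""
    P_par = pinv_full_row_rank(B) @ B
    P_perp = np.eye(B.shape[1]) - P_par
    return P_par, P_perp

def project_onto_affine(B, c, x):
    """Orthogonal projection of x onto {y : B y = c}: z = x - B^+ (B x - c)."""
    return x - pinv_full_row_rank(B) @ (B @ x - c)
```

The correction x − z lies in the row space of B, so z is the closest point of the affine set in the Euclidean norm.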
Moreover, if f_i(z(x̄)) ≤ 0 for certain i ∈ J_+(x̄), then we define the set I_− in the following way:

I_− := {i ∈ J_+(x̄) : f_i(z(x̄)) ≤ 0}.

Otherwise, if f_i(z(x̄)) ≥ 0 for certain i ∈ J_−(x̄), we define the set I_+ in an analogous way:

I_+ := {i ∈ J_−(x̄) : f_i(z(x̄)) ≥ 0}.

The indices from I_− ∪ I_+ are moved to the set J_0(x̄), i.e.,

J_0(x̄) := J_0(x̄) ∪ I_− ∪ I_+, J_+(x̄) := J_+(x̄) \ I_−, J_−(x̄) := J_−(x̄) \ I_+. (19)

Next, we project the point x̄ on the new set M(x̄), cf. (18), and a new point z(x̄) is obtained. Let us define the operator for the projection of the point x̄ on the set M(x̄):

z(x̄) = P_{M(x̄)}(x̄) := x̄ − Ā(x̄)^+·(Ā(x̄)·x̄ − b̄(x̄)). (20)

Algorithm for Finding the Solution of (1)
In this section, the algorithm designed to find the solution to (1) is presented. The main idea of this algorithm is based on information related to a current point x̄ belonging to a sufficiently small neighbourhood of the point x* ∈ X*. We also demonstrate how to find such a point. The proposed method comprises two algorithms. The starting point of the method can be arbitrary, because Algorithm 2 (the gradient method with a special step selection) starts at an arbitrary point and, at a certain iteration, provides a point arbitrarily close to the solution set. Algorithm 1 can therefore start at the point specified by Algorithm 2.

Algorithm 1.

Initialisation Step: For the current point x̄, the sets of indices J_0(x̄), J_−(x̄) and J_+(x̄) are defined according to (9). If J_+(x̄) = ∅, then x̄ is the solution of (1) and Algorithm 1 is terminated. Otherwise, the Main Recursive Step is performed.

Main Recursive Step: Let z(x̄), the projection of the point x̄ on the set M(x̄), be defined according to (20). We check whether the following condition is satisfied:

I_+ = ∅ and I_− = ∅. (21)

Checking Step: If (21) holds, then z(x̄) ∈ X* and equation (10) is satisfied; z(x̄) is the solution of (1), as defined in (2), and Algorithm 1 is terminated. Otherwise, if the condition (21) is violated for certain i ∈ D, i.e., i ∈ I_+ ∪ I_−, we redefine J_0(x̄), J_+(x̄) and J_−(x̄) according to (19), redefine M(x̄) according to (18), and repeat the Main Recursive Step.
The set D is finite, and |D| = m; therefore, the number of changes to the index sets J_0(x̄), J_+(x̄) and J_−(x̄) does not exceed m, and eventually a point z(x̄) fulfilling (12) is established. This means that z(x̄) is the solution of (1), as defined in (2).
It is of utmost importance that x̄ belongs to a sufficiently small neighbourhood of the point x*, because otherwise, z(x̄) may not satisfy (12). If this is not the case, it is necessary to find another point x̄ that is closer to x*. The process for accomplishing this is described below.
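The steps above can be sketched in code. This is our reading of the text, not the authors' implementation: it assumes ϕ(x) = (1/2)·‖(A·x − b)_+‖², index sets built per (9) with an assumed tolerance, Equations (12)–(13) stacked into one linear system, and the projection realised with a least-squares solve; all function names are ours.

```python
import numpy as np

def project_system(A, b, J0, Jp, x):
    """Projection of x onto the affine solution set of (12)-(13):
    a_i^T y = b_i for i in J_0, and sum_{i in J_+} (a_i^T y - b_i) a_i = 0_n."""
    B = np.vstack([A[J0], A[Jp].T @ A[Jp]])      # stacked linear system
    c = np.concatenate([b[J0], A[Jp].T @ b[Jp]])
    # minimum-norm correction, i.e., y = x + B^+ (c - B x)
    return x + np.linalg.lstsq(B, c - B @ x, rcond=None)[0]

def algorithm1(A, b, x_bar, tol=1e-10):
    """Sketch of Algorithm 1: starting from x_bar (assumed close to X*),
    adjust the index sets at most m times and re-project."""
    m, _ = A.shape
    f = A @ x_bar - b
    J0, Jp = np.abs(f) <= tol, f > tol           # J_0(x_bar), J_+(x_bar)
    if not Jp.any():
        return x_bar                             # x_bar already solves (1)
    for _ in range(m):                           # at most m index-set updates
        z = project_system(A, b, J0, Jp, x_bar)
        fz = A @ z - b
        Jm = ~(J0 | Jp)                          # current J_-
        I_minus = Jp & (fz <= 0.0)               # sign changed on J_+
        I_plus = Jm & (fz >= 0.0)                # sign changed on J_-
        if not I_minus.any() and not I_plus.any():
            return z                             # condition (21) holds
        J0 = J0 | I_minus | I_plus               # move indices to J_0, cf. (19)
        Jp = Jp & ~I_minus
    return z
```

Each sweep either terminates or enlarges J_0, so the loop runs at most m times, mirroring the finiteness argument above.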

Theorem 2. For a sufficiently small ε > 0 and for every x̄ ∈ U_ε(x*), Algorithm 1 provides z* = z(x̄) as the solution of the problem (1); this is equivalent to finding the solution of (12) within a number of iterations of the order of O(m³ · n³).
Proof. The proof is based on the observation that, for x̄ belonging to a sufficiently small neighbourhood of the point x*, according to Lemma 1, the constraints f_i(x̄) ≥ 0 correspond to the constraints f_i(x*) ≥ 0. Let us determine z(x̄) as the projection of the point x̄ on the set M(x̄), which is defined according to (18). It may happen that the set J_0(x̄) becomes enlarged. However, the number of iterations at which J_0(x̄) becomes enlarged does not exceed m, the number of elements in the set D. Therefore, at some iteration, (21) is satisfied. This means that z(x̄) satisfies (12) or, equivalently, ϕ′(z(x̄)) = 0_n. This demonstrates that z(x̄) is the solution of (1), as defined in (2). The computational complexity of establishing each projection z(x̄) is of the order of O(m² · n³); this estimate takes into account the computational effort related to matrix multiplications. The number of iterations does not exceed m; therefore, the overall computational complexity is of the order of O(m³ · n³).
To complement the presentation of this section, the gradient method for establishing x̄ belonging to a sufficiently small neighbourhood U_ε(x*) of some fixed solution x* ∈ X* to (1) is described. This gradient method has the following scheme:

x_{k+1} = x_k − α·ϕ′(x_k), k = 0, 1, 2, . . . , (23)

where α = 1/L and the gradient ϕ′(x) fulfils the Lipschitz condition

‖ϕ′(x) − ϕ′(y)‖ ≤ L·‖x − y‖ for all x, y ∈ R^n.

The convergence of the gradient method (23) is considered in the following theorem.
Theorem 3. Let x_0 ∈ R^n and the sequence {x_k}, k = 0, 1, 2 . . ., be constructed according to (23). Then, the sequence {x_k} converges to a certain x* ∈ X*.

Proof. The scheme in (23) produces a sequence that converges to a certain x* ∈ X*. Moreover, for every sufficiently small ε > 0, there exists k̄ = k̄(ε) such that x_k ∈ U_ε(x*) for all k ≥ k̄. This, in turn, means that at iteration k̄, the hypothesis of Theorem 2 is satisfied, and we obtain a solution to (1).
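A minimal sketch of the scheme (23), assuming ϕ(x) = (1/2)·‖(A·x − b)_+‖², so that ϕ′(x) = A^T·(A·x − b)_+ and the Lipschitz constant of ϕ′ is λ_max(A^T·A); the fixed iteration count and the function name are our assumptions:

```python
import numpy as np

def gradient_method(A, b, x0, iters=2000):
    """Scheme (23): x_{k+1} = x_k - (1/L) * phi'(x_k), with
    phi(x) = 0.5 * ||(A x - b)_+||^2 and step alpha = 1/L."""
    L = np.linalg.eigvalsh(A.T @ A).max()   # Lipschitz constant of phi'
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - (1.0 / L) * (A.T @ np.maximum(A @ x - b, 0.0))
    return x
```

In the intended use, the returned point serves as the starting point x̄ of Algorithm 1 once it has entered U_ε(x*).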

Conclusions and Appendix
As previously mentioned, the locally polynomial complexity estimate is valid only if the starting point of the proposed method belongs to a sufficiently small neighbourhood of the set of solutions X*. To reach such a point, the gradient method (23) is used. There are accelerated gradient methods (see those of Nesterov [17] and Poliak [18]), but these methods do not guarantee monotonic convergence to the set of solutions X*. The method presented in this paper monotonically converges to a certain point x* ∈ X*. It is obvious that the point x* depends on the initial point x_0 and, therefore, the number of iterations required by the gradient method to enter the proper neighbourhood of the point x* depends on the position of the initial point x_0. Moreover, the radius ε of the neighbourhood of the point x*, which the gradient method should reach, is unknown in the general case and depends on the specific problem being considered. However, it appears that we can guarantee a geometric convergence rate for the gradient method (23) while minimising piecewise quadratic functions of the form (5).
Namely, for every strongly convex function ψ(x), the gradient method (23) has a geometric convergence rate, i.e.,

ψ(x_k) − ψ* ≤ c·τ^k, τ ∈ (0, 1),

where c is a constant that is independent of the size of the problem but depends on the initial point x_0. In the general case, for functions that are not strongly convex, there is no proof of the geometric convergence of the gradient method (23). However, in the case where the function ϕ(x) is given by (5), it is possible to prove the geometric convergence of the gradient method (23). Let (24) denote the strong-convexity condition

⟨A^T·A·s, s⟩ ≥ γ·‖s‖², γ > 0, (24)

for the directions s in the cone of convergence of the method (23). The theorem presented below proves the strong convexity of the function ϕ(x) in the cone of convergence.

Theorem 5. There exists γ > 0 such that the condition (24) holds in the cone of convergence of the method (23).
Proof. First, it should be pointed out that the second derivative of the function ϕ(x) has a finite number of points of discontinuity in every direction s̄ ∈ R^n, i.e., on the ray x* + λ·s̄, there exists σ > 0 such that, on the closed interval [x*, x* + σ·s̄], the function ϕ(x) has a continuous second derivative, which obviously depends on s̄. Let us assume that the theorem does not hold, i.e., there is no γ > 0 such that (24) holds. This means that the strong-convexity condition fails along the rays l(x_k) = {x* + β·s_k, β ≥ 0}. For the vector s = lim_{k→∞} s_k, the condition ⟨A^T·A·s, s⟩ = 0 holds, or, due to the construction of ϕ(x),

ϕ(x* + β·s) = ϕ(x*), (26)

where β ∈ [0, β̄] and β̄ > 0 is a certain fixed constant. Let x*_k be (locally) the projection of x_k on the set M(s) ⊆ X*. Then, due to s_k → s as k → ∞, we have ‖x_k − x*_k‖ → 0. Let us set δ_k sufficiently small and consider the points x_{k+r}, r = 1, 2, . . . . Then, according to Theorem 3, we have (27). On the other hand, according to (26), the behaviour of the sequence {x_{k+r}} as r → ∞ contradicts (27), and therefore Theorem 5 holds.
Theorem 5 allows for the estimation of the convergence rate of the gradient method (23).

Theorem 6. Under the assumptions of Theorem 5, for the sequence {x_k} constructed according to (23), the following convergence rates hold:

ϕ(x_k) − ϕ* ≤ c₁·τ^k and ‖x_k − x*‖ ≤ c₂·τ^{k/2}, (28)

where τ ∈ (0, 1) and c₁, c₂ > 0; the constants c₁ and c₂ are independent of the value of k but depend on the initial point x_0.
This proves the first part of (28), while the latter part of (28) follows from the strong convexity of the function ϕ(x) in the cone of convergence.
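The step from the first estimate of Theorem 6 to the second can be spelled out; a sketch, assuming the strong-convexity constant γ > 0 from (24) is available in the cone of convergence:

```latex
\gamma\,\|x_k - x^*\|^2 \;\le\; 2\,\bigl(\varphi(x_k) - \varphi^*\bigr)
\;\le\; 2\,c_1\,\tau^k
\quad\Longrightarrow\quad
\|x_k - x^*\| \;\le\; \underbrace{\sqrt{2 c_1/\gamma}}_{=:\,c_2}\,\tau^{k/2}.
```

Taking the square root is what halves the exponent of τ in the second estimate.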
Conducting computational experiments and comparing the presented method with other methods from the literature remains a topic for future research.
Funding: This research was funded by the Ministry of Science and Higher Education, grant number 61/20/B.