The algorithm pseudocode is presented in Algorithm 3. To better understand it, the next three subsections are devoted to discussing some delicate aspects of it and of its implementation: unboundedness and infeasibility, complexity and convergence properties, and numerical issues that arise when moving from algorithmic description to software implementation. Similarly to what happens for standard algorithms, it is important to stress that many of the proofs in this section assume Euclidean numbers to be represented by finite sequences of monosemia. Indeed, even if the reference set
defined by Axioms 1–3 admits numbers represented by infinite sequences, it would not be reasonable to use them in a machine or to reason about the algorithm's convergence with them. There are two reasons: (i) the algorithm would have to manage and manipulate an infinite amount of data; (ii) the machine is finite and cannot store all that information. Notice that, at this stage, the focus is not on variable-length representations of Euclidean numbers, as they would slow the computations down [14]. Fixed-length representations, such as the ones discussed in [11], are therefore preferred, because they are easier to implement in hardware (i.e., they are more "hardware friendly"), as recent studies testify [36].
Algorithm 3 Non-Archimedean predictor–corrector infeasible primal-dual IPM.

procedure NA-IPM(A, b, c, Q, ε, …)
    ▷ the divergence is dealt with by the embedding presented in Section 4.1;
    ▷ therefore, the flag of correct termination and the threshold ω are not needed here
    ▷ notice also that only …, while … and Q are Euclidean matrices and vectors
    x, λ, s = starting_point(A, b, c, Q)
    for each iteration do
        compute the residuals
        compute the centrality μ (n is the length of x)
        compute the KKT-conditions satisfaction parameters
        check convergence on all the (meaningful) monosemia of the aggregated values (Section 4.3)
        if all of them are small enough then
            return x, λ, s                ▷ primal-dual feasible optimal solution found
        compute the predictor directions solving (3): predict(A, b, c, Q, …)
        keep only the leading monosemia of the gradients (Section 4.3)
        compute the predictor step sizes
        estimate σ
        compute the corrector directions solving the corresponding Newton system: corrector(A, b, c, Q, …)
        compute the new direction
        keep only the leading monosemia of the gradients (Section 4.3)
        compute the step sizes
        compute the target primal-dual solution
        add infinitesimal centrality to the close-to-zero entries (Section 4.3)
    return x, λ, s
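To make the skeleton of Algorithm 3 concrete, the following sketch implements the classical, purely Archimedean predictor-corrector loop that the NA-IPM extends. It is only an illustration, not the paper's implementation: the helper names (na_ipm_sketch, step_length) and the Mehrotra-style choices (e.g., the cubic centering heuristic) are assumptions of this sketch, and the non-Archimedean ingredients (the embedding of Section 4.1, the monosemia truncation and the infinitesimal centrality injection of Section 4.3) are deliberately omitted.

```python
# Minimal Archimedean predictor-corrector IPM sketch for
#   min 0.5 x'Qx + c'x   s.t.   Ax = b, x >= 0
# Illustration only; hypothetical names, standard Mehrotra-style choices.
import numpy as np

def step_length(v, dv, tau=0.995):
    """Largest alpha in (0, 1] keeping v + alpha*dv > 0 (damped by tau)."""
    mask = dv < 0
    if not np.any(mask):
        return 1.0
    return min(1.0, tau * np.min(-v[mask] / dv[mask]))

def na_ipm_sketch(A, b, c, Q, eps=1e-8, max_iter=50):
    m, n = A.shape
    x, s = np.ones(n), np.ones(n)
    lam = np.zeros(m)
    for _ in range(max_iter):
        r_p = A @ x - b                      # primal residual
        r_d = Q @ x + c - A.T @ lam - s      # dual residual
        mu = x @ s / n                       # centrality
        if max(np.linalg.norm(r_p), np.linalg.norm(r_d), mu) < eps:
            break                            # KKT conditions eps-satisfied
        # Newton (KKT) matrix shared by the predictor and corrector steps
        K = np.block([
            [Q,          -A.T,              -np.eye(n)],
            [A,           np.zeros((m, m)),  np.zeros((m, n))],
            [np.diag(s),  np.zeros((n, m)),  np.diag(x)],
        ])
        # predictor (affine) step
        d_aff = np.linalg.solve(K, np.concatenate([-r_d, -r_p, -x * s]))
        dx_a, ds_a = d_aff[:n], d_aff[n + m:]
        a_p, a_d = step_length(x, dx_a), step_length(s, ds_a)
        mu_aff = (x + a_p * dx_a) @ (s + a_d * ds_a) / n
        sigma = (mu_aff / mu) ** 3           # centering parameter
        # corrector step (centering + second-order complementarity term)
        rhs = np.concatenate([-r_d, -r_p, -x * s - dx_a * ds_a + sigma * mu])
        d = np.linalg.solve(K, rhs)
        dx, dl, ds = d[:n], d[n:n + m], d[n + m:]
        a_p, a_d = step_length(x, dx), step_length(s, ds)
        x, lam, s = x + a_p * dx, lam + a_d * dl, s + a_d * ds
    return x, lam, s

# Example: min 0.5*(x1^2 + x2^2) - x1  s.t.  x1 + x2 = 1, x >= 0
# x_opt, _, _ = na_ipm_sketch(np.array([[1., 1.]]), np.array([1.]),
#                             np.array([-1., 0.]), np.eye(2))
```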
4.1. Infeasibility and Unboundedness
As stated in Section 2, one can approach the problem of infeasibility and unboundedness in two different ways: divergence detection at run time or problem embedding. While the first keeps the problem complexity fixed but slows the computation down because of the repeated norm-divergence checks, the second spends resources on optimizing a more complex problem, which is nevertheless solved efficiently. Therefore, the simpler the embedding is, the less it affects the performance.
One very simple embedding, proposed in [37,38,39,40], consists of the mapping reported in (14), in which the two penalizing constants are positive and sufficiently large. This embedding adds two artificial variables (one to the primal and one to the dual problem) and one slack variable (to the primal). The goal of adding the artificial variables is to guarantee the feasibility of the corresponding problem, while on the respective dual this is equivalent to adding one bounding hyperplane that prevents any divergence. Indeed, by duality theory, if the primal problem is infeasible, then the dual is unbounded, and vice versa. Geometrically, the hyperplane slope is chosen by considering a particular conical combination of the constraints and of the constant term vector (namely, the combination with all coefficients equal to 1). If the polyhedron is unbounded, the conical combination outputs a diverging direction and generates a hyperplane orthogonal to it; otherwise, the addition of such constraints has no effect.
On the other hand, the constraint intercepts depend on the penalizing weights of the primal and of the dual hyperplane, respectively: the larger the weight, the farther away the corresponding bound is located. From the primal perspective, instead, the two weights act as penalties on the artificial variables of the dual and of the primal problem, respectively. The need for this penalization comes from the fact that, to make the optimization consistent, the algorithm must be driven towards feasible points of the original problem, if any. By construction, the latter always have artificial variables equal to zero, which means that the artificial variables must be penalized in the cost function as much as possible in order to force them to that value. More formally, it can be proved that, for sufficiently large values of the weights: (i) the enlarged problem is strictly feasible and bounded; (ii) any optimal solution of the enlarged problem is also optimal for the original one if and only if both artificial variables are zero [38].
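For the linear case, one classical enlargement of this kind reads as follows; this is only an illustrative sketch, with weights denoted here by $M_P$ and $M_D$, and it does not necessarily coincide with the exact mapping (14):
\[
\begin{aligned}
\min_{x,\,x_a,\,x_s}\quad & c^{\top}x + M_P\, x_a\\
\text{s.t.}\quad & Ax + (b - A\mathbf{1})\,x_a = b,\\
& \mathbf{1}^{\top}x + x_a + x_s = M_D,\\
& x \ge 0,\quad x_a \ge 0,\quad x_s \ge 0.
\end{aligned}
\]
Here $x_a$ is the primal artificial variable (penalized by $M_P$), $x_s$ is the slack of the bounding hyperplane $\mathbf{1}^{\top}x + x_a \le M_D$, and the dual variable of that new constraint plays the role of the artificial variable added to the dual problem. Provided $M_D$ is large enough, the point $x=\mathbf{1}$, $x_a=1$ is feasible by construction, which is exactly the feasibility guarantee discussed above.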
Unfortunately, this idea is unsustainable when moving from theory to practice, i.e., to implementation. Indeed, a good estimate of the weights is difficult to determine a priori, and the computational performance is sensitive to their values [41]. Looking for a remedy, Lustig [42] investigated the optimal directions generated by Newton's step equation when the two weights are driven to ∞, proposing a weight-free algorithm based on those directions. Later, Lustig et al. [43] showed that such directions coincide with those of an infeasible IPM, without actually solving the unboundedness issue. When considering a set of numbers larger than the reals, such as the Euclidean numbers, however, an approach halfway between (14) and the one by Lustig becomes possible. It consists of the use of infinitely large penalizing weights, i.e., of a non-Archimedean embedding. This choice has the effect of infinitely penalizing the artificial variables while, from a dual perspective, it locates the bounding hyperplanes infinitely far from the origin. For instance, in the case of a standard QP problem, it is enough to set both weights to the same infinite value, obtaining the non-Archimedean counterpart of the map in (14).
The idea of infinitely penalizing an artificial variable is not completely new: it has already been successfully used in the I-Big-M method [20], previously proposed by the author of this work, even if in a discrete context rather than in a continuous one.
Nevertheless, there is still one detail to take care of. Embedding-based approaches leverage the milestone duality theorem to guarantee the existence and boundedness of the optimal solution. A non-Archimedean version of the duality theorem must therefore hold too; otherwise, non-Archimedean embeddings would not be theoretically well founded. Thanks to the transfer principle, the non-Archimedean setting is free from any issue of this kind, as stated by the next proposition.
Proposition 2 (Non-Archimedean Duality). Given an NA-QP maximization problem, suppose that both the primal and the dual problems are feasible. Then, if the dual problem has a strictly feasible point, the optimal primal solution set is nonempty and bounded. The converse is true as well.
Proof. The theorem holds true thanks to the transfer principle which, roughly speaking, transfers the properties of standard quadratic functions to non-Archimedean quadratic ones. □
If a generic non-Archimedean QP problem is considered instead, setting the weights as above may be insufficient to correctly build the embedding. Actually, their proper choice depends on the magnitude of the values constituting the problem. Proposition 3 gives a sufficient estimate of them; before showing it, however, three preliminary results are necessary, which are addressed by Lemmas 1–3. All three lemmata make use of the two functions introduced in Section 3 as Definitions 4 and 5, respectively. In particular, Lemma 1 provides an upper bound on the magnitude of the entries of the solutions x of a non-Archimedean linear system of the form Ax = b. This upper bound is expressed as a function of the magnitude of the entries of both A and b. Furthermore, Lemma 1 considers the case in which the linear system to solve is the dual feasibility constraint of a QP problem. Lemmas 2 and 3 generalize Lemma 1, covering corner cases too.
Lemma 1. Let the set of primal-dual optimal solutions Ω be nonempty and bounded. Additionally, let , A has full row rank, and its entries are represented by at most l monosemia, i.e., . Then, any satisfieswhere and . Proof. By hypothesis,
and, therefore,
too. Focusing on the j-th constraint,
, it holds
which implies
The proof for the second part of the thesis is very similar:
where
is such that
and has the form
,
. Now, following the same guidelines used in (15) and (16), one gets
□
Equation (15) may seem analytically trivial but, actually, it underlines a subtle property of non-Archimedean linear systems: the solution can have entries infinitely larger than any number involved in the system itself. This can happen even for a 2-by-2 linear system in which the magnitude of each entry of A and b is finite, while the unique solution has infinite entries; notice that Lemma 1 still works perfectly in such a case, since the bound it provides accounts for this growth.
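A self-contained illustration of this phenomenon (an example of ours, written with $\alpha$ denoting the infinite unit and $\eta=\alpha^{-1}$ the infinitesimal one; it is not necessarily the system originally referred to) is
\[
\begin{pmatrix} 1 & 1\\ 1 & 1+\eta \end{pmatrix}
\begin{pmatrix} x_1\\ x_2 \end{pmatrix}
=
\begin{pmatrix} 1\\ 0 \end{pmatrix},
\]
whose unique solution is $x_1 = 1+\alpha$, $x_2 = -\alpha$: every entry of $A$ and $b$ is finite, yet subtracting the first equation from the second yields $\eta\, x_2 = -1$, i.e., $x_2 = -\alpha$, so the solution is infinitely larger than the data.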
Lemma 2. Let either the primal problem be unbounded or be unbounded in the primal variable. Let also , and A satisfy the same hypothesis as in Lemma 1. Then,with I and J as in Lemma 1. Proof. If the primal problem is unbounded, it means that such that , and such that such that . Nevertheless, a relaxed version of Lemma 1 still holds for the primal polyhedron, that is such that and (the request for optimality is missing). According to (14), a feasible bound for the primal polyhedron is , , provided a suitable choice of . Indeed, it can happen that a wrong value for turns the unbounded problem into an infeasible one. This aspect shall be discussed in Proposition 3, which specifies as a function of .
Choice of
apart, the addition of the bound to the primal polyhedron guarantees that
such that it is feasible for the bounded primal problem and
such that
(remember that ξ is the dual variable associated with the new constraint of the primal problem and that
). Following the same reasoning used in Lemma 1, one gets the second part of the thesis
The case in which Ω is nonempty but unbounded in the primal variable is very similar. Together with the assumption , it means that there are plenty of (not strictly) feasible primal-dual optimal solutions, but none of them has maximum centrality. This fact negatively affects IPMs, since they move towards maximum-centrality solutions. Therefore, an IPM trying to optimize such a problem will never converge to any point, even though there are plenty of optimal candidates. To avoid this phenomenon, it is enough to bound the set of solutions by adding a further constraint to the primal problem (which also has the effect of guaranteeing the existence of the strictly feasible solutions missing in the dual polyhedron). As a result, the very same considerations applied for problem unboundedness work in this case as well, leading exactly to the result in the thesis of this lemma. □
Lemma 3. Let either the primal problem be infeasible or Ω be unbounded in the dual variable. Let also , and A satisfy the same hypotheses as in Lemma 1. Then, with I and J as in Lemma 1. Proof. In this case, such that but such that . Enlarging the primal problem in accordance with (14), one has that () such that . In addition, it holds that such that , provided that for some suitable choice of (to be discussed in Proposition 3 as well). By reasoning analogous to that used in Lemmas 1 and 2, the thesis immediately follows.
The case in which Ω is nonempty but unbounded in the dual variable works in the same but symmetric way as the complementary scenario discussed in Lemma 2. Because of this, it implies the bounds stated in the thesis, while the proof is omitted for brevity. □
Proposition 3. Given an NA-QP problem and its embedding as defined in (14), a sufficient estimate of the penalizing weights is with I and J as in Lemma 1. In case then , while implies . Proof. The extension to the quadratic case of Theorem 2.3 in [38] (proof omitted for brevity) gives the following sufficient condition for the two weights, which holds true even in a non-Archimedean context thanks to the transfer principle: where (, , ) is an optimal primal-dual solution of the original problem, if any. A possible way to guarantee the satisfaction of Equation (17) is to choose the two weights so that their magnitudes are infinitely larger than the right-hand terms of the inequalities; for instance, one may choose them accordingly, either in a stronger or in a weaker form.
In case Ω is nonempty and bounded, Lemma 1 holds and provides an estimate on the magnitude of both and . In case either the primal problem is unbounded or Ω is unbounded in the primal variable, Lemma 2 applies: the optimal solution is handcrafted by bounding the polyhedron, its magnitude is overestimated by , and (17) gives a clue for a feasible choice of . Similar considerations hold for the case of either primal problem infeasibility or Ω unboundedness in the dual variable, where Lemma 3 is used.
Corner cases are the scenarios in which either or . Since implies , the primal problem is either unbounded or has the unique feasible (and optimal) point . In both cases, it is enough to set . Since is a feasible solution, in the case of unboundedness there must exist, by continuity, a feasible point with at least one finite entry and no infinite ones. In the other scenario, is the optimal solution and, therefore, any finite vector is a suitable upper bound for it. Analogous considerations hold for the case , where a sufficient magnitude bound is . □
4.2. Convergence and Complexity
The main theoretical aspects to investigate in an iterative algorithm are convergence and complexity. Notice that, in the case of non-Archimedean algorithms, the complexity of elementary operations (such as the sum) refers to their execution on non-Archimedean numbers rather than on real ones. Since, theoretically, the NA-IPM is just an IPM able to work with Euclidean numbers, a first result on NA-IPM complexity follows straightforwardly from the transfer principle. It is worth stressing that, as usual, Theorem 1 assumes that the NA-IPM is applied to an NA-QP problem whose optimal solution set is nonempty and bounded.
Theorem 1 (NA-IPM convergence). The NA-IPM algorithm converges in , where is the primal space dimension and is the optimality relative tolerance.
Proof. The theorem holds true because of the transfer principle. □
Although this result is remarkable, it is of little practical utility. Indeed, the relative tolerance may not be a finite value but an infinitesimal one, making the time needed to converge infinite. However, under proper assumptions, finite-time convergence can also be guaranteed, as stated by Theorem 2. Before showing it, some preliminary results are needed and are presented as lemmas: Lemma 4 guarantees the optimality improvement iteration by iteration, while Lemma 5 provides a preliminary result used by Lemma 6, which proves the algorithm's convergence on the leading monosemium.
Lemma 4. In the NA-IPM, if then such that and .
Proof. Applying the transfer principle to Lemma 6.7 in [
27], it holds true that
where C is a positive constant at most finite. Equation (18) immediately implies
and
. The assumption
completes the proof. □
Lemma 5. Let be the right-hand term in (6) at the k-th iteration, and the vector of its last n entries. If the temporary solution (see Lemma 6 for its definition), then . Proof. By definition,
. Focusing on the radicand, one has
where the strict inequality comes from the fact that
by hypothesis, which implies
. Considering again the square root, the result comes straightforwardly:
□
Lemma 6. Let be the NA-IPM starting point, and be the compact form for Newton’s step Equation (6) at the beginning of the optimization. Let one rewrite the right-hand term , where Call the first entries of and the last n, i.e., . Then, such that and and the Newton–Raphson method reaches that iteration in . Proof. As usual, the central path neighborhood is
Lemma 4 and μ's positivity imply that
. The application of the transfer principle to Theorem 6.2 in [
27] guarantees that
such that
holds true and the Newton–Raphson algorithm reaches that iteration in
. Together, Lemma 4 and
guarantee that
(reached in polynomial time as well) such that
too. Set
. Then, one has
and
. Moreover, by construction it holds
and
(the latter comes from Lemma 5). Therefore, the following two chains of inequalities hold true:
as stated in the thesis. □
Corollary 1. Let k satisfy Lemma 6; then, either Proof. The result comes straightforwardly from three facts: (i) ; (ii) ; (iii) the leading term of each entry of x, s, and λ is never zeroed, since the full optimizing step is never taken (see lines 19–20 and 31–32 in Algorithm 2). □
We are now ready to provide the convergence theorem for the NA-IPM.
Theorem 2 (NA-IPM convergence). The NA-IPM converges to the solution of an NA-QP problem in , where is the primal space dimension, is the relative tolerance, and is the number of consecutive monosemia used in the problem optimization.
Proof. For the sake of simplicity, assume that all the Euclidean numbers in the NA-QP problem are represented by means of the same function of powers . From the approximation up to l consecutive monosemia, one can rewrite and . Lemma 6 guarantees that for which is ε-satisfied. Now, update the temporary solution by substituting each entry of x and s which satisfies Corollary 1 with any feasible value one order of magnitude smaller; e.g., an entry satisfying Corollary 1 is replaced with a positive value of the order , where is such that . Actually, these are the variables that are not active at the optimal solution, at least considering the zeroing of only. Then, recompute , which by construction satisfies , . All these operations have polynomial complexity and do not affect the overall result. Then, update the right-hand term as and zero the leading term of those entries whose magnitude is still . The next algorithm iterations are forced to consider the previous as already fully satisfied. What is actually happening is that the problem now tolerates an infeasibility error whose norm is equal to . Therefore, one can apply Lemma 6 again to obtain a solution which is ε-optimal on the second monosemium of too, and this result is achieved with a finite number of iterations and polynomial complexity. Repeating the update-optimization procedure for all the l monosemia by means of which is represented, one obtains a solution that is ε-optimal on all of them. Since each of the l ε-satisfactions is achieved in , the whole algorithm converges in . □
The next proposition highlights a particular property of the NA-IPM when solving lexicographic QP problems. Actually, it happens that, every time μ decreases by one order of magnitude, one more objective is ε-optimized.
Proposition 4. Consider an NA-QP problem generated from a standard lexicographic one in accordance with Theorem 1 and . Then, each of the l objectives is ε-optimized in polynomial time and when the i-th one is ε-optimized the magnitude of μ decreases from to in the next iteration.
Proof. Assume that the algorithm starts from a sufficiently good and well-centered solution, such as the one produced by Algorithm 1; then, . Since by construction, one can interpret each monosemium in as the degree of satisfaction of the corresponding objective function at the k-th iteration; that is, if , then the first objective lacks to be fully optimized, the second one lacks , and so on. Because of Lemma 6, in polynomial time is ε-optimized, that is, the KKT conditions (3) are ε-satisfied. In fact, this means that primal-dual feasibility is close to finite satisfaction and that centrality is finitely close to zero. Nevertheless, there is a further interpretation. Reading the KKT conditions from a primal perspective, their ε-satisfaction testifies that: (i) the primal solution is feasible (indeed, the primal is a standard polyhedron and, therefore, it is enough to consider the leading terms of x only, getting rid of the infinitesimal infeasibility ); (ii) the objective function is finitely ε-optimized (which means that the first objective is ε-optimized, since the original problem was a lexicographic one and the high-priority objective is the only one associated with finite values of the non-Archimedean objective function); (iii) the approximated solution is very close to the optimal surface of the first objective; roughly speaking, it is ε-finitely close. Moreover, the fact that after the updating procedure used in Theorem 2 implies that the magnitude of μ will be one order of magnitude smaller at the next iteration, i.e., it will decrease from to . Since what was just said holds for all the l monosemia (read: priority levels, i.e., objectives in the lexicographic cost function), the proposition is proved. □
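As an illustration of this lexicographic reading (under the common assumption, used here only as an example, that lower-priority objectives are scaled by increasing powers of the infinitesimal $\eta$), a two-level problem $\operatorname{lex\,min}\{f_1(x), f_2(x)\}$ over a polyhedron $P$ can be encoded as the single non-Archimedean objective
\[
\min_{x\in P}\; f_1(x) + \eta\, f_2(x), \qquad f_i(x) = \tfrac{1}{2}x^{\top}Q_i x + c_i^{\top}x,
\]
so that the finite monosemium of $\mu$ tracks the residual suboptimality of $f_1$ while the first infinitesimal one tracks that of $f_2$, consistently with the proof above.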
4.3. Numerical Considerations and Implementation Issues
The whole field of Euclidean numbers cannot be used in practice, since it is too big to fit in a machine. However, the algorithmic field adopted here, whose definition involves a monotone decreasing function, is enough to represent and solve many real-world problems; the term "algorithmic field" refers to the finite approximations of theoretical fields realized by computers [12]. Similarly to IEEE-754 floating point numbers, which are the standard encoding of real numbers within a machine, a finite-dimension encoding for the Euclidean numbers of such a field is needed. In [11,12], the bounded algorithmic number (BAN) representation is presented as a sufficiently flexible and informative encoding for this task. The BAN format is a fixed-length approximation of a Euclidean number, whose "precision" is given by the degree of its truncated polynomial part plus 1. The BAN encoding with precision three is indicated as BAN3.
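As a rough illustration of what a fixed-length encoding amounts to (a toy of ours, not the BAN format of [11,12]; α denotes the infinite unit and η = α^-1 the infinitesimal one, a notation used here only for the example): a value of the form α^p (a0 + a1·η + a2·η^2) can be stored as an integer exponent plus a fixed number of coefficients, with arithmetic that discards every term beyond the stored precision.

```python
# Toy fixed-precision "BAN3-like" value: alpha**p * (c[0] + c[1]*eta + c[2]*eta**2).
# Illustration only; this is not the BAN encoding of [11,12], just the general idea
# of a fixed-length truncated series.
from dataclasses import dataclass

PREC = 3  # number of stored monosemia ("BAN3")

@dataclass
class Ban3:
    p: int        # exponent of the leading power of alpha
    c: tuple      # PREC coefficients of the truncated series in eta

    def __mul__(self, other):
        # convolve the two coefficient series and drop terms beyond PREC
        coeffs = [0.0] * PREC
        for i, a in enumerate(self.c):
            for j, b in enumerate(other.c):
                if i + j < PREC:
                    coeffs[i + j] += a * b
        return Ban3(self.p + other.p, tuple(coeffs))

    def leading(self):
        # leading monosemium: its magnitude (power of alpha) and coefficient
        for i, a in enumerate(self.c):
            if a != 0.0:
                return self.p - i, a
        return None, 0.0

# (1 + eta) * alpha*(2 - eta)  ->  alpha*(2 + eta - eta^2)
x = Ban3(0, (1.0, 1.0, 0.0))
y = Ban3(1, (2.0, -1.0, 0.0))
print((x * y).leading())   # (1, 2.0): magnitude alpha^1, coefficient 2.0
```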
The second detail to take care of when attempting numerical computations with Euclidean numbers is the effect of lower-magnitude monosemia on them. For instance, consider a two-objective lexicographic QP whose first objective is degenerate with respect to some entries of x. When solving the problem by means of the NA-IPM, the following phenomenon (which can also be proved theoretically) occurs: the information about the optimizing direction for the secondary objective is stored as an infinitesimal gradient in the solution of Newton's step Equation (6). As an example, assume that and that the entries and are degenerate with respect to the first objective. Then, at each iteration, the infinitesimal monosemium in the optimizing direction of assumes a negligible value, while for and this is not true: it is significant and grows exponentially over time. In fact, the infinitesimal gradient represents the optimizing direction that must be followed along the optimal (and degenerate) surface of the first objective in order to also reach optimality for the second one. However, such infinitesimal directions do not contribute significantly to the optimization, since the major role is played by the finite entries of the gradient. Therefore, this infinitesimal information in the gradient only generates numerical instabilities. As soon as the first objective is ε-optimized, i.e., the surface of the first objective is reached, the optimizing direction still assumes finite values, but this time oriented so as to optimize the second objective while keeping the first one fixed, and all the infinitesimal monosemia of the gradient assume negligible values. Roughly speaking, a sort of "gradient promotion" takes place as a result of the change in the objective to optimize. To cope with the issue of noisy and unstable infinitesimal entries in the gradient, two details need to be implemented: (i) after the computation of the gradients (in both the predictor and the corrector step), only the leading term of each entry must be preserved, zeroing the remaining monosemia; (ii) after having computed the starting point according to Algorithm 1, again only the leading term of each entry of x, s, and λ must be preserved. These variations affect neither convergence nor the generality of the discussion, since the leading terms of the primal-dual solution are the only ones that impact the zeroing of .
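Point (i) above then amounts to a very simple operation on each gradient entry; the sketch below uses a plain coefficient-list representation (names and representation are ours, not the paper's implementation).

```python
# Keep only the leading (first nonzero) monosemium of each gradient entry,
# zeroing all lower-magnitude terms (point (i) above). Entries are modeled as
# plain coefficient lists [a0, a1, a2, ...] of a truncated series; toy sketch only.
def keep_leading_monosemium(gradient):
    truncated = []
    for coeffs in gradient:
        lead = next((i for i, a in enumerate(coeffs) if a != 0.0), None)
        truncated.append([a if i == lead else 0.0 for i, a in enumerate(coeffs)])
    return truncated

# e.g. an entry 0.7 - 3.0*eta + 0.2*eta^2 keeps only 0.7; its infinitesimal noise is dropped
print(keep_leading_monosemium([[0.7, -3.0, 0.2], [0.0, 1.5, -0.4]]))
# [[0.7, 0.0, 0.0], [0.0, 1.5, 0.0]]
```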
The choice of dealing only with the leading terms of the gradients also comes in handy for another issue arising during the computations: choosing the value to assign to the zeroed entries of x and s during the updating phase discussed in Theorem 2. Actually, it is enough to add one monosemium whose magnitude is such that the following equality holds true: . For instance, assume again a two-objective lexicographic QP scenario after the optimization of the first objective has been completed. It holds true that and that either or is smaller than , say . Then, the updating phase of Theorem 2 sets it to a value having a magnitude one order smaller. A reasonable approach is to set it equal to a monosemium, say , such that . Since is finite because of Corollary 1, one has . The naive choice is , but it may not be the best one. Indeed, this approach does not guarantee the generation of a temporary solution with the highest possible centrality, i.e., , where is the centrality measure after the update. To do so, the value to opt for must be , where and is a monosemium one order of magnitude smaller than and sufficiently (but arbitrarily) far from zero.
Another numerical issue to take care of is the computation accuracy due to the number of monosemia used. As a practical example, consider the task of inverting the matrix A reported below. Since its entries have magnitudes ranging from to , one may think that the BAN3 encoding is enough to properly compute an approximation of its inverse, reported below as . However, the product testifies to the presence of an error whose magnitude is , quite close to . Depending on the application, such noise could negatively affect further computations and the final result. Therefore, a good practice is to foresee some additional slots in the non-Archimedean number encoding. For instance, adopting the BAN5 standard, the approximated inverse manifests an error of magnitude (see matrix ), definitely a safer choice, even if at the expense of extra computational effort.
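The dependence of the residual error on the number of stored monosemia can already be seen on a scalar toy example (ours, not the matrix $A$ mentioned above): truncating the inverse of $1+\eta$ to three terms gives
\[
(1+\eta)^{-1} \approx 1-\eta+\eta^{2}, \qquad (1+\eta)\bigl(1-\eta+\eta^{2}\bigr) = 1+\eta^{3},
\]
so the residual has magnitude $\eta^{3}$, while keeping five terms pushes it down to $\eta^{5}$, mirroring the BAN3-versus-BAN5 behavior described above.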
Finally, the last detail concerns the terminal conditions. In the standard IPM, the execution stops when the three convergence measures in (9) are smaller than the prescribed threshold. However, in a non-Archimedean context, optimality means being ε-optimal on each of the l monosemia of the objective function, i.e., satisfying the KKT conditions (3) on the first l monosemia of b and c. This means that the terminal condition needs to be modified in order to cope with such a convergence definition. The convergence measures are therefore redefined as in (19). In this way, one has the guarantee that the three measures assume finite values when close to optimality. To better clarify this concept, consider the case in which b has only infinitesimal entries, which implies that the norm of b is infinitesimal too. If the convergence measures in (9) are used, the denominator of the primal measure is a finite number. Therefore, any finite approximation of b, i.e., any primal solution x whose residual is finite, induces a finite value for the measure. This is definitely an undesirable behavior, since it is natural to expect that: (i) the convergence measures are finite numbers only if the optimization is close to optimality; (ii) their leading monosemium is smaller than the threshold when the leading monosemium of the residual norm is small as well. However, this is not the case in the current example, as the primal measure assumes finite values for approximation errors of b which are infinitely larger than its norm. Using the definitions in (19) instead, the issue is solved, since now finite approximations of b are mapped into infinite values of the measure, while infinitesimal errors are mapped into finite ones. In fact, the introduction of the magnitude of the constant term vectors in the definitions avoids the bias which would have been introduced by an a priori choice of the magnitude of the constant term added to the denominator of the convergence measures.
Then, feasibility on the primal problem is ε-achieved when the absolute values of the first monosemia of the (scaled) primal residual are all smaller than ε. Similar considerations hold for c and for μ as well.
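Operationally, this check can be sketched as follows (toy coefficient-list representation again; the exact scaling used in (19) is not reproduced, so this is an assumption-laden illustration rather than the paper's test).

```python
# eps-feasibility test on the first l monosemia of a (suitably scaled) residual,
# modeled as a coefficient list [r0, r1, ..., r_{l-1}]; sketch only, the exact
# scaling of (19) is not reproduced here.
def eps_satisfied(residual_coeffs, eps, l):
    return all(abs(r) < eps for r in residual_coeffs[:l])

print(eps_satisfied([1e-9, 3e-10], eps=1e-8, l=2))   # True
print(eps_satisfied([1e-9, 2e-3],  eps=1e-8, l=2))   # False: second monosemium not yet converged
```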