The algorithm's pseudocode is presented in Algorithm 3. To better understand it, the next three subsections discuss some delicate aspects of the algorithm and of its implementation: unboundedness and infeasibility, complexity and convergence properties, and numerical issues that arise when moving from algorithmic description to software implementation. As with standard algorithms, it is important to stress that many of the proofs in this section assume Euclidean numbers to be represented by finite sequences of monosemia. Indeed, even if the reference set $\mathbb{E}$ defined by Axioms 1–3 admits numbers represented by infinite sequences, it would not be reasonable to use them in a machine or to discuss the algorithm's convergence with them, for two reasons: (i) the algorithm would have to manage and manipulate an infinite amount of data; (ii) the machine is finite and cannot store all that information. Notice that, at this stage, the focus is not on variable-length representations of Euclidean numbers, as they would slow the computations down [14]. Fixed-length representations, such as the ones discussed in [11], are therefore preferred, because they are easier to implement in hardware (i.e., they are more "hardware friendly"), as recent studies testify [36].
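To make the fixed-length idea concrete, the following toy Python sketch (an illustration of ours, not one of the actual formats discussed in [11]) stores a Euclidean number as at most $L$ monosemia, i.e., (coefficient, power-of-$\alpha$) pairs sorted by decreasing power, and truncates every arithmetic result back to $L$ terms:

```python
# A toy fixed-length Euclidean number: value = sum_h a_h * alpha**p_h,
# stored as at most L (coefficient, power) monosemia, largest power first.
# Illustrative only; real fixed-length formats are the ones in [11].
L = 3

def normalize(ms):
    """Merge equal powers, drop zero coefficients, keep the L leading monosemia."""
    acc = {}
    for a, p in ms:
        acc[p] = acc.get(p, 0.0) + a
    out = sorted(((a, p) for p, a in acc.items() if a != 0.0),
                 key=lambda t: -t[1])
    return out[:L]                 # truncation = fixed-length storage

def add(u, v):
    return normalize(u + v)

def mul(u, v):
    # convolution of the two monosemia sequences: coefficients multiply,
    # powers of alpha add
    return normalize([(a * b, p + q) for a, p in u for b, q in v])

def lead_mon(u):
    """Leading monosemium, as used by Algorithm 3."""
    return u[:1]
```

For instance, with $x = 2\alpha + 3$ stored as `[(2.0, 1), (3.0, 0)]` and $s = \eta = \alpha^{-1}$ stored as `[(1.0, -1)]`, `mul(x, s)` yields `[(2.0, 0), (3.0, -1)]`, i.e., $2 + 3\eta$, and `lead_mon(x)` keeps only $2\alpha$.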
Algorithm 3 Non-Archimedean predictor–corrector infeasible primal–dual IPM.

1: procedure NA-IPM($A$, $b$, $c$, $Q$, $\epsilon$, $\mathtt{max\_it}$)
2:   /* Notice that divergence is dealt with by the embedding presented in Section 4.1 */
3:   /* Therefore, the flag of correct termination and the threshold $\omega$ are useless here */
4:   /* Notice also that only $\epsilon \in \mathbb{R}^{+}$, while $A$, $b$, $c$, and $Q$ are Euclidean matrices and vectors */
5:   $x$, $\lambda$, $s$ = starting_point($A$, $b$, $c$, $Q$)
6:   $x = \mathtt{lead\_mon}(x)$, $\lambda = \mathtt{lead\_mon}(\lambda)$, $s = \mathtt{lead\_mon}(s)$
7:   for $i = 1, \dots, \mathtt{max\_it}$ do
8:     /* compute residuals */
9:     $r_b = Ax - b$, $r_c = A^{T}\lambda + s - Qx - c$, $r_{\mu} = x \odot s$
10:    /* compute centrality; $n$ is the length of $x$ */
11:    $\mu = \frac{r_{\mu}^{T}\mathbf{1}}{n}$
12:    /* compute KKT conditions satisfaction parameters */
13:    $\rho_{1} = \frac{\|Ax-b\|}{1\cdot\mathcal{O}(b)+\|b\|}$, $\rho_{2} = \frac{\|A^{T}\lambda+s-Qx-c\|}{1\cdot\mathcal{O}(c)+\|c\|}$, $\rho_{3} = \frac{\mu}{1\cdot\mathcal{O}(\frac{1}{2}x^{T}Qx+c^{T}x)+\frac{1}{2}x^{T}Qx+c^{T}x}$
14:    /* check convergence on all the (meaningful) monosemia of the aggregated values (Section 4.3) */
15:    if $\mathtt{all\_monosemia}(\rho_{1}) \le \epsilon$ and $\mathtt{all\_monosemia}(\rho_{2}) \le \epsilon$ and $\mathtt{all\_monosemia}(\rho_{3}) \le \epsilon$ then
16:      /* primal–dual feasible optimal solution found */
17:      return $x$, $\lambda$, $s$
18:    /* compute predictor directions solving (3) */
19:    $\Delta x_{p}$, $\Delta\lambda_{p}$, $\Delta s_{p}$ = predict($A$, $b$, $c$, $Q$, $r_b$, $r_c$, $r_{\mu}$)
20:    /* keep only the leading monosemia of the gradients (Section 4.3) */
21:    $\Delta x_{p} = \mathtt{lead\_mon}(\Delta x_{p})$, $\Delta\lambda_{p} = \mathtt{lead\_mon}(\Delta\lambda_{p})$, $\Delta s_{p} = \mathtt{lead\_mon}(\Delta s_{p})$
22:    /* compute predictor step size */
23:    $\nu_{pp} = 0.99\min(\max_{\overline{\nu}}\{\mathtt{lead\_mon}(\overline{\nu}) \mid x + \overline{\nu}\Delta x_{p} \ge 0\},\, 1)$
24:    $\nu_{pd} = 0.99\min(\max_{\overline{\nu}}\{\mathtt{lead\_mon}(\overline{\nu}) \mid s + \overline{\nu}\Delta s_{p} \ge 0\},\, 1)$
25:    $\nu = \min(\nu_{pp},\, \nu_{pd})$
26:    /* estimate $\sigma$ */
27:    $\tilde{x} = x + \nu\Delta x_{p}$, $\tilde{s} = s + \nu\Delta s_{p}$
28:    $\mu^{\mathrm{new}} = \frac{\tilde{x}^{T}\tilde{s}}{n}$
29:    $\sigma = \mathtt{lead\_mon}\left(\left(\frac{\mu^{\mathrm{new}}}{\mu}\right)^{3}\right)$
30:    /* compute corrector directions solving the corresponding Newton system */
31:    $\Delta x_{c}$, $\Delta\lambda_{c}$, $\Delta s_{c}$ = corrector($A$, $b$, $c$, $Q$, $\sigma\mu\mathbf{1} - \Delta x_{p} \odot \Delta s_{p}$)
32:    /* compute new direction */
33:    $\Delta x = \Delta x_{p} + \Delta x_{c}$, $\Delta\lambda = \Delta\lambda_{p} + \Delta\lambda_{c}$, $\Delta s = \Delta s_{p} + \Delta s_{c}$
34:    /* keep only the leading monosemia of the gradients (Section 4.3) */
35:    $\Delta x = \mathtt{lead\_mon}(\Delta x)$, $\Delta\lambda = \mathtt{lead\_mon}(\Delta\lambda)$, $\Delta s = \mathtt{lead\_mon}(\Delta s)$
36:    /* compute step size */
37:    $\nu_{p} = 0.99\min(\max_{\overline{\nu}}\{\mathtt{lead\_mon}(\overline{\nu}) \mid x + \overline{\nu}\Delta x \ge 0\},\, 1)$
38:    $\nu_{d} = 0.99\min(\max_{\overline{\nu}}\{\mathtt{lead\_mon}(\overline{\nu}) \mid s + \overline{\nu}\Delta s \ge 0\},\, 1)$
39:    $\nu = \min(\nu_{p},\, \nu_{d})$
40:    /* compute target primal–dual solution */
41:    $x = x + \nu\Delta x$, $\lambda = \lambda + \nu\Delta\lambda$, $s = s + \nu\Delta s$
42:    /* add infinitesimal centrality to the close-to-zero entries (Section 4.3) */
43:    $x$, $s$ = update_zero_entries($x$, $s$)
44: return $x$, $\lambda$, $s$
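As a concrete point of reference, the iteration above can be sketched in Python for an ordinary (real-arithmetic) QP, dropping the non-Archimedean machinery (`lead_mon`, `all_monosemia`, `update_zero_entries`) and merging predictor and corrector into two solves of the same Newton system. The names `qp_ipm`, `newton`, and `step` are ours, not the paper's implementation:

```python
import numpy as np

def qp_ipm(A, b, c, Q, eps=1e-8, max_it=50):
    """Standard-arithmetic predictor-corrector IPM sketch for
    min 1/2 x'Qx + c'x  s.t.  Ax = b, x >= 0."""
    m, n = A.shape
    x, lam, s = np.ones(n), np.zeros(m), np.ones(n)
    for _ in range(max_it):
        r_b = A @ x - b
        r_c = A.T @ lam + s - Q @ x - c
        mu = x @ s / n
        # real-valued analogues of rho_1, rho_2, rho_3 (lines 13-15)
        obj = 0.5 * x @ Q @ x + c @ x
        rho = max(np.linalg.norm(r_b) / (1 + np.linalg.norm(b)),
                  np.linalg.norm(r_c) / (1 + np.linalg.norm(c)),
                  mu / (1 + abs(obj)))
        if rho <= eps:
            return x, lam, s

        def newton(rhs3):
            # full (unsymmetric) Newton system; by linearity, one solve with
            # the combined right-hand side equals predictor + corrector sum
            K = np.block([[A, np.zeros((m, m)), np.zeros((m, n))],
                          [-Q, A.T, np.eye(n)],
                          [np.diag(s), np.zeros((n, m)), np.diag(x)]])
            d = np.linalg.solve(K, np.concatenate([-r_b, -r_c, rhs3]))
            return d[:n], d[n:n + m], d[n + m:]

        # largest 0.99-damped step keeping a vector nonnegative (lines 23-25)
        step = lambda v, dv: min(1.0, 0.99 * min(-v[dv < 0] / dv[dv < 0],
                                                 default=1.0))
        # predictor (affine-scaling) direction
        dxp, dlp, dsp = newton(-x * s)
        nu = min(step(x, dxp), step(s, dsp))
        mu_new = (x + nu * dxp) @ (s + nu * dsp) / n
        sigma = (mu_new / mu) ** 3                  # line 29
        # combined predictor-corrector direction (lines 31-33)
        dx, dl, ds = newton(-x * s + sigma * mu - dxp * dsp)
        nu = min(step(x, dx), step(s, ds))
        x, lam, s = x + nu * dx, lam + nu * dl, s + nu * ds
    return x, lam, s
```

On the toy problem $\min \frac{1}{2}(x_1^2+x_2^2)$ subject to $x_1+x_2=1$, $x \ge 0$, the sketch converges to $x = (0.5, 0.5)$ with $\lambda \to 0.5$.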

4.1. Infeasibility and Unboundedness
As stated in Section 2, one can approach the problem of infeasibility and unboundedness in two different ways: divergence detection at run time or problem embedding. While the first keeps the problem complexity fixed but negatively affects the computation because of norm-divergence polling, the second wastes resources by optimizing a more complex problem, which is nevertheless solved efficiently. Therefore, the simpler the embedding, the less it affects the performance.
One very simple embedding, proposed in [37,38,39,40], consists of the following mapping:

where ${\wp}_{1}$ and ${\wp}_{2}$ are two positive and sufficiently large constants. This embedding adds two artificial variables (one to the primal and one to the dual problem) and one slack variable (to the primal). The artificial variables guarantee the feasibility of their corresponding problem, while on the dual side this is equivalent to adding one bounding hyperplane that prevents any divergence. Indeed, from duality theory, if the primal problem is infeasible then the dual is unbounded, and vice versa. Geometrically, the hyperplane slope is chosen considering a particular conical combination of the constraints and the constant term vector (the one with all coefficients equal to 1). If there is any polyhedron unboundedness, the conical combination outputs a diverging direction and generates a hyperplane orthogonal to it; otherwise, the addition of such constraints has no effect.
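Structurally, the embedding can be sketched as follows. This is a schematic of ours: the generic row $w^{T}$ stands in for the exact bounding-hyperplane coefficients fixed by (14), while the artificial column $(b - A\mathbf{1})$ and the dual bound $(b - A\mathbf{1})^{T}\lambda \le \wp_{1}$ are the ones used later in the proof of Lemma 3:

```latex
\[
\begin{aligned}
\min\;& \tfrac{1}{2}x^{T}Qx + c^{T}x + \wp_{1}\,\zeta\\
\text{s.t. }& Ax + (b - A\mathbf{1})\,\zeta = b,\\
& w^{T}x + \tau = \wp_{2},\qquad x,\;\zeta,\;\tau \ge 0,
\end{aligned}
\]
```

where $\zeta$ is the primal artificial variable (its dual constraint is $(b - A\mathbf{1})^{T}\lambda \le \wp_{1}$), $\tau$ is the primal slack, and the constraint $w^{T}x + \tau = \wp_{2}$ is the bounding hyperplane obtained from the conical combination described above.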
On the other hand, the constraint intercept depends on the penalizing weights ${\wp}_{1}$ and ${\wp}_{2}$, respectively, for the primal and the dual hyperplane. The larger the weight, the farther the corresponding bound is located. From the primal perspective, instead, ${\wp}_{1}$ and ${\wp}_{2}$ act as penalizing weights for the artificial variables of the dual and the primal problem, respectively. The need for this penalization comes from the fact that, to make the optimization consistent, the algorithm must be driven towards feasible points of the original problem, if any. By construction, the latter always have artificial variables equal to zero, which means one has to penalize them in the cost function as much as possible in order to force them to that value. More formally, it can be proved that, for sufficiently large values of ${\wp}_{1}$ and ${\wp}_{2}$: (i) the enlarged problem is strictly feasible and bounded; (ii) any solution of the enlarged problem is also optimal for the original one if and only if both the artificial variables are zero [38].
Unfortunately, this idea is unsustainable when moving from theory to practice, i.e., to implementation. Indeed, a good estimate of the weights is difficult to determine a priori, and the computational performance is sensitive to their values [41]. Seeking a solution, Lustig [42] investigated the optimal directions generated by Newton's step equation when ${\wp}_{1}$ and ${\wp}_{2}$ are driven to $\infty$, proposing a weight-free algorithm based on these directions. Later, Lustig et al. [43] showed that these directions coincide with those of an infeasible IPM, without actually solving the unboundedness issue. When considering a set of numbers larger than $\mathbb{R}$, such as $\mathbb{E}$, however, an approach halfway between (14) and the one by Lustig becomes possible. It consists of the use of infinitely large penalizing weights, i.e., of a non-Archimedean embedding. This choice has the effect of infinitely penalizing the artificial variables, while from a dual perspective it locates the bounding hyperplanes infinitely far from the origin. For instance, in the case of a standard QP problem, it is enough to set both ${\wp}_{1}$ and ${\wp}_{2}$ to $\alpha$, obtaining the following map:

The idea of infinitely penalizing an artificial variable is not completely new: it has already been used successfully in the I-Big-M method [20], previously proposed by the author of this work, even if in a discrete rather than a continuous context.
Nevertheless, there is still a small detail to take care of. Embedding-based approaches leverage the milestone theorem of duality to guarantee the optimal solution's existence and boundedness. A non-Archimedean version of the duality theorem must hold too; otherwise, non-Archimedean embeddings end up being theoretically ill-founded. Thanks to the transfer principle, $\mathbb{E}$ is free from any issue of this kind, as stated by the next proposition.
Proposition 2 (Non-Archimedean Duality). Given an NAQP maximization problem, suppose that the primal and dual problems are feasible. Then, if the dual problem has a strictly feasible point, the optimal primal solution set is nonempty and bounded. The converse is true as well.
Proof. The theorem holds thanks to the transfer principle which, roughly speaking, transfers the properties of standard quadratic functions to non-Archimedean quadratic ones. □
If a generic non-Archimedean QP problem is considered instead, setting the weights to $\alpha$ may be insufficient to correctly build the embedding. Actually, their proper choice depends on the magnitude of the values constituting the problem. Proposition 3 gives a sufficient estimate of them; before showing it, however, three preliminary results are necessary, addressed by Lemmas 1–3. All three lemmata make use of the functions $\mathcal{O}(\cdot)$ and $\mathit{o}(\cdot)$ provided in Section 3 as Definitions 4 and 5, respectively. In particular, Lemma 1 provides an upper bound on the magnitude of the entries of the solutions $x$ of a non-Archimedean linear system $Ax\le b$. This upper bound is expressed as a function of the magnitude of the entries of both $A$ and $b$. Furthermore, Lemma 1 considers the case in which the linear system to solve is the dual feasibility constraint of a QP problem, i.e., it has the form ${A}^{T}\lambda - Q\overline{x}\le c$ with $\overline{x}$ satisfying $A\overline{x}\le b$. Lemmas 2 and 3 generalize Lemma 1, considering corner cases too.
Lemma 1. Let the set of primal–dual optimal solutions Ω be nonempty and bounded. Additionally, let $b,\,[c,\,Q]\ne \mathbf{0}$, let A have full row rank, and let its entries be represented by at most l monosemia, i.e., ${A}_{ij}={\sum}_{h=1}^{l}{\left({a}_{ij}\right)}^{h}{\alpha}^{g\left(h\right)}$. Then, any $(\overline{x},\,\overline{\lambda},\,\overline{s})\in \mathsf{\Omega}$ satisfies $\mathcal{O}\left(\overline{x}\right)\le \mathcal{O}\left(\tilde{x}\right)={\min}_{j\in J}{\textstyle \frac{\mathcal{O}\left({b}_{j}\right)}{\mathit{o}\left({A}_{j}\right)}}$ (together with an analogous bound on $\mathcal{O}(\overline{\lambda})$), where $J=\{j=1,\,\dots ,\,m \mid {b}_{j}\ne 0\}$ and $I=\{i=1,\,\dots ,\,n \mid {c}_{i}\ne 0\;\vee \;{Q}_{i}\ne \mathbf{0}\}$.

Proof. By hypothesis, $(\overline{x},\,\overline{\lambda},\,\overline{s})\in \mathsf{\Omega}$ and, therefore, $A\overline{x}=b$. Focusing on the jth constraint, $j\in J$, it holds

which implies
The proof of the second part of the thesis is very similar:

where $\xi \in {\mathbb{E}}^{n}$ is such that ${Q}_{i}\overline{x}\le {Q}_{i}\xi$ and has the form $\xi ={\xi}^{0}\mathcal{O}\left(\tilde{x}\right)$, ${\xi}^{0}\in {\mathbb{R}}^{n}$. Now, following the same guidelines used in (15) and (16), one gets
□
Equation (15) may seem analytically trivial but, actually, it underlines a subtle property of non-Archimedean linear systems: the solution can have entries infinitely larger than any number involved in the system itself. As an example, the 2-by-2 linear system below admits the unique solution $\overline{x}=[{\alpha}^{2},\,{\alpha}^{2}]$. However, each value in the system is finite, i.e., the magnitude of each entry of $A$ and $b$ is $\mathcal{O}\left({\alpha}^{0}\right)=\mathcal{O}\left(1\right)$:
Notice that Lemma 1 works perfectly here. Indeed, $\mathcal{O}\left(b\right)=1$ and $\mathit{o}\left(A\right)={\eta}^{2}$ imply $\mathcal{O}\left(\tilde{x}\right)={\alpha}^{2}\ge \mathcal{O}\left(\overline{x}\right)$.
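For concreteness, one system exhibiting exactly this behavior (our own instance, chosen to match the stated magnitudes, not necessarily the one originally used) is

```latex
\[
\begin{bmatrix} 1 & -1\\ 1 & -1+\eta^{2} \end{bmatrix}
\begin{bmatrix} x_{1}\\ x_{2} \end{bmatrix}
=
\begin{bmatrix} 0\\ 1 \end{bmatrix}.
\]
```

Subtracting the first row from the second gives $\eta^{2}x_{2}=1$, i.e., $x_{2}=\alpha^{2}$, and the first row then forces $x_{1}=x_{2}=\alpha^{2}$: all entries of $A$ and $b$ are finite, yet the solution is infinitely large, and indeed $\mathcal{O}(b)/\mathit{o}(A) = 1/\eta^{2} = \alpha^{2}$.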
Lemma 2. Let either the primal problem be unbounded or $\mathsf{\Omega}\ne \varnothing$ be unbounded in the primal variable. Let also $b,\,c,\,Q$, and A satisfy the same hypotheses as in Lemma 1. Then,

with I and J as in Lemma 1.

Proof. If the primal problem is unbounded, it means that $\exists x\in {\mathbb{E}}^{n}$ such that $Ax=b$ and, $\forall x$ such that $Ax=b$, $\nexists \lambda \in {\mathbb{E}}^{m}$ such that ${A}^{T}\lambda \le c+Qx$. Nevertheless, a relaxed version of Lemma 1 still holds for the primal polyhedron, that is, $\exists \overline{x}$ such that $A\overline{x}=b$ and $\mathcal{O}\left(\overline{x}\right)\le \mathcal{O}\left(\tilde{x}\right)={\min}_{j\in J}{\textstyle \frac{\mathcal{O}\left({b}_{j}\right)}{\mathit{o}\left({A}_{j}\right)}}$ (the request for optimality is missing). According to (14), a feasible bound for the primal polyhedron is $({c}^{T}{\mathbf{1}}^{T}+{\mathbf{1}}^{T}Q)x\zeta ={\wp}_{2}$, $\zeta \ge 0$, provided a suitable choice of ${\wp}_{2}$. Indeed, a wrong value for ${\wp}_{2}$ can turn the unbounded problem into an infeasible one. This aspect is discussed in Proposition 3, which specifies ${\wp}_{2}$ as a function of $\overline{x}$.

The choice of ${\wp}_{2}$ apart, the addition of the bound to the primal polyhedron guarantees that $\exists {x}^{\prime}=(x,\,\zeta )\in {\mathbb{E}}^{n+1}$ feasible for the bounded primal problem and $\exists {\lambda}^{\prime}\in {\mathbb{E}}^{m+1}$ such that ${A}^{T}\lambda +{({c}^{T}{\mathbf{1}}^{T}+{\mathbf{1}}^{T}Q)}^{T}\xi \le c+Qx$ (remember that ξ is the dual variable associated with the new constraint of the primal problem and that $\tilde{Q}{x}^{\prime}=Qx$). Following the same reasoning used in Lemma 1, one gets the second part of the thesis

The case in which $\mathsf{\Omega}\ne \varnothing$ is unbounded in the primal variable is very similar. Together with the assumption $b,\,[c,\,Q]\ne \mathbf{0}$, it means that there are plenty of (not strictly) feasible primal–dual optimal solutions, but none with maximum centrality. This fact negatively affects IPMs, since they move towards maximum-centrality solutions. Therefore, an IPM that tries to optimize such a problem will never converge to any point, even if there are many optimal candidates. To avoid this phenomenon, it is enough to bound the set of solutions by adding a further constraint to the primal problem (which also has the effect of guaranteeing the existence of the strictly feasible solutions missing in the dual polyhedron). As a result, the very same considerations applied for problem unboundedness work in this case as well, leading exactly to the result in the thesis of this lemma. □
Lemma 3. Let either the primal problem be infeasible or $\mathsf{\Omega}\ne \varnothing$ be unbounded in the dual variable. Let also $b,\,c,\,Q$, and A satisfy the same hypotheses as in Lemma 1. Then,

with I and J as in Lemma 1.

Proof. In this case, $\nexists x$ such that $Ax=b$, but $\forall x\in {\mathbb{E}}^{n}$ $\exists \lambda \in {\mathbb{E}}^{m}$ such that ${A}^{T}\lambda \le c+Qx$. Enlarging the primal problem in accordance with (14), one has that $\exists {x}^{\prime}=(\overline{x},\,\zeta )\in {\mathbb{E}}^{n+1}$ ($\zeta \ge 0$) such that $A\overline{x}+(b-A\mathbf{1})\zeta =b$. In addition, $\exists \overline{\lambda}$ such that ${A}^{T}\overline{\lambda}\le c+Q\overline{x}$, provided that ${(b-A\mathbf{1})}^{T}\overline{\lambda}\le {\wp}_{1}$ for some suitable choice of ${\wp}_{1}$ (discussed in Proposition 3 as well). By reasoning analogous to that used in Lemmas 1 and 2, the thesis immediately follows.

The case in which $\mathsf{\Omega}\ne \varnothing$ is unbounded in the dual variable works in the same but symmetric way as the complementary scenario discussed in Lemma 2. Because of this, it implies the bounds stated in the thesis, while the proof is omitted for brevity. □
Proposition 3. Given an NAQP problem and its embedding as defined in (14), a sufficient estimate of the penalizing weights is

with I and J as in Lemma 1. In case $J=\varnothing$, then ${\wp}_{2}=\alpha$, while $I=\varnothing$ implies ${\wp}_{1}=\alpha$.

Proof. The extension to the quadratic case of Theorem 2.3 in [38] (proof omitted for brevity) gives the following sufficient condition for ${\wp}_{1}$ and ${\wp}_{2}$, which holds true even in a non-Archimedean context thanks to the transfer principle:

where $(\overline{x},\,\overline{\lambda},\,\overline{s})$ is an optimal primal–dual solution of the original problem, if any. A possible way to guarantee the satisfaction of Equation (17) is to choose ${\wp}_{1}$ and ${\wp}_{2}$ such that their magnitudes are infinitely larger than the right-hand terms of the inequalities. For instance, one may set

or, more weakly,

In case Ω is nonempty and bounded, Lemma 1 holds and provides an estimate of the magnitude of both $\overline{x}$ and $\overline{\lambda}$. In case either the primal problem is unbounded or Ω is unbounded in the primal variable, Lemma 2 applies: the optimal solution is handcrafted by bounding the polyhedron, its magnitude is overestimated by $\tilde{x}$, and (17) gives a clue for a feasible choice of ${\wp}_{2}$. Similar considerations hold for the case of either primal problem infeasibility or Ω unboundedness in the dual variable, where Lemma 3 is used.

The corner cases are the scenarios where either $J=\varnothing$ or $I=\varnothing$. Since $J=\varnothing$ implies $b=\mathbf{0}$, the primal problem is either unbounded or has the unique feasible (and optimal) point $x=\mathbf{0}$. In both cases, it is enough to set $\mathcal{O}\left(\tilde{x}\right)=1$. Since $\mathbf{0}$ is a feasible solution, in the case of unboundedness there must exist, by continuity, a feasible point with at least one finite entry and no infinite ones. In the other scenario, $\mathbf{0}$ is the optimal solution and, therefore, any finite vector is a suitable upper bound for it. Analogous considerations hold for the case $I=\varnothing$, where a sufficient magnitude bound is $\mathcal{O}\left(\tilde{\lambda}\right)=1$. □
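The magnitude recipe behind Lemmas 1–3 becomes mechanical once numbers carry their powers of $\alpha$ explicitly: $\mathcal{O}(\cdot)$ picks the largest power present, $\mathit{o}(\cdot)$ the smallest, and the division $\mathcal{O}(b_j)/\mathit{o}(A_j)$ becomes a subtraction of powers. A minimal Python sketch of ours (function names are hypothetical, not the paper's implementation):

```python
# Illustration only: magnitudes as integer powers of alpha; eta = alpha**-1.
# O(.) = largest power among a number's monosemia (Definition 4),
# o(.) = smallest one (Definition 5).
def O(num):                 # num: set of powers with nonzero coefficient
    return max(num)

def o_row(row):             # smallest-order monosemium over a matrix row
    return min(p for num in row for p in num)

def x_tilde_magnitude(A_rows, b):
    """Power of alpha in O(x_tilde) = min_{j in J} O(b_j) / o(A_j)."""
    J = [j for j, bj in enumerate(b) if bj]      # rows with b_j != 0
    return min(O(b[j]) - o_row(A_rows[j]) for j in J)

# The 2-by-2 example: finite entries, one of which carries an
# eta^2 = alpha**(-2) monosemium; b = [0, 1], so J contains only row 2.
A_rows = [[{0}, {0}], [{0}, {0, -2}]]
b = [set(), {0}]
print(x_tilde_magnitude(A_rows, b))   # power 2, i.e., O(x_tilde) = alpha^2
```

The computed power 2 matches the example's conclusion $\mathcal{O}(\tilde{x}) = \alpha^{2} \ge \mathcal{O}(\overline{x})$.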
4.2. Convergence and Complexity
The main theoretical aspects to investigate in an iterative algorithm are convergence and complexity. Notice that, in the case of non-Archimedean algorithms, the complexity of elementary operations (such as the sum) assumes their execution on non-Archimedean numbers rather than on real ones. Since, theoretically, the NA-IPM is just an IPM able to work with numbers in $\mathbb{E}$, a first result on the NA-IPM complexity comes straightforwardly thanks to the transfer principle. It is worth stressing that, as usual, Theorem 1 assumes the NA-IPM to be applied to an NAQP problem whose optimal solution set is nonempty and bounded.
Theorem 1 (NA-IPM convergence). The NA-IPM algorithm converges in $\mathcal{O}\left({n}^{2}\left|\log \epsilon \right|\right)$, where $n\in \mathbb{N}$ is the primal space dimension and $\epsilon \in {\mathbb{E}}^{+}$ is the optimality relative tolerance.
Proof. The theorem holds true because of the transfer principle. □
Although this result is remarkable, it is of no practical utility. Indeed, the relative tolerance $\epsilon$ may be not a finite value but an infinitesimal one, making the time needed to converge infinite. However, under proper assumptions, finite-time convergence can also be guaranteed, as stated by Theorem 2. Before showing it, some preliminary results are needed, presented as lemmas: Lemma 4 guarantees optimality improvement iteration by iteration, while Lemma 5 provides a preliminary result used by Lemma 6, which proves the algorithm's convergence on the leading monosemium.
Lemma 4. In the NA-IPM, if $\sigma \in {\mathbb{R}}^{+}$ then $\exists \,\tilde{\nu}\in [0,\,1]\subset \mathbb{R}$ such that ${\mu}^{(k+1)}\le (1-0.1\tilde{\nu}){\mu}^{\left(k\right)}$ and $\parallel ({r}_{b}^{(k+1)},\,{r}_{c}^{(k+1)})\parallel \le (1-0.1\tilde{\nu})\parallel ({r}_{b}^{\left(k\right)},\,{r}_{c}^{\left(k\right)})\parallel$.
Proof. Applying the transfer principle to Lemma 6.7 in [27], it holds true that

where C is a positive, at most finite constant. Equation (18) immediately implies $\tilde{\nu}\in [0,\,1]\subset \mathbb{E}$ and $\mathcal{O}\left(\tilde{\nu}\right)=\mathcal{O}\left(\sigma \right)$. The assumption $\sigma \in {\mathbb{R}}^{+}$ completes the proof. □
Lemma 5. Let ${d}^{\left(k\right)}$ be the right-hand term in (6) at the kth iteration, and ${d}_{\mu}$ the vector of its last n entries. If the temporary solution $({x}^{\left(k\right)},\,{\lambda}^{\left(k\right)},\,{s}^{\left(k\right)})\in {\mathcal{N}}_{\infty}(\gamma ,\,\beta )$ (see Lemma 6 for its definition), then $\parallel {d}_{\mu}\parallel <n\mu$.

Proof. By definition, $\parallel {d}_{\mu}\parallel =\sqrt{{\sum}_{i=1}^{n}{(\sigma \mu -{x}_{i}{s}_{i})}^{2}}$. Focusing on the radicand, one has

where the strict inequality comes from the fact that ${x}_{i}{s}_{i}\ge \gamma \mu$ by hypothesis, which implies ${x}_{i}{s}_{i}\le n\mu -\gamma \mu (n-1)<n\mu$. Considering again the square root, the result comes straightforwardly:

□
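Lemma 5 is easy to sanity-check numerically in standard arithmetic: sample points, keep those inside the $\mathcal{N}_{\infty}$ neighborhood condition $x_i s_i \ge \gamma\mu$, and verify $\|\sigma\mu\mathbf{1} - x \odot s\| < n\mu$ for a centering parameter $\sigma \in (0,1]$. This is a check of the statement on random instances (our own script), not part of the proof:

```python
import numpy as np

# Empirical check of Lemma 5 in standard arithmetic: whenever every
# pairwise product satisfies x_i * s_i >= gamma * mu, the corrector
# right-hand side d_mu = sigma*mu*1 - x (.) s has norm below n * mu.
rng = np.random.default_rng(0)
n, gamma, sigma = 8, 1e-3, 0.5
checked = 0
for _ in range(1000):
    x = rng.uniform(0.1, 10.0, n)
    s = rng.uniform(0.1, 10.0, n)
    mu = x @ s / n
    if np.all(x * s >= gamma * mu):        # point lies in the neighborhood
        d_mu = sigma * mu - x * s
        assert np.linalg.norm(d_mu) < n * mu
        checked += 1
print(checked, "points verified")
```

The bound also follows analytically from $\sum_i x_i s_i = n\mu$: since $\sum_i (\sigma\mu - x_i s_i)^2 = n\sigma^2\mu^2 - 2\sigma n\mu^2 + \sum_i (x_i s_i)^2 < n\mu^2(\sigma^2 - 2\sigma) + n^2\mu^2 < n^2\mu^2$ for $\sigma \in (0, 2)$.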
Lemma 6. Let $({x}^{\left(0\right)},\,{\lambda}^{\left(0\right)},\,{s}^{\left(0\right)})$ be the NA-IPM starting point, and ${M}^{\left(0\right)}{\Delta}^{\left(0\right)}={d}^{\left(0\right)}$ be the compact form of Newton's step Equation (6) at the beginning of the optimization. Rewrite the right-hand term as ${d}^{\left(k\right)}={\left({d}^{\left(k\right)}\right)}^{0}+{\left({d}^{\left(k\right)}\right)}^{1}$, where ${\left({d}_{i}^{\left(k\right)}\right)}^{0}=\mathtt{lead}\_\mathtt{mon}\left({d}_{i}^{\left(k\right)}\right)$. Call ${d}_{r}$ the first $n+m$ entries of ${\left({d}^{\left(k\right)}\right)}^{0}$ and ${d}_{\mu}$ the last n, i.e., ${\left({d}^{\left(k\right)}\right)}^{0}=({d}_{r},\,{d}_{\mu})$. Then, $\exists k\in \mathbb{N}$ such that ${d}_{{r}_{i}}\le \epsilon \mathcal{O}(\parallel ({r}_{b}^{\left(0\right)},\,{r}_{c}^{\left(0\right)})\parallel )$ and ${d}_{{\mu}_{i}}<n\epsilon \mathcal{O}(\parallel ({r}_{b}^{\left(0\right)},\,{r}_{c}^{\left(0\right)})\parallel )$, and the Newton–Raphson method reaches that iteration in $\mathcal{O}\left({n}^{2}\left|\log \epsilon \right|\right)$.

Proof. As usual, the central path neighborhood is

Lemma 4 and μ's positivity imply that $\mathcal{O}\left({\mu}^{(k+1)}\right)\le \mathcal{O}\left({\mu}^{\left(k\right)}\right)$ $\forall k\in \mathbb{N}$. The application of the transfer principle to Theorem 6.2 in [27] guarantees that $\exists {k}^{\prime}\in \mathbb{N}$ such that ${\mu}^{\left({k}^{\prime}\right)}\le \epsilon \mathcal{O}\left({\mu}^{\left(0\right)}\right)$ holds true and the Newton–Raphson algorithm reaches that iteration in $\mathcal{O}\left({n}^{2}\left|\log \epsilon \right|\right)$. Together, Lemma 4 and $({x}^{\left(k\right)},\,{\lambda}^{\left(k\right)},\,{s}^{\left(k\right)})\in {\mathcal{N}}_{\infty}(\gamma ,\,\beta )$ guarantee that $\exists {k}^{\prime\prime}$ (reached in polynomial time as well) such that $\parallel ({r}_{b}^{\left({k}^{\prime\prime}\right)},\,{r}_{c}^{\left({k}^{\prime\prime}\right)})\parallel \le \epsilon \mathcal{O}(\parallel ({r}_{b}^{\left(0\right)},\,{r}_{c}^{\left(0\right)})\parallel )$ too. Set $k=\max({k}^{\prime},\,{k}^{\prime\prime})$. Then, one has ${\mu}^{\left(k\right)}\le \epsilon \mathcal{O}\left({\mu}^{\left(0\right)}\right)\implies \mathtt{lead}\_\mathtt{mon}\left({\mu}^{\left(k\right)}\right)\le \epsilon \mathcal{O}\left({\mu}^{\left(0\right)}\right)$ and $\parallel ({r}_{b}^{\left(k\right)},\,{r}_{c}^{\left(k\right)})\parallel \le \epsilon \mathcal{O}(\parallel ({r}_{b}^{\left(0\right)},\,{r}_{c}^{\left(0\right)})\parallel )\implies \mathtt{lead}\_\mathtt{mon}(\parallel ({r}_{b}^{\left(k\right)},\,{r}_{c}^{\left(k\right)})\parallel )\le \epsilon \mathcal{O}(\parallel ({r}_{b}^{\left(0\right)},\,{r}_{c}^{\left(0\right)})\parallel )$. Moreover, by construction it holds that $\parallel {d}_{r}^{\left(k\right)}\parallel \le \mathtt{lead}\_\mathtt{mon}(\parallel ({r}_{b}^{\left(k\right)},\,{r}_{c}^{\left(k\right)})\parallel )$ and $\parallel {d}_{\mu}^{\left(k\right)}\parallel <n\,\mathtt{lead}\_\mathtt{mon}\left({\mu}^{\left(k\right)}\right)$ (the latter comes from Lemma 5). Therefore, the following two chains of inequalities hold true:

as stated in the thesis. □
Corollary 1. Let k satisfy Lemma 6; then, either

Proof. The result comes straightforwardly from three facts: (i) ${x}_{i}{s}_{i}<n\mu$ $\forall i=1,\,\dots ,\,n$; (ii) $\mu \le \epsilon \mathcal{O}\left({\mu}^{\left(0\right)}\right)$; (iii) the leading term of each entry of x, s, and λ is never zeroed, since the full optimizing step is never taken (see lines 19–20 and 31–32 in Algorithm 2). □
We are now ready to provide the convergence theorem for the NA-IPM.

Theorem 2 (NA-IPM convergence). The NA-IPM converges to the solution of an NAQP problem in $\mathcal{O}\left(l{n}^{2}\left|\log \epsilon \right|\right)$, where $n\in \mathbb{N}$ is the primal space dimension, $\epsilon \in {\mathbb{R}}^{+}$ is the relative tolerance, and $l\in \mathbb{N}$ is the number of consecutive monosemia used in the problem optimization.
Proof. For the sake of simplicity, assume that all the Euclidean numbers in the NAQP problem are represented by means of the same function of powers $g:\mathbb{N}\to \mathbb{Q}$. From the approximation up to l consecutive monosemia, one can rewrite $\parallel ({r}_{b}^{\left(k\right)},\,{r}_{c}^{\left(k\right)})\parallel ={\sum}_{i=1}^{l}{r}_{i}^{\left(k\right)}{\alpha}^{g\left(i\right)}$ and ${\mu}^{\left(k\right)}={\sum}_{i=1}^{l}{\mu}_{i}^{\left(k\right)}{\alpha}^{g\left(i\right)}$. Lemma 6 guarantees that $\exists \,k$ for which ${\left({d}^{\left(k\right)}\right)}^{0}$ is ε-satisfied. Now, update the temporary solution substituting each entry of x and s which satisfies Corollary 1 with any feasible value one order of magnitude smaller, e.g., an ${x}_{i}$ satisfying Corollary 1 is replaced with a positive value of the order ${\alpha}^{g(j+1)}$, where $j\in \mathbb{N}$ is such that $\mathcal{O}\left({x}_{i}\right)={\alpha}^{g\left(j\right)}$. These are, actually, the variables that are not active at the optimal solution, at least considering the zeroing of ${\left({d}^{\left(k\right)}\right)}^{0}$ only. Then, recompute ${d}^{\left(k\right)}$, which by construction satisfies $\parallel {d}_{\mu}^{\left(k\right)}\parallel =0$ and $\parallel {d}_{r}^{\left(k\right)}\parallel <\epsilon \mathcal{O}(\parallel ({r}_{b}^{\left(0\right)},\,{r}_{c}^{\left(0\right)})\parallel )$. All these operations have polynomial complexity and do not affect the overall result. Finally, update the right-hand term as ${d}^{\left(k\right)}\leftarrow {d}^{\left(k\right)}-{\left({d}^{\left(k\right)}\right)}^{0}$, zeroing the leading term of those entries whose magnitude is still $\mathcal{O}(\parallel ({r}_{b}^{\left(0\right)},\,{r}_{c}^{\left(0\right)})\parallel )$.

Next, the algorithm iterations are forced to consider the previous ${\left({d}^{\left(k\right)}\right)}^{0}$ as already fully satisfied. What is actually happening is that the problem now tolerates an infeasibility error whose norm is equal to $\epsilon \mathcal{O}(\parallel ({r}_{b}^{\left(0\right)},\,{r}_{c}^{\left(0\right)})\parallel )$. Therefore, one can apply Lemma 6 again to obtain a solution which is ε-optimal on the second monosemia of ${\left({d}^{\left(0\right)}\right)}^{0}$ too, and this result is achieved within a finite number of iterations and in polynomial complexity. Repeating the update-optimization procedure for all the l monosemia by means of which ${\left({d}^{\left(0\right)}\right)}^{0}$ is represented, one obtains a solution which is ε-optimal on all of them. Since each of the l ε-satisfactions is achieved in $\mathcal{O}\left({n}^{2}\left|\log \epsilon \right|\right)$, the whole algorithm converges in $\mathcal{O}\left(l{n}^{2}\left|\log \epsilon \right|\right)$. □
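The complexity composition in the proof, namely l sequential phases, each contracting the current leading monosemium geometrically (Lemma 4's $(1 - 0.1\tilde{\nu})$ factor) down to the tolerance, can be caricatured in a few lines of Python. This is a toy accounting model of ours, not the optimizer itself:

```python
import math

# Toy model of Theorem 2's argument: the residual is a sequence of l
# monosemia coefficients; each phase shrinks only the current leading
# coefficient by a fixed geometric rate until it drops below eps, so the
# total iteration count is l times the per-level O(|log eps|) cost.
def phases(residual, eps, rate=0.9):
    total_iters = 0
    for level in range(len(residual)):
        while residual[level] > eps:
            residual[level] *= rate      # one damped Newton iteration
            total_iters += 1
    return total_iters

per_level = math.ceil(math.log(1e-8) / math.log(0.9))
total = phases([1.0, 1.0, 1.0], 1e-8)
print(total, "==", 3 * per_level)        # l = 3 levels, 3x the per-level cost
```

The per-level count grows as $|\log \epsilon|$, so the total is $l \cdot \mathcal{O}(|\log\epsilon|)$ iterations, mirroring the $\mathcal{O}(l n^{2}|\log\epsilon|)$ bound once each iteration's $\mathcal{O}(n^{2})$ cost is accounted for.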
The next proposition highlights a particular property of the NA-IPM when solving lexicographic QP problems: every time $\mu$ decreases by one order of magnitude, one objective is $\epsilon$-optimized.

Proposition 4. Consider an NAQP problem generated from a standard lexicographic one in accordance with Theorem 1, with ${\beta}_{i}={\alpha}^{1-i}$ $\forall i=1,\,\dots ,\,l$. Then, each of the l objectives is ε-optimized in polynomial time and, when the ith one is ε-optimized, the magnitude of μ decreases from $\mathcal{O}\left({\alpha}^{1-i}\right)$ to $\mathcal{O}\left({\alpha}^{-i}\right)$ in the next iteration.
Proof. Assume the algorithm starts with a sufficiently good and well-centered solution, such as the one produced by Algorithm 1; then $\mathcal{O}(\|(r_b^{(0)},\,r_c^{(0)})\|) = \mathcal{O}(\mu^{(0)}) = \mathcal{O}(\alpha^0)$. Since $x^{(k)}\in\mathbb{R}^+$ by construction, one can interpret each monosemium in $r_c^{(k)}$ as the satisfaction of the corresponding objective function at the $k$th iteration; that is, if $r_c^{(k)} = \sum_{i=1}^{l} r_{c_i}\alpha^{1-i}$, then the first objective lacks $r_{c_1}$ to be fully optimized, the second one lacks $r_{c_2}$, and so on. Because of Lemma 6, in polynomial time $(d_i)^0$ is $\epsilon$-optimized, that is, the KKT conditions (3) are $\epsilon$-satisfied. In fact, this means that primal–dual feasibility is close to finite satisfaction and centrality is finitely close to zero. There is a further interpretation nevertheless. Interpreting the KKT conditions from a primal perspective, their $\epsilon$-satisfaction testifies that: (i) the primal solution is feasible (indeed, the primal is a standard polyhedron and, therefore, it is enough to consider the leading terms of $x$ only, getting rid of the infinitesimal infeasibility $r_{b_2}$); (ii) the objective function is finitely $\epsilon$-optimized (which means that the first objective is $\epsilon$-optimized, since the original problem was a lexicographic one and the high-priority objective is the only one associated with finite values of the non-Archimedean objective function); (iii) the approximated solution is very close to the optimal surface of the first objective; roughly speaking, it is $\epsilon$-finitely close. Moreover, the fact that $\|d_\mu\|=0$ after the updating procedure used in Theorem 2 implies that the magnitude of $\mu$ will be one order of magnitude smaller in the next iteration, i.e., it will decrease from $\mathcal{O}(\alpha^0)$ to $\mathcal{O}(\alpha^{-1})$. Since what was just said holds for all the $l$ monosemia (read: priority levels, i.e., objectives in the lexicographic cost function), the proposition is proved true. □
4.3. Numerical Considerations and Implementation Issues
The whole field $\mathbb{E}$ cannot be used in practice, since it is too big to fit in a machine. However, the algorithmic field $\widehat{\mathbb{E}}$ presented below is enough to represent and solve many real-world problems:
where $g:\mathbb{N}\to\mathbb{Z}$ is a monotone decreasing function and the term “algorithmic field” refers to finite approximations of theoretical fields realized by computers [12]. Similarly to IEEE 754 floating-point numbers, which are the standard encoding for real numbers within a machine, a finite-dimension encoding for the Euclidean numbers in $\widehat{\mathbb{E}}$ is needed. In [11,12], the bounded algorithmic number (BAN) representation is presented as a sufficiently flexible and informative encoding to cope with this task. The BAN format is a fixed-length approximation of a Euclidean number. An example of a BAN is $\alpha^1(2.4 + 3.9\eta - 2.89\eta^2)$, where the “precision” in this context is given by the degree of the polynomial in $\eta$ plus 1 (three in this case). The BAN encoding with degree three is indicated as BAN3.
The second detail to take care of when attempting to perform numerical computations with Euclidean numbers is the effect of lower-magnitude monosemia on them. For instance, consider a two-objective lexicographic QP whose first objective is degenerate with respect to some entries of $x$. When solving the problem by means of the NA-IPM, the following phenomenon (which can also be proved theoretically) occurs: the information of the optimizing direction for the secondary objective is stored as an infinitesimal gradient in the solution of Newton’s step Equation (6). As an example, assume that $x\in\mathbb{R}^3$ and the entries $x_2$ and $x_3$ are degenerate with respect to the first objective. Then, at each iteration, the infinitesimal monosemium in the optimizing direction of $x_1$ assumes a negligible value, while for $x_2$ and $x_3$ this is not true: it is significant and grows exponentially over time. In fact, the infinitesimal gradient represents the optimizing direction which must be followed along the optimal (and degenerate) surface of the first objective in order to also reach optimality for the second one. However, such infinitesimal directions do not significantly contribute to the optimization, since the major role is played by the finite entries of the gradient. Therefore, the effect of this infinitesimal information in the gradient only generates numerical instabilities. As soon as the first objective is $\epsilon$-optimized, i.e., the first objective surface is reached, the optimizing direction still assumes finite values, but this time oriented so as to optimize the second objective while keeping the first one fixed, while all the infinitesimal monosemia of the gradient assume negligible values. Roughly speaking, a sort of “gradient promotion” occurs as a result of the change in the objective to optimize. To cope with the issue of noisy and unstable infinitesimal entries in the gradient, two details need to be implemented: (i) after the computation of the gradients (in both the predictor and the corrector step), only the leading term of each entry must be preserved, zeroing the remaining monosemia; (ii) after having computed the starting point according to Algorithm 1, again only the leading term of each entry of $x$, $s$, and $\lambda$ must be preserved. These variations affect neither convergence nor the generality of the discussion, since the leading terms of the primal–dual solution are the only ones that impact the zeroing of $(d^{(k)})^0$.
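The truncation in (i) and (ii) is the lead_mon operation already used in Algorithm 3; a sketch over a coefficient-list stand-in for Euclidean numbers could read:

```python
# A sketch of the lead_mon truncation used in Algorithm 3: keep only the
# leading (first nonzero) monosemium of each entry and zero the rest.
# Entries are coefficient lists over decreasing powers of alpha
# (a hypothetical stand-in for the actual BAN encoding).

def lead_mon(v):
    out = []
    for coeffs in v:
        kept = [0.0] * len(coeffs)
        for i, c in enumerate(coeffs):
            if c != 0.0:
                kept[i] = c  # preserve the leading term only
                break
        out.append(kept)
    return out
```

Applied to the Newton directions after the predictor and corrector solves, this discards exactly the noisy infinitesimal monosemia discussed above.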
The choice of dealing with only the leading terms of the gradients comes in handy to solve another issue arising during the computations: a good choice of the value to assign to the zeroed entries of $x$ and $s$ during the updating phase discussed in Theorem 2. Actually, it is enough to add one monosemium whose magnitude is such that the following equality holds true: $\mathcal{O}(x_i s_i) = \mathcal{O}(\mu)\eta$. For instance, assume again a two-objective lexicographic QP scenario after having completed the optimization of the first objective. It holds true that $\mathcal{O}(\mu) = \alpha^0$ and either $x_i$ or $s_i$ is smaller than $n\epsilon\alpha^0$, say $x_i$. Then, the updating phase of Theorem 2 sets $x_i$ to a value having magnitude one order smaller. A reasonable approach is to set $x_i$ equal to a monosemium, say $\xi$, such that $\mathcal{O}(\xi s_i) = \mathcal{O}(\mu)\eta = \eta$. Since $\mathcal{O}(s_i)$ is finite because of Corollary 1, one has $\mathcal{O}(\xi) = \eta$. The naive choice is $\xi = \eta$, but it may not be the best one. Indeed, this approach does not guarantee the generation of a temporary solution with the highest possible centrality, i.e., $x_i s_i = \mu'$, where $\mu'$ is the centrality measure after the update. To achieve this, the value to opt for must be $\xi_i = \frac{\mu'}{z_i}$, where $z_i = \max(x_i, s_i)$ and $\mu'$ is a monosemium one order of magnitude smaller than $\mu$ and sufficiently (but arbitrarily) far from zero.
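Under the stand-in assumption that monosemium magnitudes are mimicked by plain floats, the centrality-preserving reset $\xi_i = \mu'/z_i$ just described can be sketched as:

```python
# A sketch of the centrality-preserving reset xi_i = mu' / z_i from the text.
# Floats stand in for monosemia here: mu_next plays the role of mu', one
# order of magnitude below the current centrality measure.

def center_update(x_i, s_i, mu_next):
    z = max(x_i, s_i)    # the partner variable that stays finite (Corollary 1)
    return mu_next / z   # xi chosen so that xi * z == mu_next exactly
```

For example, with $s_i = 2$ and $\mu' = 0.5$, the reset value is $0.25$, so the product $x_i s_i$ matches $\mu'$ exactly rather than merely in order of magnitude, which is what the naive choice $\xi = \eta$ fails to guarantee.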
Another numerical issue to take care of is the computation accuracy due to the number of monosemia used. As a practical example, consider the task of inverting the matrix $A$ reported below. Since its entries have magnitudes from $\alpha$ to $\eta$, one may think that the BAN3 encoding is enough to properly compute an approximation of its inverse, which is reported below as $A_3^{-1}$. However, the product $AA_3^{-1}$ testifies to the presence of an error whose magnitude is $\eta^2$, quite close to $o(A)=\eta$. Depending on the application, such noise could negatively affect further computations and the final result. Therefore, a good practice is to foresee some additional slots in the non-Archimedean number encoding. For instance, adopting the BAN5 standard, the approximated inverse $A_5^{-1}$ manifests an error with magnitude $\eta^4$ (see matrix $AA_5^{-1}$), definitely a safer choice, even if at the expense of extra computational effort.
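The scalar analogue of this precision effect can be reproduced with truncated power series in $\eta$: inverting $1 + 2\eta$ with three coefficient slots leaves a residual of order $\eta^3$, while five slots push it out to order $\eta^5$. The specific numbers below are illustrative, not the matrices from the text:

```python
# Scalar analogue of the BAN3-vs-BAN5 inversion example: invert the
# eta-polynomial 1 + 2*eta with a fixed number of coefficient slots and
# observe at which power of eta the residual of x * inv(x) appears.

def series_inverse(c, deg):
    """Truncated power-series inverse of c[0] + c[1]*eta + ... (c[0] != 0)."""
    inv = [1.0 / c[0]] + [0.0] * (deg - 1)
    for n in range(1, deg):
        s = sum(c[k] * inv[n - k] for k in range(1, min(n, len(c) - 1) + 1))
        inv[n] = -s / c[0]
    return inv

def conv(a, b):
    """Full (untruncated) product, to expose the residual terms."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out
```

With three slots, the product $(1+2\eta)\cdot(1-2\eta+4\eta^2)$ equals $1 + 8\eta^3$; with five slots, the first spurious term moves to $\eta^5$, mirroring the $\eta^2$-vs-$\eta^4$ behavior of $AA_3^{-1}$ and $AA_5^{-1}$.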
Finally, the last detail concerns the terminal conditions. In the standard IPM, execution stops when the three convergence measures $\rho_1$, $\rho_2$, and $\rho_3$ in (9) are smaller than the threshold $\epsilon$. However, in a non-Archimedean context, optimality means being $\epsilon$-optimal on each of the $l$ monosemia in the objective function, i.e., satisfying the KKT conditions (3) on the first $l$ monosemia of $b$, $c$, and $\mu$. This means that the terminal condition needs to be modified in order to cope with such a convergence definition. The convergence measures need to be redefined as
with the convention $\mathcal{O}(0) = \alpha^0 = 1$. In this way, one has the guarantee that $\rho_1$, $\rho_2$, and $\rho_3$ are finite values when close to optimality. To better clarify this concept, consider the case in which $b$ has only infinitesimal entries. This implies that the norm of $b$ is infinitesimal too. In case the convergence measures in (9) are used, the denominator of $\rho_1$ is a finite number. Therefore, any finite approximation of $b$, i.e., any primal solution $x$ such that $\|Ax-b\|$ is finite, induces a finite value for $\rho_1$. This is definitely a bad behavior, since it is natural to expect that: (i) the convergence measures are finite numbers only if the optimization is close to optimality; (ii) their leading monosemium is smaller than $\epsilon$ when the leading monosemium of the residual norm ($\|Ax-b\|$ in this case) is small as well. However, this is not the case in the current example, as $\rho_1$ assumes finite values for approximation errors of $b$ which are infinitely larger than its norm. Using the definitions in (19) instead, the issue is solved, since now finite approximations of $b$ are mapped into infinite values of $\rho_1$, while infinitesimal errors are mapped into finite ones. In fact, the introduction of the magnitude of the constant term vectors in the definitions avoids the bias which would have been introduced by an a priori choice of the magnitude of the constant term added to the denominator of the convergence measures.
Then, feasibility on the primal problem is $\epsilon$-achieved when the absolute values of the first $l_b$ monosemia in $\rho_1$ are smaller than $\epsilon$; i.e., assuming $b = \sum_{i=1}^{l_b} b^i\alpha^{g(i)}$ and $\rho_1 = \sum_{i=1}^{l}\rho_1^i\alpha^{1-i}$, a sufficient level of optimality is reached when $|\rho_1^i| < \epsilon$ $\forall i = 1,\,\dots,\,l_b$, with $l \ge l_b$. Similar considerations hold for $c$ and $\mu$ as well.
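A sketch of this modified stopping test, with the monosemium coefficients of $\rho_1$ held in a plain list (a hypothetical stand-in for the encoding):

```python
# A sketch of the non-Archimedean stopping test for primal feasibility:
# rho1_coeffs[i] is the coefficient of the (i+1)-th monosemium of rho_1,
# and the first l_b of them must all be epsilon-small in absolute value.

def primal_eps_feasible(rho1_coeffs, l_b, eps):
    return all(abs(rho1_coeffs[i]) < eps for i in range(l_b))
```

The analogous tests on $\rho_2$ and $\rho_3$, with their own monosemia counts, complete the terminal condition of the NA-IPM.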