The Pareto Tracer for General Inequality Constrained Multi-Objective Optimization Problems

Abstract: Problems where several incommensurable objectives have to be optimized concurrently arise in many engineering and financial applications. Continuation methods for the treatment of such multi-objective optimization problems (MOPs) are very efficient if all objectives are continuous, since in that case one can expect that the solution set forms, at least locally, a manifold. Recently, the Pareto Tracer (PT) has been proposed, which is such a multi-objective continuation method. While the method works reliably for MOPs with box and equality constraints, no strategy has been proposed yet to adequately treat general inequalities; we address this gap in this work. We formulate the extension of the PT and present numerical results on selected benchmark problems. The results indicate that the new method can indeed handle general MOPs, which greatly enhances its applicability.


Introduction
In many real-world applications, the problem occurs that several conflicting and incommensurable objectives have to be optimized concurrently. As a general example, in the design of basically any product, both cost (to be minimized) and quality (to be maximized) are relevant objectives, among others. Problems of this kind are termed multi-objective optimization problems (MOPs). In case all of the objectives are continuous and in conflict with each other, it is known that not one single solution is to be expected (as is the case for scalar optimization problems, i.e., problems where one objective is considered), but an entire set of solutions. More precisely, one can expect that the solution set (the Pareto set) and, respectively, its image (the Pareto front) form, at least locally, an object of dimension k − 1, where k is the number of objectives involved in the problem. Due to this "curse of dimensionality", problems with more than, e.g., four objectives are also called many-objective optimization problems (MaOPs).
In the literature, many different methods for the numerical treatment of MOPs and MaOPs can be found (see also the discussion in the next section). One class of such methods is given by specialized continuation methods that take advantage of the fact that the solution set forms, at least locally and under certain mild assumptions on the model as discussed in [1], a manifold. Continuation methods start with one (approximate) solution of the problem and perform a movement along the Pareto set/front of the given M(a)OP by considering the underdetermined system of equations that is derived from the Karush-Kuhn-Tucker (KKT) equations of the problem. By construction, continuation methods are of local nature. That is, if the Pareto set consists of several connected components, such methods have to be fed with several starting points in order to obtain approximations of the entire solution set. On the other hand, continuation methods are probably most effective locally (i.e., within each connected component). Thus far, several multi-objective continuation methods have been proposed. Most of these continuation methods, however, are designed for or restricted to bi-objective problems.

Background and Related Work
In this section, we briefly state the main concepts and notations that are used for the understanding of this work (for details, we refer to, e.g., [5,6]).
We consider here continuous multi-objective optimization problems (MOPs) that can be defined mathematically as

min_x F(x)  s.t.  h(x) = 0,  g(x) ≤ 0,    (1)

where F : Q ⊂ R^n → R^k, F(x) = (f_1(x), . . . , f_k(x))^T, is the map of the k individual objectives f_i : Q ⊂ R^n → R, and where h : R^n → R^p and g : R^n → R^m denote the equality and inequality constraint functions, respectively. We assume that all objectives and constraint functions are twice continuously differentiable. The domain Q of the objectives is defined by the equality and inequality constraints of (1):

Q = {x ∈ R^n : h(x) = 0 and g(x) ≤ 0}.

If a point x ∈ R^n satisfies all constraints of (1), i.e., if x ∈ Q, we call this point feasible. Points x ∉ Q are called infeasible. If k = 2 objectives are considered, the problem is also termed a bi-objective optimization problem (BOP).
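As a small illustration of the feasibility test just defined, the following sketch checks x ∈ Q for hypothetical constraint functions h and g (the tolerance tol and both example constraints are our own illustrative choices, not taken from the problems treated later):

```python
import numpy as np

def is_feasible(x, h_funcs, g_funcs, tol=1e-8):
    """x is in Q iff every equality satisfies |h_j(x)| <= tol and
    every inequality satisfies g_l(x) <= tol."""
    return (all(abs(h(x)) <= tol for h in h_funcs) and
            all(g(x) <= tol for g in g_funcs))

# Hypothetical example: one equality and one inequality constraint.
h_funcs = [lambda x: x[0] + x[1] - 1.0]   # h(x) = 0
g_funcs = [lambda x: x[0] ** 2 - 0.5]     # g(x) <= 0

print(is_feasible(np.array([0.3, 0.7]), h_funcs, g_funcs))  # True
print(is_feasible(np.array([0.9, 0.1]), h_funcs, g_funcs))  # False: 0.81 > 0.5
```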
We say that a point x ∈ Q dominates a point y ∈ Q (in short: x ≺ y) if f_i(x) ≤ f_i(y) for all i = 1, . . . , k, and there exists an index j such that f_j(x) < f_j(y). A point x* is called Pareto optimal or simply optimal if there does not exist a vector y ∈ Q that dominates x*. A point x* ∈ Q is called locally optimal if there does not exist a vector y ∈ Q ∩ N(x*) that dominates x*, where N(x*) is a neighborhood of x*. The set P_Q of all Pareto optimal solutions is called the Pareto set, and its image F(P_Q) the Pareto front. In [1], it has been shown that both the Pareto set and the Pareto front typically form (k − 1)-dimensional objects under certain (mild) conditions on the problem.
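The dominance relation can be stated compactly in code; the following minimal sketch implements the componentwise test from the definition above:

```python
import numpy as np

def dominates(fx, fy):
    """Return True iff the objective vector fx dominates fy:
    fx <= fy in every component and fx < fy in at least one component."""
    fx, fy = np.asarray(fx), np.asarray(fy)
    return bool(np.all(fx <= fy) and np.any(fx < fy))

print(dominates([1.0, 2.0], [1.0, 3.0]))  # True: equal in f1, strictly better in f2
print(dominates([1.0, 3.0], [2.0, 1.0]))  # False: the points are incomparable
print(dominates([1.0, 2.0], [1.0, 2.0]))  # False: a point does not dominate itself
```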
If all objectives and constraint functions are differentiable, locally optimal solutions can be characterized by the Karush-Kuhn-Tucker (KKT) conditions [7,8]: Suppose that x* is locally optimal with respect to (1). Then, there exist Lagrange multipliers α ∈ R^k, λ ∈ R^p, and γ ∈ R^m such that the following conditions are satisfied:

∑_{i=1}^k α_i ∇f_i(x*) + ∑_{j=1}^p λ_j ∇h_j(x*) + ∑_{l=1}^m γ_l ∇g_l(x*) = 0,
α_i ≥ 0, i = 1, . . . , k,  ∑_{i=1}^k α_i = 1,
γ_l ≥ 0 and γ_l g_l(x*) = 0, l = 1, . . . , m.    (3)

Multi-objective optimization is an active field of research, and thus far many numerical methods have been proposed for the treatment of such problems. There exist, for instance, many methods that are designed to compute single solutions, such as the weighted sum method [9], the ε-constraint method [5,10], the weighted metric and weighted Tchebycheff methods [5,11,12], as well as reference point methods [13-15]. All of these methods transform the given MOP into a scalar optimization problem (SOP) that can, to a certain extent, include users' preferences. These methods can either be used as standalone algorithms (i.e., for the computation of single solutions) or be used to obtain a finite-size approximation of the entire Pareto set/front of the given MOP via a clever sequence of such SOPs [5,16-19].
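As an illustration of how such scalarizations work, the following sketch applies the weighted sum method to a hypothetical unconstrained bi-objective toy problem; the objective F, the finite-difference gradient-descent solver, and all parameter values are our own illustrative choices, not part of any of the cited methods:

```python
import numpy as np

def weighted_sum(F, w, x0, lr=0.1, iters=500):
    """Minimize the scalarization sum_i w_i f_i(x) by plain gradient
    descent with central finite differences (toy solver)."""
    x = np.asarray(x0, dtype=float)
    eps = 1e-6
    for _ in range(iters):
        grad = np.zeros_like(x)
        for j in range(x.size):
            e = np.zeros_like(x); e[j] = eps
            grad[j] = (w @ F(x + e) - w @ F(x - e)) / (2 * eps)
        x -= lr * grad
    return x

# Toy BOP: f1 = (x - 1)^2, f2 = (x + 1)^2; the Pareto set is [-1, 1].
F = lambda x: np.array([(x[0] - 1.0) ** 2, (x[0] + 1.0) ** 2])
for w1 in (0.25, 0.5, 0.75):
    w = np.array([w1, 1.0 - w1])
    x_star = weighted_sum(F, w, [0.0])
    print(round(x_star[0], 3))   # analytic optimum of the scalarization: 2*w1 - 1
```

Sweeping the weight w1 over (0, 1) traces out different Pareto optimal points, which is exactly how such SOP sequences produce finite front approximations.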
Finally, a third class of numerical solvers for MOPs is given by specialized continuation methods that take advantage of the fact that the Pareto set/front of a given problem forms, at least locally, a manifold of a certain dimension. Methods of this kind start with a given (approximate) solution and perform a movement along the Pareto set/front of the problem. The first such method was proposed in [1]; it can be applied to unconstrained and equality constrained MOPs with any number k of objectives, while no strategies are reported on how to treat inequalities. ParCont [42,43] is a rigorous predictor-corrector method that is based on interval analysis and parallelotope domains. The method can deal with equality and inequality constraints, but it is restricted to bi-objective problems. This restriction also holds for the method presented in [44], which has been designed to provide an equispaced approximation of the Pareto front. The Zigzag method [45-47] obtains Pareto front approximations by alternately optimizing one of the objectives. This approach is also limited to the treatment of bi-objective problems.
In [48], a continuation method is presented that is applicable to box-constrained BOPs. In [49], a variant of the method of Hillermeier is presented that is designed for the treatment of high-dimensional problems.
Recently, the Pareto Tracer (PT) was proposed by Martin and Schütze [2]. Similar to the method of Hillermeier, PT addresses the underdetermined nonlinear system of equations that is induced by the KKT equations. Unlike the method of Hillermeier, however, the PT aims to separate the decision variables from the associated weight (or Lagrange) vectors whenever possible, which leads to significant numerical advantages: the nonlinearity of the equation system can be significantly higher in the compound space compared to the corresponding system that is defined in decision variable space only. As a by-product, the chosen approach allows one to compute the tangent spaces of both the Pareto set and the Pareto front at every given regular point x. In [50], elements of the PT are used to treat many-objective optimization problems (i.e., MOPs with more than, e.g., five objectives). Thus far, PT is only applicable to box and equality constrained problems, which limits its applicability. In the following, we propose and discuss an extension of this method to adequately treat general MOPs, i.e., MOPs that in particular contain general inequalities.

Adapting the Pareto Tracer for General Inequality Constrained MOPs
In this section, we adapt the PT so that it can handle general inequality constraints. The core is the predictor-corrector step that generates from a given candidate solution x_i the next candidate x_{i+1}, which satisfies the KKT conditions and is such that F(x_{i+1}) − F(x_i) realizes a prescribed movement in objective space along the set of KKT points.
Assume we are given a MOP of the form (1) and a feasible point x_0 that satisfies the KKT conditions (3), where α_i > 0, i = 1, . . . , k. Let ε > 0 and define by

I(ε) := {i ∈ {1, . . . , m} : g_i(x_0) ≥ −ε}

the set of indices corresponding to the nearly active inequalities at x_0. Further, let s := |I(ε)|, and let α ∈ R^k, λ ∈ R^p, and γ ∈ R^s be the solution of

min_{α,λ,γ} ‖ ∑_{i=1}^k α_i ∇f_i(x_0) + ∑_{j=1}^p λ_j ∇h_j(x_0) + ∑_{l ∈ I(ε)} γ_l ∇g_l(x_0) ‖_2^2
s.t. α_i ≥ 0, i = 1, . . . , k,  ∑_{i=1}^k α_i = 1.    (7)

Note that (7) yields the Lagrange multipliers at x_0 for ε = 0 if x_0 is a KKT point and if all active inequalities are regarded as equalities. Using α, λ, and γ, define the matrix

W_{α,λ,γ} := ∑_{i=1}^k α_i ∇²f_i(x_0) + ∑_{j=1}^p λ_j ∇²h_j(x_0) + ∑_{l ∈ I(ε)} γ_l ∇²g_l(x_0).

To compute a predictor direction ν_μ ∈ R^n, we solve the system

W_{α,λ,γ} ν_μ + H(x_0)^T δ + G_ε(x_0)^T η = J(x_0)^T μ,
H(x_0) ν_μ = 0,
G_ε(x_0) ν_μ = 0,    (9)

where J(x_0), H(x_0), and G_ε(x_0) denote the Jacobians of F, of h, and of the nearly active inequalities at x_0, respectively. Note that system (9) depends on μ ∈ R^k. Before we specify this vector, we first simplify (9). Denote by H_ε(x_0) := (H(x_0); G_ε(x_0)) ∈ R^{(p+s)×n} the stacked constraint Jacobian; then (9) is equivalent to

W_{α,λ,γ} ν_μ + H_ε(x_0)^T η̃ = J(x_0)^T μ,
H_ε(x_0) ν_μ = 0.    (11)

Let d ∈ R^k. It is straightforward to show that for a vector ν_{μ_d} that solves (11) it holds

J(x_0) ν_{μ_d} = d  and  ∑_{i=1}^k μ_i = 0,    (12)

and hence

F(x_0 + t ν_{μ_d}) = F(x_0) + t d + O(t²).    (13)

That is, (infinitesimally) small steps from x_0 into direction ν_{μ_d} (in decision variable space) lead to a movement from F(x_0) into direction d (in objective space). It remains to select a suitable choice for d. Since α is orthogonal to the linearized Pareto front at F(x_0) [1], a natural choice is hence, by (13), to take d orthogonal to α. For this, let

α = QR,  where Q = (q_1, . . . , q_k) ∈ R^{k×k} is orthogonal and R ∈ R^{k×1},

be a QR-factorization of α. Then, any vector d ∈ span{q_2, . . . , q_k} can be chosen so that a movement in direction ν_{μ_d} (in decision variable space) leads to a movement from F(x_0) along the (linearized) Pareto front. Note that the second equation in (12) reads as ∑_{i=1}^k μ_i = 0. Hence, for the special case of a bi-objective optimization problem (i.e., k = 2), there are, after normalization, only two choices for μ:

μ^(1) = (1/√2) (1, −1)^T  and  μ^(2) = (1/√2) (−1, 1)^T.    (16)

Analogously to Martin and Schütze [2], one can show that μ^(1) corresponds to a "right down" movement along the Pareto front, while μ^(2) corresponds to a "left up" movement along the Pareto front.
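The QR-based choice of d can be sketched directly with NumPy: the columns q_2, . . . , q_k of the complete factor Q form a basis of the orthogonal complement of α. The example weight vectors below are hypothetical:

```python
import numpy as np

def tangent_directions(alpha):
    """Return the columns q_2, ..., q_k of Q from the QR factorization
    alpha = QR; they span the orthogonal complement of alpha and hence
    serve as candidate directions d along the linearized Pareto front."""
    alpha = np.asarray(alpha, dtype=float).reshape(-1, 1)
    Q, R = np.linalg.qr(alpha, mode='complete')   # Q is k x k orthogonal
    return Q[:, 1:]                               # k x (k-1) basis of alpha-perp

alpha = np.array([0.6, 0.4])          # example KKT weight vector of a BOP
D = tangent_directions(alpha)
print(np.allclose(alpha @ D, 0.0))    # every candidate d is orthogonal to alpha
```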
After selecting the predictor direction ν_μ, the question is how far to step into this direction. Here, we follow the suggestion made by Hillermeier [1] and use the step size

t = τ / ‖J(x_0) ν_μ‖_2

for a (small) value τ > 0, so that the predictor x̂_1 = x_0 + t ν_μ realizes a step of length approximately τ in objective space. For the computations presented below, we make the following modification: instead of W_{α,λ,γ}, we use the matrix

W_α := ∑_{i=1}^k α_i ∇²f_i(x_0),

i.e., we omit the Hessians of the constraint functions. More precisely, for the computation of ν_μ, we use system (20), which results from replacing W_{α,λ,γ} by W_α in (11), and to obtain μ_d we solve the analogously modified system (21). We have observed similar performance for both approaches, while the usage of W_α compared to W_{α,λ,γ} comes with the advantage that no Hessians of any of the constraint functions have to be computed.
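To make the predictor step concrete, the following toy sketch carries out one predictor computation on a hypothetical unconstrained bi-objective problem, assuming the simplified relation W_α ν_μ = J(x_0)^T μ (our own sign and scaling convention) together with the step size rule above; all problem data are illustrative:

```python
import numpy as np

# Toy unconstrained BOP: f1(x) = ||x - a||^2, f2(x) = ||x - b||^2.
# Its Pareto set is the line segment between a and b.
a, b = np.array([-3.0, 2.0]), np.array([0.0, -3.0])

def jacobian(x):                        # rows: gradients of f1 and f2
    return np.vstack([2.0 * (x - a), 2.0 * (x - b)])

def W_alpha(alpha):                     # sum_i alpha_i * Hessian(f_i) = 2*sum(alpha)*I
    return 2.0 * alpha.sum() * np.eye(2)

x0 = 0.5 * (a + b)                      # KKT point with alpha = (1/2, 1/2)
alpha = np.array([0.5, 0.5])
mu = np.array([1.0, -1.0]) / np.sqrt(2.0)    # mu^(1): satisfies sum(mu) = 0

nu = np.linalg.solve(W_alpha(alpha), jacobian(x0).T @ mu)
tau = 0.1
t = tau / np.linalg.norm(jacobian(x0) @ nu)  # step of length ~tau in objective space
x_pred = x0 + t * nu

# nu is parallel to b - a, i.e., tangent to the Pareto set (2D cross product = 0);
# it points toward b, so f1 grows and f2 shrinks: the "right down" movement.
print(abs(nu[0] * (b - a)[1] - nu[1] * (b - a)[0]) < 1e-9)  # True
```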
Given a predictor point x̂_1, the task of the subsequent corrector step is to find a KKT point x_1 that is ideally near to x̂_1. For this, we suggest to apply the multi-objective Newton method proposed in [51]. In particular, we first compute the solution (ν̂_1, δ̂) of the following problem:

min_{ν,δ} δ
s.t. ∇f_i(x̂_1)^T ν + (1/2) ν^T ∇²f_i(x̂_1) ν ≤ δ, i = 1, . . . , k,
h_j(x̂_1) + ∇h_j(x̂_1)^T ν = 0, j = 1, . . . , p.    (23)

The vector ν̂_1 is indeed the Newton direction for equality constrained MOPs as suggested in [2]. To adequately treat the involved inequalities, however, we propose to use the solution of the following problem:

min_{ν,δ} δ
s.t. ∇f_i(x̂_1)^T ν + (1/2) ν^T ∇²f_i(x̂_1) ν ≤ δ, i = 1, . . . , k,
h_j(x̂_1) + ∇h_j(x̂_1)^T ν = 0, j = 1, . . . , p,
g_i(x̂_1) + ∇g_i(x̂_1)^T ν = 0, i ∈ I_c(ε).    (24)

Note that problem (24) is identical to problem (23) except that the |I_c(ε)| inequalities indexed by I_c(ε) are treated as equalities at x̂_1. In particular, we propose to add an index i to I_c(ε) if (a) g_i(x̂_1) > ε, i.e., if x̂_1 significantly violates the constraint g_i; or (b) g_i(x̂_1) ∈ (−ε, ε) and ∇g_i(x̂_1)^T ν̂_1 > 0, i.e., if g_i is (nearly) active or already slightly violated at x̂_1 and a step into direction ν̂_1 would lead to a (further) violation of this constraint, as indicated by ∇g_i(x̂_1)^T ν̂_1 > 0.
Algorithm 1 shows the pseudo code to build the index set I_c(ε) at a predictor point x̂_i. Given the Newton direction, the Newton step can then be performed using the Armijo rule described in [51], as done in our computations. The set I_c(ε) is only computed once, and it remains fixed during the Newton iteration in the corrector step.
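The index-set rule (a)/(b) above can be sketched as follows; the function name build_active_set and all numerical data are our own hypothetical choices:

```python
import numpy as np

def build_active_set(g_vals, g_grads, nu, eps):
    """Sketch of the I_c(eps) rule: index i enters the set if g_i is
    clearly violated (a), or if it is nearly active and the candidate
    Newton direction nu would increase it further (b)."""
    I_c = []
    for i, (gi, grad) in enumerate(zip(g_vals, g_grads)):
        if gi > eps:                                 # (a) clear violation
            I_c.append(i)
        elif -eps < gi < eps and grad @ nu > 0:      # (b) nearly active, worsening
            I_c.append(i)
    return I_c

# Hypothetical data: three inequalities evaluated at a predictor point.
g_vals  = [0.5, 1e-6, -0.3]            # g_1 violated, g_2 nearly active, g_3 inactive
g_grads = [np.array([1.0, 0.0]),
           np.array([0.0, 1.0]),
           np.array([1.0, 1.0])]
nu = np.array([0.2, 0.4])              # candidate Newton direction
print(build_active_set(g_vals, g_grads, nu, eps=1e-4))   # -> [0, 1]
```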
Algorithm 2 shows the pseudo code of one predictor-corrector step of the PT for general (equality and inequality constrained) MOPs. For bi-objective problems, μ can be chosen as in (16), leading either to a "left up" or a "right down" movement, as discussed above. The algorithm is stopped once α is close enough to either (1, 0)^T or (0, 1)^T, depending of course on the chosen search direction. For k > 2, one can use the box partition in objective space described in [2] to mark the regions of the Pareto front that have already been "covered" during the run of the algorithm.
For the realization of the predictor-corrector step, several linear systems of equations have to be solved, the largest one being (20). The cost is hence O((n + p + s)³) in terms of flops and O((n + p + s)²) in terms of storage. Further, for the corrector step, the SOP (7) has to be solved, which contains k + p + s decision variables. For the computation of the Newton direction, the SOPs (23) and (24) have to be solved in the first Newton iteration, both of which contain n + 1 decision variables. For the further Newton iterations, only SOP (24) has to be solved, since the index set I_c(ε) remains fixed within a corrector step.
Finally, note that, if the method is realized as described above, the Hessians of all individual objectives have to be computed at each candidate solution (including at each Newton iteration). Using ideas from quasi-Newton methods, one can approximate the Hessians so that only gradient information is needed at each candidate solution, as described in [2].

As a demonstration example, we consider problem (25). Figure 1a shows the Pareto set of this problem when the two inequalities are left out (i.e., the line segment connecting (−3, 2)^T and (0, −3)^T), the sets g_i(x) = 0, i = 1, 2, as well as the Pareto set of the constrained problem, which is indeed the result of the PT. As starting point, we chose a point that significantly violates both constraints (and, hence, |I_c(ε)| = 2 for ε = 1 × 10⁻⁴). An application of the above-described Newton method leads to the point on the Pareto set with the smallest x_1-value, which is in fact the initial point for the PT. During the run of PT, first only g_2 is "active" in the corrector step (i.e., I_c(ε) = {2}), later none of the constraints (in the intersection of the Pareto fronts of the constrained and the unconstrained MOP), and finally only g_1.
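The quasi-Newton idea mentioned above can be illustrated with the standard BFGS update (shown here as a generic sketch with hypothetical data; [2] describes the variant actually used in PT):

```python
import numpy as np

def bfgs_update(B, s, y):
    """One BFGS update of a Hessian approximation B from a step s and the
    corresponding gradient difference y; B stays symmetric positive
    definite as long as s @ y > 0."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

# Hypothetical check on a quadratic objective with constant Hessian H:
# starting from the identity, the update satisfies the secant equation.
H = np.array([[4.0, 1.0], [1.0, 3.0]])
B = np.eye(2)
s = np.array([0.5, -0.2])
y = H @ s                              # for a quadratic, y = H s exactly
B_new = bfgs_update(B, s, y)
print(np.allclose(B_new @ s, y))       # secant equation B_new s = y -> True
```

In this way, the corrector only needs gradient evaluations, at the price of an approximation error in the Hessian that is corrected over the iterations.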

Numerical Results
In this section, we further demonstrate the behavior of the PT on five benchmark problems that contain inequality constraints. For all problems, we used the quasi-Newton variant of PT that only requires function and Jacobian information (and no Hessians). To compare the results, we also show the respective results obtained by the normal boundary intersection method (NBI, [16]), the ε-constraint method [5], and the multi-objective evolutionary algorithm NSGA-II. For NBI and the ε-constraint method, we used the code that is available at [52], and for NSGA-II the implementation of PlatEMO [53]. Regrettably, no comparison to another multi-objective continuation method can be presented, since none of the respective codes are publicly available. For a comparison of the PT and the method of Hillermeier on box and equality constrained MOPs, we refer to [2]. We also chose to include a comparison to the well-known NSGA-II since it is widely used and state-of-the-art for two- and three-objective problems such as those considered here. We stress that the comparisons only show (on the first four test problems) that PT outperforms NSGA-II on these particular cases, where the Pareto front consists of one connected component. For highly multi-modal problems where the Pareto set/front falls into several connected components, NSGA-II will certainly outperform the (standalone) PT. A fair comparison can only be obtained when integrating PT into a global heuristic (as, e.g., done in [41]). This is certainly an interesting task; however, it is beyond the scope of this work.
To compare the results, we report the total number of function evaluations used by each algorithm on each problem. For this, each Jacobian call is counted as four function calls, assuming that the derivative is obtained via automatic differentiation [54]. To measure the quality of the approximations, we used the averaged Hausdorff distance ∆₂ [55-57]. Since NSGA-II has stochastic components, we applied this algorithm 10 times to each problem and present the median result (measured by ∆₂).
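For finite sets, ∆_p can be computed as the maximum of the averaged generational distance (GD) and inverted generational distance (IGD) terms; the following is a minimal sketch with hypothetical front data:

```python
import numpy as np

def delta_p(A, B, p=2):
    """Averaged Hausdorff distance Delta_p between a finite approximation
    set A and a finite reference set B (rows are points)."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)  # pairwise distances
    gd  = np.mean(D.min(axis=1) ** p) ** (1.0 / p)   # averaged GD:  A -> B
    igd = np.mean(D.min(axis=0) ** p) ** (1.0 / p)   # averaged IGD: B -> A
    return max(gd, igd)

# Hypothetical fronts: identical sets have distance 0; shifting every point
# by (0.1, 0.1) gives a distance of sqrt(0.02) ~ 0.1414.
B = np.array([[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]])
print(delta_p(B, B))                  # 0.0
print(round(delta_p(B + 0.1, B), 4))  # 0.1414
```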

Binh and Korn
Our first test example is a modification of the box-constrained BOP from Binh and Korn [58], where we add two inequality constraints.

Table 1 shows the design parameters used by NSGA-II for this problem, Table 2 shows the computational efforts and the obtained approximation quality for each algorithm, and Figures 2 and 3 show the obtained Pareto set and front approximations, respectively. For PT, we chose τ = 0.6, leading to 52 solutions along the Pareto set/front in 4.48 s (the computations were done on an Ubuntu 20.04.1 LTS system with an Intel Core i7-8550U 1.80 GHz x 8 CPU and 12 GB of RAM). We then applied NBI and the ε-constraint method using this number of sub-problems. For NSGA-II, we took the population size 100, which is a standard value for this algorithm. The results show nearly perfect Pareto front approximations (at least from the practical point of view) for all algorithms, which is also reflected by the low ∆₂ values that are very close to the optimal value of 0.6 (at least for PT, defined by τ). In terms of function evaluations, PT clearly wins over NBI and the ε-constraint method. A comparison to NSGA-II is not possible due to the choice of the population size.
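For reference, the widely cited constrained form of the Binh and Korn problem is restated in the sketch below (taken from the literature; the precise modification used above may differ in detail):

```python
import numpy as np

# Commonly used constrained Binh--Korn problem with 0 <= x1 <= 5, 0 <= x2 <= 3
# (restated from the literature; the exact variant used in the experiments
# above may differ in detail).
def F(x):
    return np.array([4 * x[0] ** 2 + 4 * x[1] ** 2,
                     (x[0] - 5) ** 2 + (x[1] - 5) ** 2])

def g(x):  # feasible iff both entries are <= 0
    return np.array([(x[0] - 5) ** 2 + x[1] ** 2 - 25.0,
                     7.7 - (x[0] - 8) ** 2 - (x[1] + 3) ** 2])

x = np.array([2.0, 2.0])
print(F(x))                 # [32. 18.]
print(np.all(g(x) <= 0))    # g1 = -12.0, g2 = -53.3 -> True (x is feasible)
```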

Chankong and Haimes
Next, we consider the bi-objective problem of Chankong and Haimes [59], which contains, next to the box constraints, one linear and one nonlinear inequality. Table 3 shows the parameter values used for the application of NSGA-II, Table 4 the computational efforts and the approximation qualities, and Figures 4 and 5 the obtained approximations. We used τ = 1 for PT and proceeded as for the previous example for the other methods. The results are also similar to the previous example: all methods are capable of detecting a nearly perfect Pareto front approximation, and the overall cost is significantly lower for PT (5.96 s).

Table 3. Parameters used by NSGA-II for MOP (27).
Population size: 100
Number of generations: 30
Probability of crossover: 0.9
Probability of mutation: 0.5

Table 4. Computational efforts and approximation qualities for problem (27).

Tamaki
Next, we consider a MOP with three objectives (28). Both the Pareto set and the Pareto front of this problem are part of the unit sphere. Table 5 shows the design parameters for NSGA-II, Table 6 shows the computational effort and the approximation quality for each algorithm, and Figure 6 shows the Pareto front approximations (the respective Pareto set approximations look identical, albeit in x-space). For this problem, τ = 0.05 was used. The implementation of the ε-constraint method did not yield a result. On the Tamaki problem, PT performs better than the other algorithms both in approximation quality and in overall computational cost.

Table 5. Parameters used by NSGA-II for MOP (28).
Population size: 300
Number of generations: 150
Probability of crossover: 0.9
Probability of mutation: 0.5

Table 6. Computational efforts and approximation qualities for problem (28).

BCS
We next consider a second three-objective problem that contains, next to one inequality, also a linear equality constraint. Table 7 presents the design parameters used by NSGA-II, Table 8 shows the computational effort and the approximation quality for each algorithm, and Figures 7 and 8 present the Pareto front approximation of PT (using τ = 2), which took 16.79 s. For this example, none of the other methods was able to yield feasible solutions, where we counted a solution x as feasible if |x_1

Osyczka and Kundu
As a last example, we consider the bi-objective problem of Osyczka and Kundu [60], which has six decision variables and contains six inequality constraints in addition to the box constraints. While the Pareto front of this problem is connected, its Pareto set consists of three different connected components. Hence, PT is not able to compute an approximation of the entire Pareto front from only one starting point. Figure 9a shows the result of PT for τ = 2 using the three starting points x_{0,1} = (0.60, 1.50, 1.00, 0.00, 1.00, 0.04)^T, x_{0,2} = (0.00, 2.00, 2.20, 0.00, 1.00, 0.00)^T, and x_{0,3} = (5.00, 1.00, 5.00, 0.00, 1.00, 0.01)^T.
The computational time to obtain this result was 12.98 s. Figure 9b shows a numerical result of NSGA-II using the design parameters shown in Table 9. The obtained solutions "under" the Pareto front can be explained by the tolerance of 1 × 10⁻⁴ that was used to measure feasibility (while 1 × 10⁻⁸ was used for PT). Table 10 shows the computational effort for both methods. Needless to say, this by no means represents a fair comparison of the two methods. Instead, it should rather be seen as a motivation to hybridize PT with a global search strategy in order to obtain a fast and reliable multi-objective solver, which we leave for future studies.

Table 9. Parameters used by NSGA-II for MOP (30).
Population size: 435
Number of generations: 50
Probability of crossover: 0.9
Probability of mutation: 0.5

Table 10. Computational efforts and approximation qualities for problem (30).

Conclusions and Future Work
In this paper, we extended the multi-objective continuation method Pareto Tracer (PT) to the treatment of general inequality constraints. To this end, the predictor-corrector step is modified as follows: in the predictor, all nearly active inequalities are treated as equalities. In the subsequent corrector step, the main task is to identify those inequalities that the predictor point nearly activates or slightly violates; these are then treated as equality constraints within the Newton method, which is done in a bootstrap manner. We formulate the resulting algorithm and show numerical results on several benchmark problems, indicating that the method can reliably handle inequality (and equality) constrained MOPs. We further present comparisons to some other numerical methods. The results show that the extended PT can indeed reliably handle general MOPs (and in particular general inequalities). However, the method is, by construction, of local nature and restricted to the connected component of the solution set for which an initial solution is available. One interesting task is certainly to hybridize PT with a global solver such as a multi-objective evolutionary algorithm and to compare the resulting hybrid against other methods with respect to their ability to compute the entire global Pareto set/front of a given MOP. This is beyond the scope of this work and is left for future work.