A Set Based Newton Method for the Averaged Hausdorff Distance for Multi-Objective Reference Set Problems

Abstract: Multi-objective optimization problems (MOPs) naturally arise in many applications. Since for such problems one can expect an entire set of optimal solutions, a common task in set based multi-objective optimization is to compute N solutions along the Pareto set/front of a given MOP. In this work, we propose a set based Newton method for this task that uses the averaged Hausdorff distance ∆_p as performance indicator.


Introduction
Multi-objective optimization problems (MOPs), i.e., problems where multiple incommensurable and conflicting objectives have to be optimized concurrently, arise in many fields such as engineering and finance (e.g., [1][2][3][4][5]). One important characteristic is that typically not one single solution can be expected for such problems (as is the case for "classical" scalar optimization problems (SOPs)), but rather an entire set of solutions. More precisely, if the MOP contains k conflicting objectives, one can expect the solution set (the Pareto set, respectively its image, the Pareto front) to form, at least locally, a manifold of dimension k − 1 [6]. Many numerical methods take this fact into account and generate an entire (finite) set of candidate solutions so that the decision maker (DM) obtains an overview of the possible realizations of his/her project. For such set based multi-objective optimization algorithms, a natural question that arises is the goodness of the obtained solution set A (i.e., the relation of A to the Pareto set/front of the underlying MOP). For this, several performance indicators have been proposed over the last decades such as the Hypervolume indicator (HV, [7]), the Generational Distance (GD, [8]), the Inverted Generational Distance (IGD, [9]), R2 [10], DOA [11], and the averaged Hausdorff distance ∆_p [12,13]. Each such indicator I assigns to a given set of candidate solutions an indicator value, which leads to the problem

min_{A ⊂ Q, |A| = N} I(A),   (1)

where I denotes the chosen performance indicator (to be minimized), Q ⊂ R^n the domain of the objective functions, and N the size of the candidate solution set. Since A ⊂ R^n contains N elements, it can also be viewed as a vector in R^{N·n}. Problem (1) can hence be regarded as a SOP with N · n decision variables.

A popular and actively researched class of set based multi-objective algorithms is given by specialized evolutionary algorithms, called multi-objective evolutionary algorithms (MOEAs, e.g., [14][15][16][17]). MOEAs evolve entire sets of candidate solutions (called populations or archives) and are hence capable of computing finite size approximations of the entire Pareto set/front in one single run of the algorithm. Further, they are of global nature, very robust, and require only minimal assumptions on the model (e.g., no differentiability of the objective or constraint functions). MOEAs have caught the interest of many researchers and practitioners during the last decades, and have been applied to solve many real-world problems coming from science and engineering. It is also known, however, that none of the existing MOEAs converges in the mathematical sense, which indicates that they are not yet tapping their full potential. In [18], it has been shown that for any strategy where λ < µ children are chosen from µ parents, there is no guarantee for convergence w.r.t. the HV indicator. Studies coming from mathematical programming (MP) indicate similar results for any performance indicator (e.g., [19,20]), since λ < µ strategies in evolutionary algorithms are equivalent to what is called cyclic search in MP.
In this work, we propose a set based Newton method for Problem (1), where we address the averaged Hausdorff distance ∆_p as indicator. Since ∆_p is defined via GD and IGD, we also consider the respective set based GD and IGD Newton methods. To this end, we first derive the (set based) gradients and Hessians for all indicators, and based on this define and discuss the resulting set based Newton methods for unconstrained MOPs. Numerical results on some benchmark test problems indicate that the method indeed yields local quadratic convergence on the entire set of candidate solutions in certain cases. The Newton methods are tested on aspiration set problems, i.e., the problem to minimize the distance of a set of solutions to a given utopian reference set Z for a given unconstrained MOP. Further, we show how the ∆_p Newton method can be used in a bootstrap manner to compute finite size approximations of the entire Pareto front of a given problem in certain cases. The method can hence in principle be used as a standalone algorithm for the treatment of unconstrained MOPs. On the other hand, the results also show that the Newton methods, as all Newton variants, are of local nature and require good initial solutions. In order to obtain a fast and reliable solver, a hybridization with a global strategy seems most promising, e.g., with MOEAs, since the proposed Newton methods can be viewed as particular "λ = µ" strategies; this is, however, beyond the scope of this work.
The remainder of this work is organized as follows: In Section 2, we briefly present the background required for the understanding of this work. In Sections 3-5, we present and discuss the set based GD, IGD, and ∆_p Newton methods, respectively. Finally, we draw our conclusions and give possible paths for future work in Section 6.

Background and Related Work
Continuous unconstrained multi-objective optimization problems are expressed as

min_{x ∈ R^n} F(x),   (2)

where F : R^n → R^k, F(x) = (f_1(x), . . . , f_k(x))^T denotes the map that is composed of the individual objectives f_i : R^n → R, i = 1, . . . , k, which are to be minimized simultaneously.
If k = 2 objectives are considered, the resulting problem is termed a bi-objective optimization problem (BOP).
For the definition of optimality in multi-objective optimization, the notion of dominance is widely used: for two vectors a, b ∈ R^k we say that a is less than b (in short: a <_p b) if a_i < b_i for all i ∈ {1, . . . , k}. The definition of ≤_p is analogous. Let x, y ∈ R^n; then we say that x dominates y (x ≺ y) w.r.t. (2) if F(x) ≤_p F(y) and F(x) ≠ F(y). Else, we say that y is non-dominated by x. We are now in the position to define optimality for a MOP: a point x* ∈ R^n is called Pareto optimal (or simply optimal) w.r.t. (2) if there exists no y ∈ R^n that dominates x*. We denote by P the set of all optimal solutions, also called the Pareto set. Its image F(P) is called the Pareto front. Under mild conditions on the MOP, one can expect both sets to form, at least locally, objects of dimension k − 1 [6].
The averaged Hausdorff distance ∆_p for discrete or discretized sets is defined as follows: let A = {a_1, . . . , a_N} and B = {b_1, . . . , b_M}, where A, B ⊂ R^n, be finite sets. The values GD_p(A, B) and IGD_p(A, B) are defined as

GD_p(A, B) := ( (1/N) Σ_{i=1}^{N} dist(a_i, B)^p )^{1/p}   (3)

IGD_p(A, B) := ( (1/M) Σ_{i=1}^{M} dist(b_i, A)^p )^{1/p},   (4)

where p is an integer and where the distance of a point a_i to a set B is defined by dist(a_i, B) := min_{b∈B} ||a_i − b||_2. The averaged Hausdorff distance ∆_p is simply the maximum of these two values,

∆_p(A, B) := max( GD_p(A, B), IGD_p(A, B) ).

We refer to [21] for an extension of the indicators to continuous sets. We stress that all three indicators are entirely distance based and in particular not Pareto compliant. A variant of IGD that is weakly Pareto compliant is the indicator DOA. Here, we are particularly interested in multi-objective reference set problems. That is, given a finite reference set Z ⊂ R^k, we are interested in solving the problem

min_{A ⊂ R^n, |A| = N} I(F(A), Z),   (5)

where I is one of the indicators GD_p, IGD_p, or ∆_p, and N is the size of the approximation.
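To make these definitions concrete, the following minimal NumPy sketch (the function names are ours, not from the paper) evaluates GD_p, IGD_p, and ∆_p for finite sets given as arrays of points in objective space:

```python
import numpy as np

def gd_p(A_img, Z, p=2):
    """GD_p of the image set A_img (N x k) w.r.t. the reference set Z (M x k)."""
    # dist(a_i, Z) = min_j ||a_i - z_j||_2 for every image point a_i
    d = np.min(np.linalg.norm(A_img[:, None, :] - Z[None, :, :], axis=2), axis=1)
    return np.mean(d ** p) ** (1.0 / p)

def igd_p(A_img, Z, p=2):
    """IGD_p(A, Z) = GD_p(Z, A): averaged distance from Z to the image of A."""
    return gd_p(Z, A_img, p)

def delta_p(A_img, Z, p=2):
    """Averaged Hausdorff distance: the maximum of GD_p and IGD_p."""
    return max(gd_p(A_img, Z, p), igd_p(A_img, Z, p))
```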
Probably the most important reference set in our context is the Pareto front itself. For this case, ∆_p prefers, roughly speaking, evenly spread solutions along the Pareto front and is hence in accord with the terms spread and convergence as used in the evolutionary multi-objective optimization (EMO) community for a "suitable" performance indicator. As an example, Figure 1 shows some "best approximations" in the ∆_2 sense (i.e., when using p = 2) for MOPs with different shapes of the Pareto front. More precisely, each subfigure shows a fine-grained (M = 200) approximation of the Pareto front of the underlying problem (using dots), as well as the best approximations in the ∆_2 sense (using diamonds). The latter are (numerical) solutions of (5) for N = 20, where Z has been chosen as the Pareto front approximation.

If A = {a_1, . . . , a_N} is a subset of R^n, then each of its elements a_i is an element of R^n. Hence, the set A can in a natural way also be identified with a point or vector in the higher dimensional space R^{N·n}, i.e., A ∈ R^{N·n}. That is, the optimization problem (5) can be identified with a "classical" scalar optimization problem that is defined in an N · n-dimensional search space. A necessary condition for optimality is hence given by the Karush-Kuhn-Tucker conditions; e.g., for unconstrained problems, we seek sets A for which the (set based) gradient vanishes. In order to solve this root finding problem, one can, e.g., utilize the Newton method. If we are given a performance indicator I together with the derivatives ∇I(A) and ∇²I(A) on a set A, the Newton function is hence given by

N(A) := A − ∇²I(A)^{−1} ∇I(A).   (6)

There exist many methods for the computation of Pareto optimal solutions. For example, there are mathematical programming (MP) techniques such as scalarization methods that transform the MOP into a sequence of scalar optimization problems (SOPs) [22][23][24][25][26]. These methods are very efficient in finding a single solution or even a finite size discretization of the solution set. Another sub-class of the MP techniques is given by continuation-like methods that take advantage of the fact that the Pareto set forms, at least locally, a manifold. Methods of this kind start from a given initial solution and perform a search along the solution manifold [6,[27][28][29][30][31][32][33].
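For illustration, one set based Newton step (6) can be sketched as follows; this is a minimal sketch in which `grad_I` and `hess_I` are assumed callables returning the set based gradient in R^{N·n} and the Hessian in R^{(N·n)×(N·n)}:

```python
import numpy as np

def set_newton_step(A, grad_I, hess_I):
    """One Newton step on the set A (N x n), viewed as a vector in R^(N*n)."""
    N, n = A.shape
    a = A.reshape(-1)                 # identify the set with a point in R^(N*n)
    g = grad_I(A)                     # set based gradient, shape (N*n,)
    H = hess_I(A)                     # set based Hessian, shape (N*n, N*n)
    step = np.linalg.solve(H, g)      # Newton direction from H * step = g
    return (a - step).reshape(N, n)   # N(A) = A - H^{-1} grad I(A)
```

In practice, one would exploit the block structure of the Hessians derived below instead of solving with the full N·n × N·n matrix.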
Next, there exist also set oriented methods that are capable of obtaining the entire solution set in a global manner. Examples for the latter are subdivision [34][35][36] and cell mapping techniques [37][38][39]. Another class of set based methods is given by multi-objective evolutionary algorithms (MOEAs) that have proven to be very effective for the treatment of MOPs [14,16,[40][41][42][43]. Some reasons for this include that they are very robust, do not require hard assumptions on the model, and allow one to compute a reasonable finite size representation of the solution set already in a single run.
Methods that deal with single reference points for multi-objective problems can be found in [26,44,45]. The first work that deals with a set based approach using a problem similar to the one in (5) can be found in [46], where the authors apply the steepest descent method to the Hypervolume indicator [47]. In [48], a Newton method is defined that also uses the Hypervolume indicator. In [49], a multi-objective Newton method is proposed that detects single Pareto optimal solutions of a given MOP. In [50], a set based Newton method is proposed for general root finding problems and for convex sets.

GD_p Newton Method
In the following sections, we will investigate the set based Newton methods for GD_p, IGD_p, and ∆_p. More precisely, we will consider the p-th powers, p > 1, of these indicators, as this does not change the optimal solutions. In all cases, we will first derive the (set based) derivatives, and then investigate the resulting Newton method. For the derivatives, we will focus on p = 2, which is related to the Euclidean norm and hence represents the most important member of these indicator families. However, we will also state the derivatives for general integers p.
Let A = {a_1, . . . , a_N} ⊂ R^n be a candidate set for (2), and Z = {z_1, . . . , z_M} ⊂ R^k be a given reference set. The indicator GD_p measures the averaged distance of the image of A to Z:

GD_p(A) := ( (1/N) Σ_{i=1}^{N} d(F(a_i), Z)^p )^{1/p}.   (7)

Hereby, we have used the notation

d(F(a_i), Z) := min_{z∈Z} ||F(a_i) − z||_2,   (8)

and assume Z to be fixed for the given problem (and hence, it does not appear as input argument). In the following, we have to assume that for every point F(a_i) there exists exactly one closest element in Z. That is, for all i = 1, . . . , N there exists an index j_i ∈ {1, . . . , M} such that:

||F(a_i) − z_{j_i}||_2 < ||F(a_i) − z_j||_2   for all j ∈ {1, . . . , M}\{j_i}.   (9)

Derivatives of GD
Otherwise, the gradient of GD_p is not defined at A. If condition (9) is satisfied, then (7) can be written as follows:

GD_p(A) = ( (1/N) Σ_{i=1}^{N} ||F(a_i) − z_{j_i}||_2^p )^{1/p},   (10)

and for the special case p = 2 we obtain

GD_2^2(A) = (1/N) Σ_{i=1}^{N} ||F(a_i) − z_{j_i}||_2^2.   (11)

The gradient of GD_2^2 at A is hence given by

∇GD_2^2(A) = (2/N) ( J(a_1)^T (F(a_1) − z_{j_1}), . . . , J(a_N)^T (F(a_N) − z_{j_N}) ) ∈ R^{N·n},   (12)

where J(a_i) denotes the Jacobian matrix of F at a_i for i = 1, . . . , N. We call the vector

∇_i GD_2^2(A) := (2/N) J(a_i)^T (F(a_i) − z_{j_i})   (13)

the i-th sub-gradient of GD_2^2 with respect to a_i ∈ A. (The sub-gradient is defined here as the part of the gradient that is associated to an element a_i of A, and is not equal to the notion of the sub-gradient known in non-smooth optimization.) Note that the sub-gradients are completely independent of the location of the other archive elements a_j ∈ A.
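A direct implementation of the sub-gradients (13) might look as follows; this is a sketch under the assumption that `F` and `jac_F` are callables evaluating F and its Jacobian:

```python
import numpy as np

def gd2_subgradients(A, Z, F, jac_F):
    """Sub-gradients of GD_2^2: grad_i = (2/N) J(a_i)^T (F(a_i) - z_{j_i})."""
    N = len(A)
    grads = np.empty_like(A)
    for i, a in enumerate(A):
        Fa = F(a)                                         # image of a_i, shape (k,)
        j = np.argmin(np.linalg.norm(Z - Fa, axis=1))     # nearest reference index j_i
        grads[i] = (2.0 / N) * jac_F(a).T @ (Fa - Z[j])
    return grads   # row i holds the i-th sub-gradient; stacking gives the full gradient
```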
If the given MOP is unconstrained, then the first order necessary condition for optimality is that the gradient of GD_2^2 vanishes. This is the case for a set A if all sub-gradients vanish, i.e., if

J(a_i)^T (F(a_i) − z_{j_i}) = 0,   i = 1, . . . , N.   (14)

This happens if for each a_i either F(a_i) − z_{j_i} lies in the kernel of J(a_i)^T, or the image of a_i is equal to one of the elements of the reference set. The latter is for instance never the case if Z is chosen utopian.
If, in the former case, the vector

α^{(i)} := F(a_i) − z_{j_i}   (15)

satisfies α^{(i)} ≥_p 0 (which holds, e.g., if Z is again utopian), then a_i is even a Karush-Kuhn-Tucker point, since the sub-gradient in (13) is then a non-negative linear combination of the gradients ∇f_l(a_i). See Figure 2 for a geometrical interpretation of this scenario.

Hessian of GD_2^2
We first define the map g : R^n → R^n as

g(a_i) := J(a_i)^T (F(a_i) − z_{j_i}) = Σ_{l=1}^{k} α_l^{(i)} ∇f_l(a_i),   (16)

where α^{(i)} is as in (15). In order to find an expression for the Hessian matrix, we now derive Equation (16) as follows:

Dg(a_i) = J(a_i)^T J(a_i) + Σ_{l=1}^{k} α_l^{(i)} ∇²f_l(a_i) = J(a_i)^T J(a_i) + W_α(a_i),   (17)

where

W_α(a_i) := Σ_{l=1}^{k} α_l^{(i)} ∇²f_l(a_i).   (18)

Thus, the Hessian matrix of GD_2^2 is

∇²GD_2^2(A) = (2/N) diag( Dg(a_1), . . . , Dg(a_N) ) ∈ R^{(N·n)×(N·n)},   (19)

which is a block diagonal matrix.
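Accordingly, the diagonal blocks (17) can be assembled as in the following sketch, where `hess_f(a, l)` is an assumed callable returning the Hessian ∇²f_l(a) of the l-th objective:

```python
import numpy as np

def gd2_hessian_blocks(A, Z, F, jac_F, hess_f):
    """Diagonal blocks (2/N) * Dg(a_i) of the GD_2^2 Hessian."""
    N = len(A)
    blocks = []
    for a in A:
        Fa = F(a)
        z = Z[np.argmin(np.linalg.norm(Z - Fa, axis=1))]   # nearest reference point
        J = jac_F(a)
        alpha = Fa - z                                     # weight vector alpha^(i)
        W = sum(alpha[l] * hess_f(a, l) for l in range(len(alpha)))
        blocks.append((2.0 / N) * (J.T @ J + W))           # Dg(a_i) = J^T J + W_alpha
    return blocks
```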

Gradient and Hessian for General p > 1
As mentioned above, we focus here on the special case p = 2. The above derivatives, however, can be generalized for p > 1 as follows (assuming that Z is a utopian finite set to avoid problems when p < 4): the gradient is given by

∇_i GD_p^p(A) = (p/N) ||F(a_i) − z_{j_i}||_2^{p−2} J(a_i)^T (F(a_i) − z_{j_i}),   i = 1, . . . , N,   (20)

and the Hessian by

∇²GD_p^p(A) = (p/N) diag( D_1, . . . , D_N ),   (21)

where

D_i = ||F(a_i) − z_{j_i}||_2^{p−2} ( J(a_i)^T J(a_i) + W_α(a_i) ) + (p − 2) ||F(a_i) − z_{j_i}||_2^{p−4} J(a_i)^T (F(a_i) − z_{j_i})(F(a_i) − z_{j_i})^T J(a_i)   (22)

for i = 1, 2, . . . , N.
GD_2^2 Newton Method

After having derived the gradient and the Hessian, we are now in the position to state the set based Newton method for the GD_2^2 indicator:

A_{t+1} := A_t − ∇²GD_2^2(A_t)^{−1} ∇GD_2^2(A_t),   t = 0, 1, 2, . . .   (23)

The Newton iteration can in practice be stopped at a set A_f if

||∇GD_2^2(A_f)|| < tol   (24)

for a given tolerance tol > 0. In order to speed up the computations, one may proceed as follows due to the structure of the (sub-)gradient: for each element a_i of a current archive A with

||∇_i GD_2^2(A)|| < tol   (25)

one can continue the Newton iteration with the smaller set Ā = A\{a_i} (and later insert a_i into the final archive).
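Since both the gradient and the Hessian decouple over the archive elements, the Newton step can be performed block-wise. The following sketch of the resulting iteration, with the element-wise stopping rule, builds on the two sketches above; note that the common factor 2/N cancels when solving the block systems:

```python
import numpy as np

def gd2_newton(A, Z, F, jac_F, hess_f, tol=1e-10, max_iter=20):
    """GD_2^2 Newton iteration; converged elements are frozen early."""
    A = A.copy()
    active = np.ones(len(A), dtype=bool)
    for _ in range(max_iter):
        grads = gd2_subgradients(A, Z, F, jac_F)
        active &= np.linalg.norm(grads, axis=1) >= tol    # freeze converged elements
        if not active.any():
            break
        blocks = gd2_hessian_blocks(A, Z, F, jac_F, hess_f)
        for i in np.where(active)[0]:
            # block-wise Newton step: solve Dg(a_i) s_i = grad_i (factor 2/N cancels)
            A[i] -= np.linalg.solve(blocks[i], grads[i])
    return A
```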
We are particularly interested in the regularity of ∇²GD_2^2 at the optimal set, i.e., at a set A* that solves problem (5) for I = GD_2^2. This is of interest since, if the Hessian is regular at A* and the objective function is sufficiently smooth, we can expect the Newton method to converge locally quadratically [51].
Since the Hessian is a block diagonal matrix, it is regular if and only if all of its blocks are regular. From this we see already that if Z is not utopian, we cannot expect quadratic convergence: assume that one point z ∈ Z is feasible, i.e., that there exists an x ∈ Q such that F(x) = z. We can assume that x is also a member of the optimal set A*, say a_i = x. Then, we have α^{(i)} = 0, and hence W_α(a_i) = Σ_l α_l^{(i)} ∇²f_l(a_i) = 0. Thus, the block matrix reduces to J(a_i)^T J(a_i), whose rank is at most k. For k < n, the block matrix is hence singular, and so is the Hessian of GD_2^2 at A*.
In the case that all individual objectives are strictly convex, the GD_2^2 Hessian is positive definite (and hence regular) at every feasible set A, and we can hence expect local quadratic convergence.

Proposition 1. Let a MOP of the form (2) be given whose individual objectives are strictly convex, and let Z be a discrete utopian set. Then, the matrix ∇²GD_2^2(A) is positive definite for all feasible sets A.
Proof. Since Z is utopian, it is α^{(i)} ≠ 0 for all i, and all of its elements are non-negative. Further, since all individual objectives f_l are strictly convex, the matrices ∇²f_l(a_i) are positive definite, and hence so is the matrix W_α(a_i). Since J(a_i)^T J(a_i) is positive semi-definite, each block Dg(a_i) is positive definite, and so is the block diagonal Hessian.

Example
We consider the following convex bi-objective problem (27). Figure 3 shows the Pareto front of this problem together with the reference set Z that contains 30 elements (black dots). The set Z is a discretization of the convex hull of individual minima (CHIM, [23]) of the problem that has been shifted left down. Further, it shows the images of the Newton steps of an initial set A_0 that contains 21 elements. As can be seen, all images converge toward three solutions that are placed in the middle of the Pareto front (which is owed to the fact that Z is discrete; if Z were continuous, all images would converge toward one solution). This example already shows that the GD_2^2 Newton method is of restricted interest as a standalone algorithm. The method will, however, become important as part of the ∆_p Newton method, as will become apparent later on. Table 1 shows the respective GD_2^2 values plus the norms of the gradients, which indicate quadratic convergence. The second column indicates that the images of the archives converge toward the Pareto front as anticipated.

IGD_p Newton Method
The indicator IGD_p computes how far, on average, the discrete reference set Z is from a given archive A, and is defined as

IGD_p(A) := ( (1/M) Σ_{i=1}^{M} d(z_i, F(A))^p )^{1/p},   (28)

where d(z_i, F(A)) is given by

d(z_i, F(A)) := min_{a∈A} ||z_i − F(a)||_2.   (29)

Gradient of IGD_p
Similar to GD, we will also have to assume that for all i = 1, . . . , M there exists an index j_i ∈ {1, . . . , N} such that:

||z_i − F(a_{j_i})||_2 < ||z_i − F(a_j)||_2   for all j ∈ {1, . . . , N}\{j_i},   (30)

since otherwise the gradient of IGD_p is not defined. Then, using Equation (30), Equation (28) can be written as follows:

IGD_p(A) = ( (1/M) Σ_{i=1}^{M} ||z_i − F(a_{j_i})||_2^p )^{1/p}.   (31)

From now on we will consider IGD_2^2, which is given by

IGD_2^2(A) = (1/M) Σ_{i=1}^{M} ||z_i − F(a_{j_i})||_2^2.   (32)

In order to derive the gradient of IGD_2^2, let I_l := {i : j_i = l}, l ∈ {1, . . . , N}, be the set formed by the indices i ∈ {1, 2, . . . , M} that are related to j_i. In other words, this set gives us the relation of the elements of Z to each image F(a_l) (an example of this relation can be found in Figure 4). Then, the sub-gradient of IGD_2^2 at point a_l is given by

∇_l IGD_2^2(A) = (2/M) J(a_l)^T ( m_l F(a_l) − Σ_{i∈I_l} z_i ),   (33)

where m_l = |I_l|. Finally, the gradient of IGD_2^2 can be expressed as

∇IGD_2^2(A) = ( ∇_1 IGD_2^2(A), . . . , ∇_N IGD_2^2(A) ) ∈ R^{N·n}.   (34)

It is worth noticing that the sub-gradients depend on the location of the other archive elements, which implies a "group motion" (in contrast to the gradient of GD_2^2); a computational sketch is given after the case distinction below. We next consider under which conditions the gradient of IGD_2^2 vanishes. If ∇IGD_2^2(A) = 0, then for all l = 1, . . . , N we have that

J(a_l)^T ( F(a_l) − C_l ) = 0,   (35)

where C_l := (1/m_l) Σ_{i∈I_l} z_i is the centroid of the z_i, i ∈ I_l. Then, note that if:

1. rank(J(a_l)) = k, then F(a_l) = (Σ_{i∈I_l} z_i)/m_l = C_l;
2. rank(J(a_l)) = k − 1, then F(a_l) − C_l is orthogonal to the linearized image of F at F(a_l), and orthogonal to the linearized Pareto front at F(a_l) in case F(a_l) − C_l ≥_p 0 and F(a_l) − C_l ≠ 0 (see Figure 5 for such a scenario).
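The index sets I_l, the multiplicities m_l, and the sub-gradients (33) can be computed as in the following sketch (same assumed callables as in the GD section):

```python
import numpy as np

def igd2_subgradients(A, Z, F, jac_F):
    """Sub-gradients of IGD_2^2: grad_l = (2/M) J(a_l)^T (m_l F(a_l) - sum_{i in I_l} z_i)."""
    M = len(Z)
    F_A = np.array([F(a) for a in A])                     # images, shape (N, k)
    # j[i]: index of the image point closest to the reference point z_i
    j = np.argmin(np.linalg.norm(Z[:, None, :] - F_A[None, :, :], axis=2), axis=1)
    grads = np.zeros_like(A)
    for l in range(len(A)):
        I_l = np.where(j == l)[0]                         # reference points assigned to a_l
        m_l = len(I_l)
        if m_l == 0:
            continue                                      # sub-gradient vanishes
        grads[l] = (2.0 / M) * jac_F(A[l]).T @ (m_l * F_A[l] - Z[I_l].sum(axis=0))
    return grads
```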

Hessian Matrix of IGD_p
Analogous to the derivation of the GD_p Hessian, we first define the map g : R^n → R^n as

g(a_l) := J(a_l)^T ( m_l F(a_l) − Σ_{i∈I_l} z_i ).   (37)

Now, let Σ_{i∈I_l} z_i = y = (y_1, . . . , y_k)^T. Then

g(a_l) = Σ_{j=1}^{k} ( m_l f_j(a_l) − y_j ) ∇f_j(a_l).   (38)

Then, we derive Equation (38) as follows:

Dg(a_l) = m_l J(a_l)^T J(a_l) + Σ_{j=1}^{k} ( m_l f_j(a_l) − y_j ) ∇²f_j(a_l).   (39)

Thus, the Hessian matrix of IGD_2^2 is given by

∇²IGD_2^2(A) = (2/M) diag( Dg(a_1), . . . , Dg(a_N) ),   (40)

which is a block diagonal matrix.

Gradient and Hessian for General p > 1
The above derivatives can be generalized for p > 1 as follows: the gradient is given by

∇_l IGD_p^p(A) = (p/M) Σ_{i∈I_l} ||F(a_l) − z_i||_2^{p−2} J(a_l)^T ( F(a_l) − z_i ),   l = 1, . . . , N,   (41)

and the Hessian by

∇²IGD_p^p(A) = (p/M) diag( D_1, . . . , D_N ),   (42)

where

D_l = Σ_{i∈I_l} [ ||F(a_l) − z_i||_2^{p−2} ( J(a_l)^T J(a_l) + W_{α^{(i)}}(a_l) ) + (p − 2) ||F(a_l) − z_i||_2^{p−4} J(a_l)^T (F(a_l) − z_i)(F(a_l) − z_i)^T J(a_l) ]   (43)

with α^{(i)} := F(a_l) − z_i, for l = 1, 2, . . . , N.

IGD_2^2 Newton Method

After having derived the gradient and the Hessian of IGD_2^2, we are now in the position to state the respective set based Newton method.
Similarly to the GD Newton method, the iteration can be stopped at a set A_f if

||∇IGD_2^2(A_f)|| < tol   (44)

for a given tolerance tol > 0, and the iteration for an element a_i can be stopped when

||∇_i IGD_2^2(A)|| < tol.   (45)

One important special case occurs when the image F(a_l) of a point a_l of a set A is not the nearest image point for any element of Z, i.e., if m_l = 0 (see Figure 6). In that case, the l-th sub-gradient vanishes, which means that the point a_l will remain fixed under further iterations of the IGD Newton method. One possibility is hence to neglect such points in subsequent iterations, and to continue with the reduced set. Note also that dominance and distance are two different concepts. That is, if all points of a set A are mutually non-dominated, this does not give an implication on m_l; see Figure 7 for two examples. Similar to GD, we are interested in the regularity of ∇²IGD_2^2 at the optimal set, since we can in that case expect local quadratic convergence. By the structure of the Hessian, we have singularity in the following cases:

• if m_l = 0 for one element a_l (since then the l-th block Dg(a_l) vanishes);
• if one element z_l of Z is feasible (since then Dg(a_l) = J(a_l)^T J(a_l), which has rank at most k, under the assumption that k < n).
Similarly as for GD, the IGD Hessian is positive definite for strictly convex problems and utopian reference sets if, in addition, m_l ≥ 1 for all l ∈ {1, . . . , N}.

Proposition 2.
Let a MOP of the form (2) be given whose individual objectives are strictly convex, and let Z be a discrete utopian set. Further, let m l ≥ 1 for all l ∈ {1, . . . , N}, and A be feasible. Then, the matrix ∇ 2 IGD 2 2 (A) is positive definite.
Proof. Let l ∈ {1, . . . , N}, and for ease of notation write {z_i : i ∈ I_l} = {Z_1, . . . , Z_{m_l}}. Since all Z_i are utopian, we have Z_i <_p F(a_l) for i = 1, . . . , m_l, as well as m_l F(a_l) − Σ_{i∈I_l} z_i >_p 0, so that all weights m_l f_j(a_l) − y_j in (39) are positive. The rest is analogous to the proof of Proposition 1.

Examples
We first consider again the convex BOP (27) as in the previous example (see Figure 8), but now using the IGD Newton method. Already after one iteration step, m_l = 0 holds for 8 out of the 21 elements (denoted by red dots), and we continue the Newton method with the resulting 13-element subset. For this, we obtain quadratic convergence toward the ideal set (for N = 13), as can be observed from Table 2.

Table 2. Numerical results of the IGD_2^2 Newton method for BOP (27), see Figure 8.

We next consider the BOP (50) from [52], whose Pareto front is concave. We apply the IGD Newton method on the sets A_0 with |A_0| = 21 and Z with |Z| = 30 as shown in Figure 9. For this example, only six elements of A_0 are closest to one element of Z. Table 3 shows that the convergence is much slower than for the previous example.

Finally, we consider the BOP (51) from [53], f_1, f_2 : R^2 → R. For λ = 0.85 and Q = [−1.5, 1.5]^2, the Pareto front contains a "dent" and is hence convex-concave. Figure 10 shows the setting and Table 4 the respective convergence behavior. Again, we "lose" elements from A_0 since for them there are no elements of Z that are closest to them.

Table 3. Numerical results of the IGD_2^2 Newton method for BOP (50), see Figure 9.
Table 4. Numerical results of the IGD_2^2 Newton method for BOP (51), see Figure 10.

∆_2 Newton Method
Based on the results of the previous two sections, we are now in the position to state the set based Newton method for the ∆_2^2 indicator.

∆_2 Newton Method
Since ∆_2^2(A_i, Z) is defined as the maximum of GD_2^2(A_i, Z) and IGD_2^2(A_i, Z), we simply check in each iteration which of the two values is larger, and apply either the GD or the IGD Newton step accordingly.
The properties and the realization of the method are in principle as for the GD and the IGD Newton methods. The only difference, at least theoretically, is a possible loss of smoothness, since ∆_p is defined as the maximum of two functions. Such issues, at least for convergence, are, however, only to be expected in case GD(A*, Z) is equal to IGD(A*, Z) for a reference set Z and the respective optimal archive A*. The cost of the realization of one Newton step for each of the three indicators is O(Nn²) in storage when taking into account the block structure of the Hessians, since N matrices of dimension n × n have to be stored, and O(Nn³) in terms of flops, since N linear equation systems of dimension n have to be solved.
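A sketch of this dispatch follows, reusing the indicator routines from Section 2; `gd2_newton_step` and `igd2_newton_step` are assumed single-step variants of the iterations sketched in the previous sections:

```python
import numpy as np

def delta2_newton_step(A, Z, F, jac_F, hess_f):
    """One Delta_2^2 Newton step: apply the GD or IGD step, whichever value is larger."""
    F_A = np.array([F(a) for a in A])            # images of the current archive
    if gd_p(F_A, Z, p=2) >= igd_p(F_A, Z, p=2):
        return gd2_newton_step(A, Z, F, jac_F, hess_f)    # GD_2^2 Newton step
    return igd2_newton_step(A, Z, F, jac_F, hess_f)       # IGD_2^2 Newton step
```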

Examples
We will in the following demonstrate the applicability of the ∆_2^2 Newton method on several problems. For this, we first consider the same three examples and settings as for the IGD_2^2 Newton method presented above. Figures 11-13 show some numerical results of the ∆_2 Newton method using the same initial archives A_0 and reference sets Z as in the previous section. As can be seen, in all cases the method achieved much better approximations than the sole usage of the IGD Newton method (as well as the GD Newton method). The convergence behaviors can be seen in Tables 5-7. In all cases, the GD value is the greater one in the initial steps of the method. After some iterations (and switches from GD to IGD and vice versa), however, the IGD value eventually becomes the largest one, so that the ∆_p Newton method eventually coincides with the IGD Newton method. In comparison to the results of the IGD Newton method, however, it becomes apparent that the GD Newton steps are in fact important to obtain better overall approximations.

Table 5. Numerical results of the ∆_2^2 Newton method on BOP (27), see Figure 11.
Table 6. Numerical results of the ∆_2^2 Newton method on BOP (50), see Figure 12.
Table 7. Numerical results of the ∆_2^2 Newton method on BOP (51), see Figure 13.

We next consider an example where we use the unconstrained three-objective problem (53). For this problem, we consider the following scenario: assume there are two decision makers who each have their own preference vector (denoted here by z^1 = (6.08, −2.08, 2.72)^T and z^2 = (0.32, 3.68, −3.04)^T). As a compromise, it could be interesting to consider the line segment that connects z^1 and z^2 (denoted by Z) and to compute a set of solutions along the Pareto front that is near (in the Hausdorff sense) to this aspiration set. Figure 14 and Table 8 show the numerical result of the Newton method for an initial set consisting of 7 elements. As anticipated, the final set resembles a curve along the Pareto front with minimal distance to Z, and may be used for the decision making process.

This concept can of course be extended to general sets. For instance, one can choose the triangle Z that is defined by the three vertices z^1 = (6.08, −2.08, 2.72)^T, z^2 = (−2.56, 6.56, −0.16)^T, and z^3 = (0.20, 3.79, −2.92)^T (e.g., if a third decision maker is involved). Figure 15 and Table 9 show such a numerical result. For the sake of better visualization, we only show the edges of Z instead of the complete triangle. As can be seen, the obtained solutions resemble to a certain extent a bended triangle along the Pareto front. From Table 9 it follows that for the final iteration the value ||∇∆_2^2(A_6)|| is already very close to zero, which indicates that a local solution has been computed. The solution, however, does not seem to be perfectly shaped, which is due to the fact that the problem to locate solutions along the Pareto front with respect to the given reference set is highly multi-modal (the "perfect" shape is associated to the global solution of Problem (5)). In order to obtain better results, it is hence imperative to hybridize the set based Newton methods with global multi-objective solvers (e.g., multi-objective evolutionary algorithms), which is beyond the scope of this work.

Table 8. Numerical results of the ∆_2^2 Newton method for MOP (53), see Figure 14.
Table 9. Numerical results of the ∆_2^2 Newton method for MOP (53) when Z is a triangle, see Figure 15.

A Bootstrap Method for the Computation of the Pareto Front
It is known that the proper choice of reference points/sets is a non-trivial task for the use of performance indicators in general when targeting the entire solution set (e.g., [54][55][56]). In the following, we show some numerical results of a bootstrap method that allows, to a certain extent, to compute approximations of the entire Pareto front of a given MOP without prior knowledge of this set. For this, we adapt the idea proposed in [57] to the context of the set based Newton method: given a performance indicator and a set based SOP of the form (5), one can iteratively approximate the Pareto front of a given problem using the Newton method via the following steps (a sketch of the resulting loop is shown after the list):

1.
Compute the minima x*_i of the individual objectives f_i, i = 1, . . . , k. Let y*_i = F(x*_i), and let Z̃_0 be the convex hull of the y*_i's (also called convex hull of individual minima (CHIM) [23]). Let δ_0 > 0 and set

Z_0 := { z − δ_0 (1, . . . , 1)^T : z ∈ Z̃_0 },

where δ_0 is ideally large enough so that Z_0 is utopian. Compute a Newton step using Z_0, leading to the set of candidate solutions A^{(0)}.

2.
In step l of the iteration, use the set A^{(l−1)} computed in the previous iteration to compute a set Z̃_l. This can be done via interpolation of the elements of A^{(l−1)} so that Z̃_l only contains mutually non-dominated elements. As new reference set, use

Z_l := { z − δ_l (1, . . . , 1)^T : z ∈ Z̃_l },

where δ_l < δ_{l−1}. Compute a Newton step using Z_l, leading to A^{(l)}.
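A sketch of the complete loop follows; `newton` and `interpolate` are assumed callables, the former running the ∆_2^2 Newton method against a fixed reference set, the latter interpolating the current images into a set of mutually non-dominated reference points:

```python
import numpy as np

def bootstrap_pareto_front(A0, F, newton, interpolate, delta0=1.0, n_outer=8):
    """Bootstrap: shift an interpolated front approximation and re-run the Newton method."""
    A, delta = A0, delta0
    Z_tilde = interpolate(np.array([F(a) for a in A]))    # initial CHIM-like curve
    for _ in range(n_outer):
        Z = Z_tilde - delta                               # shift "left down" to make Z utopian
        A = newton(A, Z)                                  # Newton step(s) against Z
        Z_tilde = interpolate(np.array([F(a) for a in A]))
        delta *= 0.5                                      # delta_l = delta_{l-1} / 2
    return A
```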
For k = 2 objectives, the CHIM is simply the line segment that connects y*_1 and y*_2. Figures 16-18 and Tables 10-12 show the results of this bootstrapping method on the MOPs (27), (50), and (51), respectively. Table 13 shows the number of function, Jacobian, and Hessian calls that have been spent for each problem. In our computations, we have chosen δ_0 sufficiently large, and δ_l = δ_{l−1}/2 for the shift parameter. The results show that, for these examples, the entire Pareto fronts can be approximated using the Newton method together with the bootstrapping method. While the final approximations can be considered to be "good" for the problems with the convex and convex-concave Pareto fronts, the final solution for MOP (50), which has a concave Pareto front, is not yet satisfying. Table 11 indicates that the solutions do not even converge toward a local solution (even if more iteration steps are performed). We conjecture that the problem results from the multi-modality of the test function, which further encourages us to hybridize the set based Newton method with a global strategy in the future.

Figure 17. Different iterations of the ∆_2^2 Newton method to obtain the Pareto front of MOP (50) via the bootstrapping method.

We finally consider a BOP with higher dimensional decision variable space: the bi-objective problem minus DTLZ2 [58], which is obtained by multiplying both objectives of DTLZ2 by −1, where we have chosen n = 20. Figure 19 shows an initial candidate set as well as the final result of the Newton method together with the bootstrapping. In order to obtain the final result, 8 iteration steps were needed, using 1415 function, 700 Jacobian, and 350 Hessian calls. Note that the initial set contains some dominated solutions that do not have an influence on the IGD_2 value. Hence, also in this case, the use of GD_2 helped to push these solutions toward the Pareto front.

Figure 19. Initial candidate set (left) and numerical result of the ∆_2 Newton method on minus DTLZ2 (right).

Conclusions and Future Work
In this work, we have considered a set based Newton method for the ∆_p indicator for unconstrained multi-objective optimization problems. Since ∆_p is constructed from GD_p and IGD_p, we have also considered the set based Newton methods for these two indicators. To this end, we have first derived the set based gradients and Hessians, and have, based on this, formulated and analyzed the Newton methods. Numerical results on selected test problems have revealed the strengths and weaknesses of the resulting methods. For this, we have mainly considered aspiration set problems (i.e., the problem to minimize the indicator distance of a set to a given utopian reference set), but have also shown a bootstrapping method that allows, to a certain extent, to compute finite size approximations of the entire Pareto front without prior knowledge of this set. The results have shown that the method may indeed converge quadratically to the desired set, however, also, and as anticipated, that the Newton method is only applicable locally. It is hence imperative to hybridize the method with a global search strategy such as an evolutionary multi-objective optimization algorithm in order to obtain a fast and reliable algorithm for the treatment of such problems. The latter, however, is beyond the scope of this work and left for future investigations. Another interesting path for future research would be to extend the proposed Newton methods to constrained multi-objective optimization problems.