## 1. Introduction

Multi-objective optimization problems (MOPs), i.e., problems where multiple incommensurable and conflicting objectives have to be optimized concurrently, arise in many fields such as engineering and finance (e.g., [1, 2, 3, 4, 5]). One important characteristic is that there is typically not one single solution to be expected for such problems (as is the case for “classical” scalar optimization problems (SOPs)), but rather an entire set of solutions. More precisely, if the MOP contains $k$ conflicting objectives, one can expect the solution set (the Pareto set, respectively its image, the Pareto front) to form at least locally a manifold of dimension $k-1$ [6]. Many numerical methods take this fact into account and generate an entire (finite) set of candidate solutions so that the decision maker (DM) obtains an overview of the possible realizations of his/her project. For such set based multi-objective optimization algorithms, a natural question is the goodness of the obtained solution set $A$ (i.e., the relation of $A$ to the Pareto set/front of the underlying MOP). For this, several performance indicators have been proposed over the last decades such as the Hypervolume indicator (HV, [7]), the Generational Distance (GD, [8]), the Inverted Generational Distance (IGD, [9]), R2 [10], DOA [11], and the averaged Hausdorff distance ${\mathsf{\Delta}}_{p}$ [12, 13]. Each such indicator assigns to a given set of candidate solutions an indicator value according to the given MOP. Hence, if the MOP and the size of the candidate solution set are fixed, the detection of the “best” candidate solution set can be expressed by the problem

$$\min_{A\subset Q,\; |A|=N} I(A), \tag{1}$$

where $I$ denotes the chosen performance indicator (to be minimized), $Q\subset {\mathbb{R}}^{n}$ the domain of the objective functions, and $N$ the size of the candidate solution set. Since $A\subset {\mathbb{R}}^{n}$ contains $N$ elements, it can also be regarded as a vector in ${\mathbb{R}}^{N\cdot n}$. Problem (1) can hence be regarded as a SOP with $N\cdot n$ decision variables.

A popular and actively researched class of set based multi-objective algorithms is given by specialized evolutionary algorithms, called multi-objective evolutionary algorithms (MOEAs, e.g., [14, 15, 16, 17]). MOEAs evolve entire sets of candidate solutions (called populations or archives) and are hence capable of computing finite size approximations of the entire Pareto set/front in one single run of the algorithm. Further, they are of global nature, very robust, and require only minimal assumptions on the model (e.g., no differentiability of the objective or constraint functions). MOEAs have caught the interest of many researchers and practitioners during the last decades, and have been applied to solve many real-world problems coming from science and engineering. It is also known, however, that none of the existing MOEAs converges in the mathematical sense, which indicates that they are not yet tapping their full potential. In [18], it has been shown that for any strategy where $\lambda <\mu $ children are chosen from $\mu $ parents, there is no guarantee for convergence w.r.t. the HV indicator. Studies coming from mathematical programming (MP) indicate similar results for any performance indicator (e.g., [19, 20]), since $\lambda <\mu $ strategies in evolutionary algorithms are equivalent to what is called cyclic search in MP.

In this work, we propose a set based Newton method for Problem (1), where we will address the averaged Hausdorff distance ${\mathsf{\Delta}}_{p}$ as indicator. Since ${\mathsf{\Delta}}_{p}$ is defined via $GD$ and $IGD$, we will also consider the respective set based $GD$ and $IGD$ Newton methods. To this end, we will first derive the (set based) gradients and Hessians for all indicators, and based on this define and discuss the resulting set based Newton methods for unconstrained MOPs. Numerical results on some benchmark test problems indicate that the method indeed yields local quadratic convergence on the entire set of candidate solutions in certain cases. The Newton methods are tested on aspiration set problems (i.e., the problem of minimizing the distance between the images of a set of solutions of the given unconstrained MOP and a given utopian reference set $Z$). Further, we will show how the ${\mathsf{\Delta}}_{p}$ Newton method can be used in a bootstrap manner to compute finite size approximations of the entire Pareto front of a given problem in certain cases. The method can hence in principle be used as a standalone algorithm for the treatment of unconstrained MOPs. On the other hand, the results also show that the Newton methods, like all Newton variants, are of local nature and require good initial solutions. In order to obtain a fast and reliable solver, a hybridization with a global strategy seems to be most promising (e.g., with MOEAs, since the proposed Newton methods can be viewed as particular “$\lambda =\mu $” strategies). This is, however, beyond the scope of this work.

The remainder of this work is organized as follows: In Section 2, we will briefly present the background required for the understanding of this work. In Section 3, Section 4 and Section 5, we will present and discuss the set based $GD$, $IGD$ and ${\mathsf{\Delta}}_{p}$ Newton methods, respectively. Finally, we will draw our conclusions and give possible paths for future work in Section 6.

## 2. Background and Related Work

Continuous unconstrained multi-objective optimization problems are expressed as

$$\min_{x\in {\mathbb{R}}^{n}} F(x), \tag{2}$$

where $F:{\mathbb{R}}^{n}\to {\mathbb{R}}^{k}$, $F\left(x\right)={({f}_{1}\left(x\right),\dots ,{f}_{k}\left(x\right))}^{T}$ denotes the map that is composed of the individual objectives ${f}_{i}:{\mathbb{R}}^{n}\to \mathbb{R}$, $i=1,\dots ,k$, which are to be minimized simultaneously.

If $k=2$ objectives are considered, the resulting problem is termed a bi-objective optimization problem (BOP).

For the definition of optimality in multi-objective optimization, the notion of dominance is widely used: for two vectors $a,b\in {\mathbb{R}}^{k}$ we say that $a$ is less than $b$ (in short: $a{<}_{p}b$), if ${a}_{i}<{b}_{i}$ for all $i\in \{1,\dots ,k\}$. The definition of ${\le}_{p}$ is analogous. Let $x,y\in {\mathbb{R}}^{n}$, then we say that $x$ dominates $y$ ($x\prec y$) w.r.t. (2) if $F\left(x\right){\le}_{p}F\left(y\right)$ and $F\left(x\right)\ne F\left(y\right)$. Else, we say that $y$ is non-dominated by $x$. Now we are in the position to define optimality for a MOP. A point ${x}^{*}\in {\mathbb{R}}^{n}$ is called Pareto optimal (or simply optimal) w.r.t. (2) if there exists no $y\in {\mathbb{R}}^{n}$ that dominates ${x}^{*}$. We denote by $P$ the set of all optimal solutions, also called the Pareto set. Its image $F\left(P\right)$ is called the Pareto front. Under mild conditions on the MOP one can expect that both sets form at least locally objects of dimension $k-1$ [6].
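The dominance relation and the resulting non-dominated filter can be sketched in a few lines. The following is only an illustration for finite sets of objective vectors under minimization; the function names `dominates` and `nondominated` and the sample data are assumptions made for this sketch and do not come from the original work.

```python
import numpy as np

def dominates(fx, fy):
    """True if objective vector fx dominates fy: fx <=_p fy and fx != fy."""
    fx, fy = np.asarray(fx, dtype=float), np.asarray(fy, dtype=float)
    return bool(np.all(fx <= fy) and np.any(fx != fy))

def nondominated(points):
    """Rows of `points` that are not dominated by any other row."""
    pts = np.asarray(points, dtype=float)
    keep = [i for i, p in enumerate(pts)
            if not any(dominates(q, p) for j, q in enumerate(pts) if j != i)]
    return pts[keep]

# Hypothetical images F(x) of four candidate solutions (k = 2 objectives).
images = np.array([[1.0, 4.0], [2.0, 2.0], [3.0, 3.0], [4.0, 1.0]])
front = nondominated(images)  # (3, 3) is dominated by (2, 2) and is dropped
```

Note that the filter only yields the non-dominated elements of the given finite set; whether these are Pareto optimal for the underlying MOP cannot be decided from the sample alone.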

The averaged Hausdorff distance ${\mathsf{\Delta}}_{p}$ for discrete or discretized sets is defined as follows: let $A=\{{a}_{1},\dots ,{a}_{N}\}$ and $B=\{{b}_{1},\dots ,{b}_{M}\}$, where $A,B\subset {\mathbb{R}}^{n}$, be finite sets. The values $G{D}_{p}(A,B)$ and $IG{D}_{p}(A,B)$ are defined as

$$G{D}_{p}(A,B):={\left(\frac{1}{N}\sum_{i=1}^{N}dist{({a}_{i},B)}^{p}\right)}^{1/p},\qquad IG{D}_{p}(A,B):={\left(\frac{1}{M}\sum_{i=1}^{M}dist{({b}_{i},A)}^{p}\right)}^{1/p}, \tag{3}$$

where $p$ is an integer and where the distance of a point ${a}_{i}$ to a set $B$ is defined by $dist({a}_{i},B):={min}_{b\in B}\Vert {a}_{i}-b{\Vert}_{2}$. The averaged Hausdorff distance ${\mathsf{\Delta}}_{p}$ is simply the maximum of these two values,

$${\mathsf{\Delta}}_{p}(A,B):=\max \left\{G{D}_{p}(A,B),\,IG{D}_{p}(A,B)\right\}. \tag{4}$$

We refer to [21] for an extension of the indicators to continuous sets. We stress that all three indicators are entirely distance based and are in particular not Pareto compliant. A variant of IGD that is weakly Pareto compliant is the indicator DOA. Here, we are particularly interested in multi-objective reference set problems. That is, given a finite reference set $Z\subset {\mathbb{R}}^{k}$, we are interested in solving the problem

$$\min_{A\subset {\mathbb{R}}^{n},\; |A|=N} I(F(A),Z), \tag{5}$$

where $I$ is one of the indicators $G{D}_{p}$, $IG{D}_{p}$, or ${\mathsf{\Delta}}_{p}$, and $N$ is the size of the approximation.
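The three indicators translate almost literally into code. The following NumPy sketch evaluates $G{D}_{p}$, $IG{D}_{p}$ and ${\mathsf{\Delta}}_{p}$ for two finite sets stored row-wise; the function names and the small sample sets are assumptions made for illustration only.

```python
import numpy as np

def _dists(A, B):
    """Pairwise Euclidean distances between the rows of A and the rows of B."""
    A = np.atleast_2d(A).astype(float)
    B = np.atleast_2d(B).astype(float)
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

def gd_p(A, B, p=2):
    """Averaged distance of the elements of A to the set B."""
    d = _dists(A, B).min(axis=1)          # dist(a_i, B) for every a_i
    return float(np.mean(d ** p) ** (1.0 / p))

def igd_p(A, B, p=2):
    """Averaged distance of the elements of B to the set A."""
    return gd_p(B, A, p)

def delta_p(A, B, p=2):
    """Averaged Hausdorff distance: the maximum of GD_p and IGD_p."""
    return max(gd_p(A, B, p), igd_p(A, B, p))

# Hypothetical sample sets.
A = [[0.0, 0.0], [1.0, 0.0]]
B = [[0.0, 1.0], [1.0, 1.0], [5.0, 1.0]]
print(gd_p(A, B), igd_p(A, B), delta_p(A, B))
```

In this sample every element of `A` is close to `B`, but the point `(5, 1)` of `B` is far from `A`. The $IG{D}_{2}$ value therefore exceeds the $G{D}_{2}$ value and determines ${\mathsf{\Delta}}_{2}$.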

Probably the most important reference set in our context is the Pareto front itself. For this case, ${\mathsf{\Delta}}_{p}$ prefers, roughly speaking, evenly spread solutions along the Pareto front and is hence in accord with, e.g., the terms spread and convergence as used in the evolutionary multi-objective optimization (EMO) community for a “suitable” performance indicator. As an example, Figure 1 shows some “best approximations” in the ${\mathsf{\Delta}}_{2}$ sense (i.e., when using $p=2$) for MOPs with different shapes of the Pareto front. More precisely, each subfigure shows a fine grain ($M=200$) approximation of the Pareto front of the underlying problem (using dots), as well as the best approximations in the ${\mathsf{\Delta}}_{2}$ sense (using diamonds). The latter are (numerical) solutions of (5) for $N=20$, where $Z$ has been chosen as the Pareto front approximation.

If $A=\{{a}_{1},\dots ,{a}_{N}\}$ is a subset of ${\mathbb{R}}^{n}$, each of its elements ${a}_{i}$ is an element of ${\mathbb{R}}^{n}$. Hence, the set $A=\{{a}_{1},\dots ,{a}_{N}\}\subset {\mathbb{R}}^{n}$ can in a natural way also be identified with a point or vector in the higher dimensional space ${\mathbb{R}}^{N\cdot n}$, i.e., $A\in {\mathbb{R}}^{N\cdot n}$. That is, the optimization problem (5) can be identified with a “classical” scalar optimization problem that is defined in an $N\cdot n$-dimensional search space. A necessary condition for optimality is hence given by the Karush–Kuhn–Tucker conditions; e.g., for unconstrained problems we are seeking sets $A$ for which the (set based) gradient vanishes. In order to solve this root finding problem, one can, e.g., utilize the Newton method. If we are given a performance indicator $I$ together with the derivatives $\nabla I\left(A\right)$ and ${\nabla}^{2}I\left(A\right)$ on a set $A$, the Newton function is hence given by

$$\mathcal{N}(A):=A-{\left({\nabla}^{2}I\left(A\right)\right)}^{-1}\nabla I\left(A\right). \tag{6}$$
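The identification of a set with a vector in ${\mathbb{R}}^{N\cdot n}$ and one application of the Newton function can be sketched as follows. This is a generic illustration: the callables `grad` and `hess` stand for the set based gradient and Hessian of some indicator and are assumptions of this sketch, as is the toy indicator used to exercise it.

```python
import numpy as np

def newton_step(A, grad, hess):
    """One set based Newton step.

    A has shape (N, n); it is flattened to a vector in R^(N*n), updated,
    and reshaped back so that row i is again the element a_i.
    """
    A = np.asarray(A, dtype=float)
    a = A.reshape(-1)
    step = np.linalg.solve(hess(a), grad(a))  # avoids forming the inverse
    return (a - step).reshape(A.shape)

# Toy indicator I(A) = 0.5 * ||A - C||^2 (hypothetical), minimized at A = C.
C = np.array([1.0, 2.0, 3.0, 4.0])
A1 = newton_step(np.zeros((2, 2)),
                 grad=lambda a: a - C,
                 hess=lambda a: np.eye(a.size))
```

Since the toy indicator is quadratic, a single step already reaches its minimizer; for the indicators considered in this work several steps from a sufficiently good initial set are needed.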

There exist many methods for the computation of Pareto optimal solutions. For example, there are mathematical programming (MP) techniques such as scalarization methods that transform the MOP into a sequence of scalar optimization problems (SOPs) [22, 23, 24, 25, 26]. These methods are very efficient in finding a single solution or even a finite size discretization of the solution set. Another sub-class of the MP techniques is given by continuation-like methods that take advantage of the fact that the Pareto set forms, at least locally, a manifold. Methods of this kind start from a given initial solution and perform a search along the solution manifold [6, 27, 28, 29, 30, 31, 32, 33].

Next, there also exist set oriented methods that are capable of obtaining the entire solution set in a global manner. Examples of the latter are subdivision [34, 35, 36] and cell mapping techniques [37, 38, 39]. Another class of set based methods is given by multi-objective evolutionary algorithms (MOEAs) that have proven to be very effective for the treatment of MOPs [14, 16, 40, 41, 42, 43]. Some reasons for this are that they are very robust, do not require hard assumptions on the model, and allow the computation of a reasonable finite size representation of the solution set in a single run.

Methods that deal with single reference points for multi-objective problems can be found in [26, 44, 45]. The first work that deals with a set based approach using a problem similar to the one in (5) can be found in [46], where the authors apply the steepest descent method to the Hypervolume indicator [47]. In [48], a Newton method is defined that also uses the Hypervolume indicator. In [49], a multi-objective Newton method is proposed that detects single Pareto optimal solutions for a given MOP. In [50], a set based Newton method is proposed for general root finding problems and for convex sets.

## 3. $G{D}_{p}$ Newton Method

In the following sections we will investigate the set based Newton methods for $G{D}_{p}$, $IG{D}_{p}$, and ${\mathsf{\Delta}}_{p}$. More precisely, we will consider the p-th powers, $p>1$, of these indicators as this does not change the optimal solutions. In all cases, we will first derive the (set based) derivatives, and then investigate the resulting Newton method. For the derivatives, we will focus on $p=2$ which is related to the Euclidean norm, and which hence represents the most important performance indicator of the indicator families. However, we will also state the derivatives for general integers p.

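Let $A=\{{a}_{1},\dots ,{a}_{N}\}\subset {\mathbb{R}}^{n}$ be a candidate set for (2), and $Z=\{{z}_{1},\dots ,{z}_{M}\}\subset {\mathbb{R}}^{k}$ be a given reference set. The indicator $G{D}_{p}$ measures the averaged distance of the image of $A$ and $Z$:

$$G{D}_{p}\left(A\right):={\left(\frac{1}{N}\sum_{i=1}^{N}d{(F\left({a}_{i}\right),Z)}^{p}\right)}^{1/p}. \tag{7}$$

Hereby, we have used the notation

$$d(F\left({a}_{i}\right),Z):=\min_{j=1,\dots ,M}\Vert F\left({a}_{i}\right)-{z}_{j}{\Vert}_{2}, \tag{8}$$

and assume $Z$ to be fixed for the given problem (and hence, it does not appear as input argument).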

#### 3.1. Derivatives of $G{D}_{2}^{2}$

#### 3.1.1. Gradient of $G{D}_{2}^{2}$

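In the following, we have to assume that for every point $F\left({a}_{i}\right)$ there exists exactly one closest element in $Z$. That is, for all $i=1,\dots ,N$ there exists an index ${j}_{i}\in \{1,\dots ,M\}$ such that:

$$\Vert F\left({a}_{i}\right)-{z}_{{j}_{i}}{\Vert}_{2}<\Vert F\left({a}_{i}\right)-{z}_{q}{\Vert}_{2}\quad \text{for all } q\in \{1,\dots ,M\}\backslash \left\{{j}_{i}\right\}. \tag{9}$$

Otherwise, the gradient of $G{D}_{p}$ is not defined at $A$. If condition (9) is satisfied, then (7) can be written as follows:

$$G{D}_{p}\left(A\right)={\left(\frac{1}{N}\sum_{i=1}^{N}\Vert F\left({a}_{i}\right)-{z}_{{j}_{i}}{\Vert}_{2}^{p}\right)}^{1/p}, \tag{10}$$

and for the special case $p=2$ we obtain

$$G{D}_{2}^{2}\left(A\right)=\frac{1}{N}\sum_{i=1}^{N}\Vert F\left({a}_{i}\right)-{z}_{{j}_{i}}{\Vert}_{2}^{2}. \tag{11}$$

The gradient of $G{D}_{2}^{2}$ at $A$ is hence given by

$$\nabla G{D}_{2}^{2}\left(A\right)=\frac{2}{N}\left(\begin{array}{c}J{\left({a}_{1}\right)}^{T}(F\left({a}_{1}\right)-{z}_{{j}_{1}})\\ \vdots \\ J{\left({a}_{N}\right)}^{T}(F\left({a}_{N}\right)-{z}_{{j}_{N}})\end{array}\right)\in {\mathbb{R}}^{N\cdot n}, \tag{12}$$

where $J\left({a}_{i}\right)$ denotes the Jacobian matrix of $F$ at ${a}_{i}$ for $i=1,\dots ,N$. We call the vector

$$J{\left({a}_{i}\right)}^{T}(F\left({a}_{i}\right)-{z}_{{j}_{i}}),\quad i\in \{1,\dots ,N\}, \tag{13}$$

the $i$-th sub-gradient of $G{D}_{2}^{2}$ with respect to ${a}_{i}\in A$. (The sub-gradient is defined here as the part of the gradient that is associated with an element $a$ of $A$, and is not equal to the notion of the sub-gradient known in non-smooth optimization.) Note that the sub-gradients are completely independent of the location of the other archive elements ${a}_{j}\in A$.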

If the given MOP is unconstrained, then the first order necessary condition for optimality is that the gradient of $G{D}_{2}^{2}$ vanishes. This is the case for a set $A$ if all sub-gradients vanish:

$$J{\left({a}_{i}\right)}^{T}(F\left({a}_{i}\right)-{z}_{{j}_{i}})=0,\quad i=1,\dots ,N. \tag{14}$$

This happens if for each ${a}_{i}$ one of the following two cases holds:

- (i) $F\left({a}_{i}\right)={z}_{{j}_{i}}$, that is, the image of ${a}_{i}$ is equal to one of the elements of the reference set. This is for instance never the case if $Z$ is chosen utopian.

- (ii) $F\left({a}_{i}\right)\ne {z}_{{j}_{i}}$, and we have

  $$J{\left({a}_{i}\right)}^{T}{\alpha}^{\left(i\right)}=\sum_{l=1}^{k}{\alpha}_{l}^{\left(i\right)}\nabla {f}_{l}\left({a}_{i}\right)=0,\qquad {\alpha}^{\left(i\right)}:=F\left({a}_{i}\right)-{z}_{{j}_{i}}, \tag{15}$$

  for a vector ${\alpha}^{\left(i\right)}\in {\mathbb{R}}^{k}\backslash \left\{0\right\}$. The point ${a}_{i}$ is hence a critical point since $rank\left(J\left({a}_{i}\right)\right)<k$. Furthermore, if $F\left({a}_{i}\right)-{z}_{{j}_{i}}{\ge}_{p}0$ (e.g., if $Z$ is again utopian) then ${a}_{i}$ is even a Karush–Kuhn–Tucker point. See Figure 2 for a geometrical interpretation of this scenario.
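The indicator $G{D}_{2}^{2}$ and its set based gradient, assembled from the sub-gradients $J{\left({a}_{i}\right)}^{T}(F\left({a}_{i}\right)-{z}_{{j}_{i}})$, can be checked numerically. The sketch below uses a hypothetical toy MOP with ${f}_{l}\left(x\right)=\Vert x-{c}_{l}{\Vert}^{2}$; this problem, the reference set and all function names are assumptions made for illustration and are not the example from this work. The analytic gradient is compared against central finite differences.

```python
import numpy as np

C = np.array([[1.0, 1.0], [-1.0, -1.0]])  # hypothetical minimizers of f_1 and f_2

def F(x):
    """Objective map F(x) = (||x - c_1||^2, ||x - c_2||^2)."""
    return np.array([np.sum((x - c) ** 2) for c in C])

def J(x):
    """Jacobian of F at x; row l is the gradient of f_l."""
    return np.array([2.0 * (x - c) for c in C])

def closest(Fa, Z):
    """Element of Z closest to Fa (assumed to be unique)."""
    return Z[np.argmin(np.linalg.norm(Z - Fa, axis=1))]

def gd22(A, Z):
    return float(np.mean([np.sum((F(a) - closest(F(a), Z)) ** 2) for a in A]))

def grad_gd22(A, Z):
    """Set based gradient in R^(N*n): stacked sub-gradients times 2/N."""
    N = len(A)
    return np.concatenate([2.0 / N * J(a).T @ (F(a) - closest(F(a), Z)) for a in A])

def fd_grad(A, Z, h=1e-6):
    """Central finite differences on the flattened set, for comparison."""
    a = A.reshape(-1)
    g = np.zeros_like(a)
    for i in range(a.size):
        e = np.zeros_like(a)
        e[i] = h
        g[i] = (gd22((a + e).reshape(A.shape), Z)
                - gd22((a - e).reshape(A.shape), Z)) / (2 * h)
    return g

A = np.array([[0.5, 0.2], [-0.3, -0.6]])
Z = np.array([[-1.0, 5.0], [1.0, 1.0], [5.0, -1.0]])
print(np.max(np.abs(grad_gd22(A, Z) - fd_grad(A, Z))))
```

The comparison is only meaningful where the closest reference point is unique and does not change under the small perturbations, which holds for the chosen sample data.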

#### 3.1.2. Hessian of $G{D}_{2}^{2}$

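We first define the map $g:{\mathbb{R}}^{n}\to {\mathbb{R}}^{n}$ as

$$g\left({a}_{i}\right):=J{\left({a}_{i}\right)}^{T}{\alpha}^{\left(i\right)}=\sum_{l=1}^{k}{\alpha}_{l}^{\left(i\right)}\nabla {f}_{l}\left({a}_{i}\right), \tag{16}$$

where ${\alpha}^{\left(i\right)}$ is as in (15). In order to find an expression for the Hessian matrix, we now differentiate Equation (16) as follows:

$$\mathcal{D}g\left({a}_{i}\right)=\sum_{l=1}^{k}{\alpha}_{l}^{\left(i\right)}{\nabla}^{2}{f}_{l}\left({a}_{i}\right)+\sum_{l=1}^{k}\nabla {f}_{l}\left({a}_{i}\right)\nabla {f}_{l}{\left({a}_{i}\right)}^{T}={W}_{{\alpha}^{\left(i\right)}}\left({a}_{i}\right)+J{\left({a}_{i}\right)}^{T}J\left({a}_{i}\right), \tag{17}$$

where

$${W}_{{\alpha}^{\left(i\right)}}\left({a}_{i}\right):=\sum_{l=1}^{k}{\alpha}_{l}^{\left(i\right)}{\nabla}^{2}{f}_{l}\left({a}_{i}\right). \tag{18}$$

Thus, the Hessian matrix of $G{D}_{2}^{2}$ is

$${\nabla}^{2}G{D}_{2}^{2}\left(A\right)=\frac{2}{N}\,\mathrm{diag}\left(\mathcal{D}g\left({a}_{1}\right),\dots ,\mathcal{D}g\left({a}_{N}\right)\right)\in {\mathbb{R}}^{N\cdot n\times N\cdot n}, \tag{19}$$

which is a block diagonal matrix.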

#### 3.2. Gradient and Hessian for General $p>1$

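As mentioned above, we focus here on the special case $p=2$. The above derivatives, however, can be generalized for $p>1$ as follows (assuming that $Z$ is a utopian finite set to avoid problems when $p<4$): the gradient of $G{D}_{p}^{p}$ is given by

$$\nabla G{D}_{p}^{p}\left(A\right)=\frac{p}{N}\left(\begin{array}{c}\Vert {\alpha}^{\left(1\right)}{\Vert}_{2}^{p-2}J{\left({a}_{1}\right)}^{T}{\alpha}^{\left(1\right)}\\ \vdots \\ \Vert {\alpha}^{\left(N\right)}{\Vert}_{2}^{p-2}J{\left({a}_{N}\right)}^{T}{\alpha}^{\left(N\right)}\end{array}\right)\in {\mathbb{R}}^{N\cdot n}, \tag{20}$$

and the Hessian by

$${\nabla}^{2}G{D}_{p}^{p}\left(A\right)=\frac{p}{N}\,\mathrm{diag}\left({H}_{1},\dots ,{H}_{N}\right)\in {\mathbb{R}}^{N\cdot n\times N\cdot n}, \tag{21}$$

where

$${H}_{i}=\Vert {\alpha}^{\left(i\right)}{\Vert}_{2}^{p-2}\left(J{\left({a}_{i}\right)}^{T}J\left({a}_{i}\right)+{W}_{{\alpha}^{\left(i\right)}}\left({a}_{i}\right)\right)+(p-2)\Vert {\alpha}^{\left(i\right)}{\Vert}_{2}^{p-4}J{\left({a}_{i}\right)}^{T}{\alpha}^{\left(i\right)}{{\alpha}^{\left(i\right)}}^{T}J\left({a}_{i}\right) \tag{22}$$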

for $i=1,2,\dots ,N$.

#### 3.3. $G{D}_{2}^{2}$-Newton Method

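After having derived the gradient and the Hessian we are now in the position to state the set based Newton method for the $G{D}_{2}^{2}$ indicator:

$${A}_{l+1}:={A}_{l}-{\left({\nabla}^{2}G{D}_{2}^{2}\left({A}_{l}\right)\right)}^{-1}\nabla G{D}_{2}^{2}\left({A}_{l}\right),\quad l=0,1,2,\dots \tag{23}$$

The Newton iteration can in practice be stopped at a set ${A}^{f}$ if

$$\Vert \nabla G{D}_{2}^{2}\left({A}^{f}\right)\Vert \le tol \tag{24}$$

for a given tolerance $tol>0$. In order to speed up the computations, one can exploit the structure of the (sub-)gradient as follows: for each element ${a}_{i}$ of a current archive $A$ with

$$\Vert J{\left({a}_{i}\right)}^{T}(F\left({a}_{i}\right)-{z}_{{j}_{i}})\Vert \le tol \tag{25}$$

one can continue the Newton iteration with the smaller set $\overline{A}=A\backslash \left\{{a}_{i}\right\}$ (and later insert ${a}_{i}$ into the final archive).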
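Because the Hessian is block diagonal, the Newton step decouples into one small linear system per archive element. The following sketch implements the $G{D}_{2}^{2}$ Newton iteration in this element-wise form. The toy MOP (${f}_{l}\left(x\right)=\Vert x-{c}_{l}{\Vert}^{2}$), the reference set (a shifted line below the Pareto front), the initial archive and all function names are hypothetical choices for illustration; they are not the benchmark used in this work. No step size control or other safeguard is included, so good initial sets are assumed.

```python
import numpy as np

C = np.array([[1.0, 1.0], [-1.0, -1.0]])  # hypothetical toy MOP: f_l(x) = ||x - C[l]||^2

def F(x):
    return np.array([np.sum((x - c) ** 2) for c in C])

def J(x):
    return np.array([2.0 * (x - c) for c in C])

def hess_f(x, l):
    """Hessian of f_l (constant for this toy problem)."""
    return 2.0 * np.eye(x.size)

def sub_gradient(a, Z):
    """Return J(a)^T alpha and alpha = F(a) - z_j for the closest z_j in Z."""
    Fa = F(a)
    alpha = Fa - Z[np.argmin(np.linalg.norm(Z - Fa, axis=1))]
    return J(a).T @ alpha, alpha

def gd22_newton(A0, Z, tol=1e-10, max_iter=50):
    A = np.array(A0, dtype=float)
    N = len(A)
    for _ in range(max_iter):
        subs = [sub_gradient(a, Z) for a in A]
        grad = 2.0 / N * np.concatenate([g for g, _ in subs])
        if np.linalg.norm(grad) <= tol:       # stopping criterion on the full gradient
            break
        for i, (g, alpha) in enumerate(subs):
            W = sum(alpha[l] * hess_f(A[i], l) for l in range(len(alpha)))
            block = J(A[i]).T @ J(A[i]) + W   # i-th diagonal block (without factor 2/N)
            A[i] = A[i] - np.linalg.solve(block, g)  # the factors 2/N cancel
    return A

Z = np.array([[-3.0, 5.0], [-1.0, 3.0], [1.0, 1.0], [3.0, -1.0], [5.0, -3.0]])
A0 = [[0.5, 0.2], [-0.3, -0.6], [0.1, -0.1]]
A_final = gd22_newton(A0, Z)
print(A_final)
```

For this symmetric toy problem the Pareto set is the segment between the two minimizers, so the computed elements should end up on the line ${x}_{1}={x}_{2}$. Elements that share the same closest reference point converge to the same solution, which is the clustering effect discussed in the example of Section 3.4.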

We are particularly interested in the regularity of ${\nabla}^{2}G{D}_{2}^{2}$ at the optimal set, i.e., at a set ${A}^{*}$ that solves problem (5) for $I=G{D}_{2}^{2}$. The reason is that if the Hessian is regular at ${A}^{*}$, and if the objective function is sufficiently smooth, we can expect the Newton method to converge locally quadratically [51].

Since the Hessian is a block diagonal matrix, it is regular if all of its blocks

$$\mathcal{D}g\left({a}_{i}\right)=J{\left({a}_{i}\right)}^{T}J\left({a}_{i}\right)+{W}_{{\alpha}^{\left(i\right)}}\left({a}_{i}\right),\quad i=1,\dots ,N, \tag{26}$$

are regular. From this we see already that if $Z$ is not utopian, we cannot expect quadratic convergence: assume that one point $z\in Z$ is feasible, i.e., that there exists one $x\in Q$ such that $F\left(x\right)=z$. We can assume that $x$ is also a member of the optimal set ${A}^{*}$, say ${a}_{i}=x$. Then, the weight vector ${\alpha}^{\left(i\right)}$ is zero, and hence ${W}_{{\alpha}^{\left(i\right)}}={\sum}_{l=1}^{k}{\alpha}_{l}^{\left(i\right)}{\nabla}^{2}{f}_{l}\left({a}_{i}\right)=0$. Thus, the block matrix reduces to $J{\left({a}_{i}\right)}^{T}J\left({a}_{i}\right)\in {\mathbb{R}}^{n\times n}$ whose rank is at most $k$. Whenever $k<n$, the block matrix is hence singular, and so is the Hessian of $G{D}_{2}^{2}$ at ${A}^{*}$.

In the case all individual objectives are strictly convex, the $G{D}_{2}^{2}$ Hessian is positive definite (and hence regular) at every feasible set A, and we can hence expect local quadratic convergence.

**Proposition** **1.** Let a MOP of the form (2) be given whose individual objectives are strictly convex, and let $Z$ be a discrete utopian set. Then, the matrix ${\nabla}^{2}G{D}_{2}^{2}\left(A\right)$ is positive definite for all feasible sets $A$.

**Proof.** Since ${\nabla}^{2}G{D}_{2}^{2}\left(A\right)$ is block diagonal, it is sufficient to consider the block matrices $J{\left({a}_{i}\right)}^{T}J\left({a}_{i}\right)+{W}_{{\alpha}^{\left(i\right)}}\left({a}_{i}\right)$, $i=1,\dots ,N$. Let $i\in \{1,\dots ,N\}$. Since $Z$ is utopian, we have ${\alpha}^{\left(i\right)}\ne 0$, and all of its elements are non-negative. Further, since all individual objectives ${f}_{l}$ are strictly convex, the matrices ${\nabla}^{2}{f}_{l}\left({a}_{i}\right)$ are positive definite, and hence also the matrix ${W}_{{\alpha}^{\left(i\right)}}\left({a}_{i}\right)$. Since $J{\left({a}_{i}\right)}^{T}J\left({a}_{i}\right)$ is positive semi-definite, we have for all $x\in {\mathbb{R}}^{n}\backslash \left\{0\right\}$

$${x}^{T}\left(J{\left({a}_{i}\right)}^{T}J\left({a}_{i}\right)+{W}_{{\alpha}^{\left(i\right)}}\left({a}_{i}\right)\right)x={x}^{T}J{\left({a}_{i}\right)}^{T}J\left({a}_{i}\right)x+{x}^{T}{W}_{{\alpha}^{\left(i\right)}}\left({a}_{i}\right)x>0, \tag{27}$$

since ${x}^{T}J{\left({a}_{i}\right)}^{T}J\left({a}_{i}\right)x\ge 0$ and ${x}^{T}{W}_{{\alpha}^{\left(i\right)}}\left({a}_{i}\right)x>0$. Therefore, each $\mathcal{D}g\left({a}_{i}\right)$, $i=1,\dots ,N$, is positive definite and hence also the matrix ${\nabla}^{2}G{D}_{2}^{2}\left(A\right)$. □
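The two situations discussed above (a singular block when the image of ${a}_{i}$ coincides with a reference point, and a positive definite block for strictly convex objectives and a utopian reference point) can be illustrated numerically. The sketch below assumes a hypothetical MOP with $n=3$ variables and $k=2$ strictly convex objectives ${f}_{l}\left(x\right)=\Vert x-{c}_{l}{\Vert}^{2}$; the points and names are illustrative assumptions only.

```python
import numpy as np

C = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # hypothetical: n = 3, k = 2

def F(x):
    return np.array([np.sum((x - c) ** 2) for c in C])

def J(x):
    return np.array([2.0 * (x - c) for c in C])

def block(x, z):
    """Diagonal block J^T J + W_alpha with alpha = F(x) - z."""
    alpha = F(x) - z
    W = sum(a_l * 2.0 * np.eye(x.size) for a_l in alpha)  # Hessian of f_l is 2*I
    return J(x).T @ J(x) + W

x = np.array([0.5, 0.5, 0.0])                        # F(x) = (0.5, 0.5)
singular_block = block(x, F(x))                      # reference point hit exactly: alpha = 0
regular_block = block(x, np.array([0.2, 0.3]))       # reference point below the front: alpha >= 0

print(np.linalg.matrix_rank(singular_block))         # at most k = 2 < n = 3
print(np.linalg.eigvalsh(regular_block).min())       # strictly positive
```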

#### 3.4. Example

We consider the following convex bi-objective problem

Figure 3 shows the Pareto front of this problem together with the reference set $Z$ that contains 30 elements (black dots). The set $Z$ is a discretization of the convex hull of individual minima (CHIM, [23]) of the problem that has been shifted down and to the left. Further, the figure shows the images of the Newton steps of an initial set ${A}_{0}$ that contains 21 elements. As can be seen, all images converge toward three solutions that are placed in the middle of the Pareto front (this is due to the fact that $Z$ is discrete; if $Z$ were continuous, all images would converge toward one solution). This example already shows that the $G{D}_{2}^{2}$ Newton method is of restricted interest as a standalone algorithm. The method will, however, become important as part of the ${\mathsf{\Delta}}_{p}$-Newton method, as will become apparent later on.

Table 1 shows the respective $G{D}_{2}^{2}$ values plus the norms of the gradients, which indicate quadratic convergence. The second column indicates that the images of the archives converge toward the Pareto front as anticipated.